Chapter 4 Likelihood

In the last section, we said that “likelihood” is a measure of goodness-of-fit of a model to a dataset. But what is it exactly and just how do we compute it?

4.1 Data

Today’s dataset comes from a UNICEF survey of 5440 households in the urban area of Dakar, Senegal, carried out in 2015-2016. Among these households, information was collected about 4453 children under 5 years old, including their weight in kilograms.

4.2 Review - the Normal probability density function (PDF)

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
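As a quick check on the formula, here is a minimal R sketch that writes the density out by hand and compares it with R's built-in `dnorm()`. The helper name `normal_pdf` and the example values of `x`, `mu`, and `sigma` are made up for illustration.

```r
# Normal PDF written out term by term from the formula above.
# normal_pdf is a hypothetical helper name used only in this sketch.
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Illustrative inputs (not from the dataset)
x <- c(-1, 0, 0.5, 2)

by_hand  <- normal_pdf(x, mu = 0, sigma = 1)
built_in <- dnorm(x, mean = 0, sd = 1)

# The two versions should agree up to floating-point error
all.equal(by_hand, built_in)
```

Note that `dnorm()` takes the standard deviation (`sd`), not the variance, as its third argument.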







4.3 A simple model

The distribution of weights looks quite unimodal and symmetric, so we will model it with a normal distribution with mean 11.8 and standard deviation 3.53 (\(N(\mu = 11.8, \sigma = 3.53)\), black line).

4.4 Using the Model to Make Predictions

If you had to predict the weight of one child from this population, what weight would you guess?

Is it more likely for a child in Dakar to weigh 10 kg or 20 kg? How much more likely?

What is the probability of a child in Dakar weighing 11.5 kg?
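One possible way to explore these questions in R, assuming the \(N(11.8, 3.53)\) model from section 4.3: compare the heights of the density curve at the two weights, and note that for a continuous variable a probability is only non-zero over an interval. The interval 11.45 to 11.55 kg is an assumption made here (weights rounded to the nearest 0.1 kg), not something stated in the survey description.

```r
# Parameters of the simple model from section 4.3
mu    <- 11.8
sigma <- 3.53

# Height of the density curve at each candidate weight
lik_10 <- dnorm(10, mean = mu, sd = sigma)
lik_20 <- dnorm(20, mean = mu, sd = sigma)

# How many times higher is the density at 10 kg than at 20 kg?
lik_ratio <- lik_10 / lik_20
lik_ratio

# The probability of weighing *exactly* 11.5 kg is zero for a continuous
# variable. A probability needs an interval; here we assume 11.45 to 11.55 kg.
p_interval <- pnorm(11.55, mean = mu, sd = sigma) -
  pnorm(11.45, mean = mu, sd = sigma)

# For a narrow interval, density times width is a close approximation
p_approx <- dnorm(11.5, mean = mu, sd = sigma) * 0.1

c(p_interval = p_interval, p_approx = p_approx)
```

The density value itself is not a probability, but ratios of densities still tell us which values the model considers more plausible.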

4.5 Likelihood to the Rescue!

Which is more likely: three children who weigh 11, 8.2, and 13 kg, or three who weigh 10, 12.5, and 15 kg?

How did you:

  • Find the likelihood of each observation?



  • Combine the likelihoods of a set of three observations?



What did you have to assume about the set of observations?
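A minimal sketch of one way to answer the questions above in R, assuming the \(N(11.8, 3.53)\) model and assuming the three children in each set are independent of one another. The variable names `set_a` and `set_b` are invented for this example.

```r
# Parameters of the simple model from section 4.3
mu    <- 11.8
sigma <- 3.53

# The two sets of three observed weights (kg)
set_a <- c(11, 8.2, 13)
set_b <- c(10, 12.5, 15)

# Likelihood of each individual observation: density under the model
dnorm(set_a, mean = mu, sd = sigma)
dnorm(set_b, mean = mu, sd = sigma)

# Joint likelihood of each set: multiply the individual likelihoods,
# which is only valid if the observations are independent
lik_a <- prod(dnorm(set_a, mean = mu, sd = sigma))
lik_b <- prod(dnorm(set_b, mean = mu, sd = sigma))

# Equivalent calculation on the log scale: sum the log-likelihoods
loglik_a <- sum(dnorm(set_a, mean = mu, sd = sigma, log = TRUE))
loglik_b <- sum(dnorm(set_b, mean = mu, sd = sigma, log = TRUE))

c(lik_a = lik_a, lik_b = lik_b)
```

Working on the log scale turns the product into a sum, which becomes important with thousands of observations, where the raw product is too small to represent in floating point.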

4.6 How does this relate to linear regression?

What if we think of this situation as a linear regression problem (with no predictors)?

## 
## Call:
## lm(formula = AN3 ~ 1, data = wt)
## 
## Residuals:
## <Labelled double>: Poids de l'enfant (kilogrammes)
##    Min     1Q Median     3Q    Max 
## -9.896 -2.396  0.104  2.404 18.904 
## 
## Labels:
##  value            label
##   99.9 poids non mesuré
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.7964     0.0543     217   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.53 on 4216 degrees of freedom
##   (235 observations deleted due to missingness)
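The output above can be connected to the simple model from section 4.3: in a regression with no predictors, the intercept estimate is the sample mean and the residual standard error is the sample standard deviation. The sketch below illustrates this with simulated data, because the `wt` data frame is not reproduced here; the sample size, seed, and variable name `y` are arbitrary choices.

```r
# Simulated stand-in for the children's weights (NOT the real survey data)
set.seed(1)
y <- rnorm(500, mean = 11.8, sd = 3.53)

# Intercept-only regression, analogous to lm(AN3 ~ 1, data = wt)
fit <- lm(y ~ 1)

# The intercept estimate is the sample mean
intercept <- unname(coef(fit))
all.equal(intercept, mean(y))

# The residual standard error is the sample standard deviation
rse <- summary(fit)$sigma
all.equal(rse, sd(y))
```

So fitting `lm()` with only an intercept is another route to the mean and standard deviation of a normal model for the response.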

4.6.1 Model Equation:
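
One way to write the model equation for this intercept-only regression is sketched below. The symbols \(\beta_0\) and \(\epsilon_i\) are assumed notation, and \(N(0, \sigma)\) follows this chapter's convention of giving the standard deviation as the second parameter.

\[ y_i = \beta_0 + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma) \]

Plugging in the estimates from the summary output gives

\[ \hat{y}_i = 11.7964, \quad \hat{\sigma} = 3.53 \]

so every child gets the same predicted weight, and each residual is \(e_i = y_i - 11.7964\).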





4.7 Likelihood of a dataset, given a model

Finally, we can understand what we were computing when we asked R for the log-likelihood of a fitted model and got output like this:

## 'log Lik.' -11301 (df=2)

For our chosen regression model, we know that the residuals should have a normal distribution with mean 0 and standard deviation \(\sigma\), which is estimated by the residual standard error in R's summary() output.

For each data point in the dataset, for a given regression model, we can compute a model prediction.

We can subtract the prediction from the observed response-variable values to get the residuals.

We can compute the likelihood (\(L\)) of this set of residuals by finding the likelihood of each individual residual \(e_i\) in a \(N(0, \sigma)\) distribution.

To get the likelihood of the full dataset given the model, we use the fact that the residuals are independent (they had better be, because that is one of the conditions of the linear regression model). Independence means we can multiply the likelihoods of all the individual residuals together to get the joint likelihood of the full set.

That is the “likelihood” that is used in the AIC and BIC calculations we considered earlier.
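
The steps in this section can be sketched in R as follows. This uses simulated data rather than the survey data, and the variable names are invented for the example. One detail worth flagging: R's `logLik()` for an `lm` fit evaluates the likelihood at the maximum-likelihood estimate of \(\sigma\), which divides the residual sum of squares by \(n\) rather than by the residual degrees of freedom, so it is slightly smaller than the residual standard error. R also reports the log of the likelihood, so the product of individual likelihoods becomes a sum of logs.

```r
# Simulated stand-in for the children's weights (NOT the real survey data)
set.seed(42)
y <- rnorm(200, mean = 11.8, sd = 3.53)

# Intercept-only regression model
fit <- lm(y ~ 1)

# Residuals: observed values minus model predictions
e <- y - fitted(fit)
n <- length(e)

# Maximum-likelihood estimate of sigma (divides by n, not n - 1)
sigma_ml <- sqrt(sum(e^2) / n)

# Log-likelihood of each residual in a N(0, sigma) distribution,
# summed because the residuals are assumed independent
loglik_by_hand <- sum(dnorm(e, mean = 0, sd = sigma_ml, log = TRUE))

# Compare with R's built-in calculation
loglik_r <- as.numeric(logLik(fit))
all.equal(loglik_by_hand, loglik_r)

# AIC = -2 * log-likelihood + 2 * (number of parameters);
# here there are 2 parameters: the intercept and sigma
aic_by_hand <- -2 * loglik_by_hand + 2 * 2
all.equal(aic_by_hand, AIC(fit))
```

The `df=2` in the `logLik()` output counts those same two estimated parameters.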