STAT 245 Notes
1 Description
2 Linear Regression
2.1 Data
2.2 Simple linear regression, Residuals & Least squares
2.2.1 Using lm() to fit a linear regression in R
2.2.2 Equation of the fitted regression line
2.3 Multiple regression
2.3.1 Is it really better?
2.3.2 Regression residuals = “errors”
2.3.3 Computing Predictions
2.4 Predictors with two categories
2.4.1 Predictors with more categories
2.5 Returning to the R Model Summary
2.6 Predictions from the model
2.6.1 By Hand
2.6.2 Prediction Plots in R
2.7 Why are we doing this again?
2.8 Shortcut Method - With Uncertainty
2.8.1 Anatomy of a Confidence Interval
2.9 DIY Method
2.9.1 Creating a hypothetical dataset
2.9.2 Making the plot
2.9.3 Categorical predictors
3 Model Selection Using Information Criteria
3.1 Data and Model
3.2 Calculations
3.3 Decisions with ICs
3.4 All-possible-subsets Selection
3.5 Which IC should I use?
3.6 Quantities derived from AIC
3.7 Important Caution
4 Likelihood
4.1 Data
4.2 Review - the Normal probability density function (PDF)
4.3 A simple model
4.4 Using the Model to Make Predictions
4.5 Likelihood to the Rescue!
4.6 How does this relate to linear regression?
4.6.1 Model Equation:
4.7 Likelihood of a dataset, given a model
5 PDFs and PMFs
5.1 Beyond Normal
5.2 Types of probability distributions
5.2.1 Continuous distributions
5.2.2 Discrete Distributions
5.3 Relevant Features of Distributions
5.4 Examples of Continuous Distributions
5.4.1 Normal
5.4.2 Gamma
5.4.3 Beta
5.5 Examples of Discrete Distributions
5.5.1 Binomial
5.5.2 Poisson
5.5.3 Negative Binomial
5.5.4 A Mixture Distribution: Tweedie
6 Regression for Count Data
6.1 Data Source
6.2 A bad idea: multiple linear regression model
6.3 Problems with the linear model
6.4 Poisson Regression
6.4.1 Fitting the Model
6.4.2 Conditions
6.4.3 Model Assessment
6.4.4 Checking for overdispersion using overdispersion factor
6.5 Accounting for overdispersion: Negative Binomial Models
6.6 Accounting for overdispersion: quasi-Poisson Model
6.7 Model selection with dredge() and (Q)AIC, BIC
6.7.1 Review: all subsets selection with dredge()
6.7.2 Review: IC “weights”
6.7.3 Extending dredge() to quasi-Likelihood
6.8 Offsets
6.9 Prediction Plots
7 Model Averaging
7.1 Data: School Survey on Crime and Safety
7.2 Modelling number of violent incidents per school
7.3 Model Averaging
7.3.1 Getting the Averaged Model
7.3.2 Getting Predictions from the Averaged Model
8 Interactions
8.1 Example: Quantitative-Categorical Interaction
8.2 Categorical-Categorical Interaction Example
8.3 Quant-Quant interactions?
8.4 R code
8.5 Cautionary note
9 Binary Regression
9.1 Data Source
9.2 Logistic Regression
9.3 Checking the data setup
9.4 Fitting a saturated model
9.5 Link Functions
9.6 Conditions
9.7 Model Assessment Plots
9.8 Odds Ratios
9.9 Model Selection
9.10 Prediction Plots
10 Binary regression: Data with more than one trial per row
10.1 Data
10.2 Checking the data setup
10.3 Fitting a saturated model
10.4 Checking linearity
10.5 Model Assessment
10.6 Model Selection
11 Collinearity and Multicollinearity
11.1 Graphical Checks
11.1.1 Preferred option: Correlation Scatter Plot
11.1.2 Another option: Heat map of correlation coefficients
11.1.3 How to use this information
11.2 Variance Inflation Factors
11.2.1 Quantitative predictors (VIFs)
11.2.2 (Some) Categorical Predictors (GVIFs)
11.2.3 Rules of Thumb
11.3 Was it worth it?
12 Zero-Inflation
12.1 Reference material
12.2 Data for Example
12.3 Visualization?
12.4 Collinearity/Multicollinearity?
12.5 Fitting models
12.5.1 Zero-inflated Poisson
12.5.2 Zero-inflated NB: one way
12.5.3 Zero-inflated negative binomial (other way)
12.5.4 Tweedie Model
12.6 Model Assessment
12.7 Model Selection?
12.7.1 Zero inflation covariates
12.7.2 Interaction terms
12.7.3 Dredge
12.8 Prediction Plots
12.9 Acknowledgements
13 Non-constant variance (and other unresolved problems)
13.0.1 Already in Our Tool Box: Make sure the model is “right”
13.0.2 Already in Our Tool Box: Models that estimate dispersion parameters
13.0.3 Gamma GLMs
13.0.4 Beta GLMs
13.0.5 Transformations
13.0.6 Modelling non-constant variance
14 Random Effects
14.1 Dataset
14.2 Data Exploration
14.3 A Base Linear Model
14.3.1 Model assessment
14.4 A Random Effects model
14.4.1 The Formula
14.4.2 The Results
14.4.3 Model Assessment
14.4.4 Refinement
14.5 Model Selection for Mixed Models
14.5.1 REML or ML?
14.5.2 Best model so far:
14.6 Random Slopes?
14.7 Prediction Plots
14.7.1 Parametric bootstrap to the rescue!
14.8 Random effects with glmmTMB and standardized residuals
14.9 Model for whale dive duration
14.10 glmmTMB
14.11 Model assessment with scaled residuals
14.11.1 glmmTMB version
14.11.2 You now have the power!
15 GEEs
15.1 Data Source
15.2 Data Exploration
15.3 Linear Regression
15.4 Model Assessment
15.5 Linear Regression
15.6 Generalized Estimating Equations (GEEs)
15.6.1 Fitting GEEs with different correlation structures
15.6.2 Comparing different correlation structures
15.7 GEE model assessment
15.8 Model Selection - Which variables?
15.9 Prediction Plots
16 Correlation Structures
16.0.1 Variance/Covariance or Correlation?
16.0.2 Example Case
16.0.3 Independence
16.0.4 ACF Example
16.0.5 Exchangeable = Block Diagonal
16.0.6 AR1 (first-order autoregressive process)
16.0.7 Unstructured
17 Other Model Selection Approaches
17.0.1 Rationale
17.0.2 Hypotheses
17.1 Backward selection
17.1.1 Algorithm
17.1.2 Example
17.1.3 Can’t this be automated?
17.1.4 Stepwise IC-based selection
17.2 Summary tables
17.2.1 Mean (or sd, median, IQR, etc.) by groups
17.2.2 Proportions in categories by groups
17.3 Figures
18 GAMs: Generalized Additive Models
18.1 Non-linear, non-monotonic relationships
18.2 Smooth terms
18.2.1 Basis functions
18.3 Fitting GAMs
18.3.1 Choosing model formulation
18.3.2 Model formula
18.4 Model Assessment
18.4.1 Concurvity
18.5 Model Selection
18.5.1 Shrinkage and Penalties
18.5.2 P-value selection
18.5.3 Information criteria
References