Linear regression, logistic regression, Poisson regression, Gamma regression, etc. You might have heard of those folks before. All of them are actually variants of a large family of models called the **Generalized Linear Models**, or **GLMs** for short. There are a lot of nice books out there for learning GLMs (and I strongly recommend that you learn them), but for the purposes of this post, you only need to learn how we specify a GLM.

To specify a GLM, you need to specify two elements in the model:

- A **probability distribution** that best describes the outcome variable.
- A **link** function that determines the scale of the regression coefficients (which represent the measures of association, e.g. risk ratios, risk differences, mean differences, etc.).

Let’s see an applied example of how this works. We will use the `mtcars` dataset (loaded by default in R). We will model the variable `vs` (which is a binary 0/1 variable) against the variable `hp`, which is a continuous variable. Don’t worry about the interpretation for now. I am just trying to demonstrate the concept of GLMs.

```
# install.packages(c("tidyverse", "broom")) # Run this if you don't have the packages installed
# Load packages
library(tidyverse)
library(broom)

# Let's build and run the GLM
my_model <- glm(vs ~ hp,
                family = binomial(link = "logit"),
                data = mtcars)
my_model |>
  tidy()
```

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 8.3780240 | 3.2159316 | 2.605162 | 0.0091831 |
| hp | -0.0685607 | 0.0273995 | -2.502265 | 0.0123401 |

The code above describes a GLM in which the specified distribution is the binomial distribution, which makes sense because the outcome variable is binary (0/1). We used `link = "logit"` to specify that we want the coefficients expressed on the **log odds ratio** scale. I built the model and stored it in the `my_model` object.

We can interpret the coefficient -0.0685607 as the following: “With each additional unit of the `hp` variable, the **log odds** of the outcome `vs` will, on average, decrease by 0.0685607.” And if you exponentiated this -0.0685607, you would get 0.9337368, and this would be your **odds ratio**!! Now you can interpret this as the following: with each additional unit of the `hp` variable, the odds of the outcome `vs` will, on average, decrease by about 0.07 (i.e., 7%), which is the excess odds given by 0.9337368 − 1 ≈ −0.07.
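You can verify this back-transformation yourself in R (a small sketch; it refits the same model from the example above so the chunk runs on its own):

```
# Refit the logistic model from the example (mtcars ships with R)
my_model <- glm(vs ~ hp,
                family = binomial(link = "logit"),
                data = mtcars)

log_or <- coef(my_model)[["hp"]]  # the coefficient on the log odds ratio scale
odds_ratio <- exp(log_or)         # exponentiate to get the odds ratio
odds_ratio                        # approximately 0.93
```

The `exp()` call is all it takes to move from the log odds ratio scale back to the odds ratio scale.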

If you haven’t noticed by now, this is essentially a logistic regression model. *Logistic regression is a GLM where the distribution is **binomial** and the link function is the **logit function**.* Depending on your needs, you can vary the choices for both the distribution and the link function. However, things are not that straightforward in some situations, and I will explain later in the post why this might be the case.
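To illustrate varying those choices (a hypothetical example, not from the original analysis), you could model the count variable `carb` (number of carburetors in `mtcars`) with a Poisson distribution and a log link, so the exponentiated coefficient becomes a rate ratio:

```
# A different distribution/link pair: Poisson with a log link
pois_model <- glm(carb ~ hp,
                  family = poisson(link = "log"),
                  data = mtcars)
exp(coef(pois_model)[["hp"]])  # rate ratio per one-unit increase in hp
```

Same `glm()` machinery, different `family` argument; that is the whole point of the GLM framework.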