
For more Tip Sheets, and datasets used within them, visit http://www.reading.ac.uk/statistics/tipsheets/
This project was funded by the generosity of alumni donations to the Annual Fund.

University of Reading 2011 Page 1


MINITAB Tip sheet 15
Binary logistic regression

Use this tip sheet if you have a binary response variable and you wish to see how
explanatory variables are related to it.

Screen shots have been taken from Minitab version 15.


To use binary logistic regression, the response variable must have only two outcomes, e.g.
yes/no or 0/1. The aim of this method is to fit a regression equation of the form

\[ \ln\left(\frac{p}{1-p}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 \]

to the data (assuming two explanatory variables), where $\alpha$ is the intercept, $\beta_1$ and
$\beta_2$ are the parameters for the explanatory variables $x_1$ and $x_2$, $p$ is the proportion
for the event of interest and ln denotes the natural log function. The term on the left hand
side of the equation is known as the logit link function of p. The explanatory variables can
be quantitative (discrete or continuous) or qualitative (factors).
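
As a small illustration of the logit link, the sketch below converts a proportion to log-odds and back. It is written in Python rather than Minitab, and the function names `logit` and `inverse_logit` are chosen here for illustration only.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)) for a proportion strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1 - p))


def inverse_logit(eta):
    """Proportion p that corresponds to the log-odds value eta."""
    return 1 / (1 + math.exp(-eta))


# A proportion of 0.5 means even odds, so its log-odds are 0.
print(logit(0.5))

# Converting to log-odds and back recovers the original proportion.
print(inverse_logit(logit(0.8)))
```

Working on the log-odds scale is what lets the right-hand side of the equation take any real value while p itself stays between 0 and 1.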

The dataset used for the example is heartattack.xlsx which can be accessed from the Tip
Sheets webpage. The dataset contains information on the number of patients who had
recovered from (survived) a heart attack together with data on their cholesterol level and
age.

To perform a logistic regression select Stat, then Regression, followed by Binary logistic
regression.

You will then be presented with the following window:
Department of
Mathematics and Statistics





Minitab will give a large amount of output, and the main parts to interpret are explained
below. If any of this information is unclear please discuss with a statistical advisor.

The output for this example is as follows:


























Binary Logistic Regression: Recovered versus Cholesterol, Age

Link Function: Logit

Response Information

Variable   Value  Count
Recovered  y         11  (Event)
           n          4
           Total     15

Logistic Regression Table

                                                   Odds     95% CI
Predictor          Coef    SE Coef      Z      P  Ratio  Lower  Upper
Constant        9.40648    6.03119   1.56  0.119
Cholesterol  -0.0634357   0.661327  -0.10  0.924   0.94   0.26   3.43
Age           -0.184068  0.0965055  -1.91  0.056   0.83   0.69   1.01

Log-Likelihood = -3.633
Test that all slopes are zero: G = 10.131, DF = 2, P-Value = 0.006

Goodness-of-Fit Tests

Method           Chi-Square  DF      P
Pearson             6.76499  12  0.873
Deviance            7.26670  12  0.839
Hosmer-Lemeshow     2.86246   8  0.943



Notes on the dialog window:
- Add the response variable to the Response box.
- In the Model box add the explanatory variables. If any of these are factors, you must
also add them to the Factors box.
- The default settings for the dialog's buttons are fine for this example. To change the
event class, select Options; to produce more output, select Results.

Notes on the output:
- Response Information: the response is stated, with the total observations in the Count
column. Note: the word Event indicates the category used for p.
- Logistic Regression Table: the coefficients for the model, p-values and odds ratios are
given here.
- Goodness-of-Fit Tests: we desire the goodness-of-fit tests to be non-significant, i.e. no
evidence that the model is not a good fit.
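
The coefficients in the Logistic Regression Table can be substituted back into the regression equation to give a predicted proportion of recovery. The sketch below does this in Python. The coefficients are copied from the output, but the two patients (and in particular their cholesterol and age values) are hypothetical and are not taken from heartattack.xlsx.

```python
import math

# Coefficients copied from the Logistic Regression Table in the output.
CONSTANT = 9.40648
B_CHOLESTEROL = -0.0634357
B_AGE = -0.184068


def predicted_recovery(cholesterol, age):
    """Predicted proportion recovering, from the fitted logit equation."""
    eta = CONSTANT + B_CHOLESTEROL * cholesterol + B_AGE * age
    return 1 / (1 + math.exp(-eta))


# Hypothetical patients with the same cholesterol level but different ages.
younger = predicted_recovery(cholesterol=5.0, age=40)
older = predicted_recovery(cholesterol=5.0, age=60)
print(f"younger: {younger:.3f}, older: {older:.3f}")
```

Because the coefficient for Age is negative, the predicted proportion for the older patient is the lower of the two.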






















Interpretation: When interpreting the results first look at the p-values for each variable in
the model. If any of these are not significant then we would usually remove them, unless
there was a good reason why they should still be in the model, and refit the model. Once
the final model has been obtained, we then move on to interpret the odds ratio.

For this example we have decided that we want both variables to be in the model, so that
we are looking at the significance of each term given that the effect of the other has already
been accounted for. The p-value for cholesterol is 0.924, which is not significant, so there is
no evidence to suggest cholesterol has an effect on recovery after age is accounted for. For
age the p-value is 0.056, which is significant at the 10% level of significance, and so there is
some evidence of an effect. As age is significant we then look at the odds ratio to see the
direction of the effect (i.e. whether age increases or decreases the likelihood of recovery).
The odds ratio is 0.83, and when interpreting this value we compare it to the value of 1
(which indicates no effect). An odds ratio of 0.83 means that each one-unit increase in age
multiplies the odds of recovery by 0.83. As the value is less than 1, the odds of recovery for
individuals who are older are lower than those for individuals who are younger: older age is
therefore detrimental to recovery. If the odds ratio had been greater than 1, it would have
suggested that older age was beneficial.
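
The Z, P, odds ratio and confidence interval columns can all be reproduced from the coefficient and its standard error. The sketch below assumes the usual Wald calculations (Z = Coef / SE Coef, a two-sided normal p-value, and exponentiated limits Coef ± 1.96 × SE Coef); the function name `wald_summary` is made up for this example.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Wald z, two-sided p-value, odds ratio and approximate 95% CI."""
    z = coef / se
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z": z,
        "p": p,
        "odds_ratio": math.exp(coef),
        "lower": math.exp(coef - z_crit * se),
        "upper": math.exp(coef + z_crit * se),
    }


# Coef and SE Coef for Age, copied from the Minitab output.
age = wald_summary(-0.184068, 0.0965055)
for name, value in age.items():
    print(f"{name}: {value:.3f}")
```

Note that the confidence interval for the odds ratio only just includes 1, which agrees with a p-value just above 0.05.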

The goodness-of-fit test p-values are not significant and so there is no evidence here that the
model is not a good fit. If these values are significant then advice should be sought from a
statistician about the best course of action.
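
The p-values for the chi-square statistics in the output can be checked by hand. The Python sketch below uses a closed form for the upper tail of the chi-square distribution that holds only for even degrees of freedom, which happens to cover every test in this example. It also assumes that G is the likelihood-ratio statistic comparing the fitted model with an intercept-only model; the function names are illustrative.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x); even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def null_log_likelihood(events, non_events):
    """Log-likelihood of a model with an intercept only."""
    n = events + non_events
    return events * math.log(events / n) + non_events * math.log(non_events / n)


# G compares the fitted log-likelihood (-3.633) with the intercept-only model
# for 11 events and 4 non-events.
g = 2 * (-3.633 - null_log_likelihood(11, 4))
p_slopes = chi2_upper_tail(g, 2)

# Goodness-of-fit statistics and DF copied from the output.
p_pearson = chi2_upper_tail(6.76499, 12)
p_deviance = chi2_upper_tail(7.26670, 12)
p_hosmer = chi2_upper_tail(2.86246, 8)

print(f"G = {g:.3f}, p = {p_slopes:.3f}")
print(f"Pearson {p_pearson:.3f}, Deviance {p_deviance:.3f}, H-L {p_hosmer:.3f}")
```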

In this example the value of the concordant percentage is 95.5% which is extremely high,
indicating that the model is likely to be reliable for prediction purposes (i.e. categorising
individuals into likely recovery/non-recovery groups).


Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                           Group
Value    1    2    3    4    5    6    7    8    9   10  Total
y
  Obs    0    1    0    1    1    2    1    2    1    2     11
  Exp  0.0  0.3  0.4  1.5  0.9  1.9  1.0  2.0  1.0  2.0
n
  Obs    1    1    1    1    0    0    0    0    0    0      4
  Exp  1.0  1.7  0.6  0.5  0.1  0.1  0.0  0.0  0.0  0.0
Total    1    2    1    2    1    2    1    2    1    2     15

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs       Number  Percent    Summary Measures
Concordant      42     95.5    Somers' D              0.91
Discordant       2      4.5    Goodman-Kruskal Gamma  0.91
Ties             0      0.0    Kendall's Tau-a        0.38
Total           44    100.0











Here we could choose to look at the concordant percentage. The higher this
value the better the model. This value is a representation of how reliable
future predictions will be using the model.
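
To show where the concordant percentage and the summary measures come from, the Python sketch below pairs each event with each non-event and compares their predicted probabilities, then derives the measures from the pair counts. The formulas are the standard definitions and reproduce the figures in the output. The function names and the small set of predicted probabilities are hypothetical.

```python
def count_pairs(event_probs, non_event_probs):
    """Compare every event with every non-event on predicted probability."""
    concordant = discordant = ties = 0
    for p_event in event_probs:
        for p_non_event in non_event_probs:
            if p_event > p_non_event:
                concordant += 1
            elif p_event < p_non_event:
                discordant += 1
            else:
                ties += 1
    return concordant, discordant, ties


def association_measures(concordant, discordant, ties, n_obs):
    """Summary measures computed from the pair counts."""
    total_pairs = concordant + discordant + ties
    diff = concordant - discordant
    return {
        "percent_concordant": 100 * concordant / total_pairs,
        "somers_d": diff / total_pairs,
        "gamma": diff / (concordant + discordant),
        "tau_a": diff / (n_obs * (n_obs - 1) / 2),
    }


# Hypothetical predicted probabilities for 3 events and 2 non-events.
print(count_pairs([0.9, 0.8, 0.4], [0.3, 0.5]))

# Pair counts copied from the output: 11 events x 4 non-events = 44 pairs.
measures = association_measures(concordant=42, discordant=2, ties=0, n_obs=15)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```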

Data in success/trial format

Sometimes data are in a different format, but they can still be analysed with a logistic
regression approach. Suppose that you were looking at the effect of age on the time of
recovery from pain following shingles. Instead of having a row per individual in your data,
with a column for age and another indicating whether that patient recovered or not, you may
instead have one row per distinct value of age, together with summary data about the
number of patients of that age and how many of them recovered by the end of the study.
That is, your data may be in the following form:








There are 19 people in this study, but the data are entered as only five rows (for example,
five people were aged 30, and two of those had recovered by the end of the study).
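
The two layouts hold the same information, as the Python sketch below shows by expanding event/trial rows into one row per individual. Only the first row (five people aged 30, two recovered) comes from the text; the other four rows are invented so that the totals match 19 people in five rows, and the function name is hypothetical.

```python
# Each row: (age, number of patients, number recovered).
# Only the age-30 row is from the tip sheet; the rest are made-up values.
event_trial_rows = [
    (30, 5, 2),
    (40, 4, 2),
    (50, 4, 1),
    (60, 3, 1),
    (70, 3, 0),
]


def expand_to_individuals(rows):
    """Return one (age, recovered) pair per patient, recovered coded 1 or 0."""
    individuals = []
    for age, trials, events in rows:
        if not 0 <= events <= trials:
            raise ValueError("events must be between 0 and the number of trials")
        individuals.extend([(age, 1)] * events)
        individuals.extend([(age, 0)] * (trials - events))
    return individuals


individuals = expand_to_individuals(event_trial_rows)
print(len(individuals))       # number of patients in the study
print(individuals[:5])        # the five patients aged 30
```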

To carry out a logistic regression on these data, select Stat, then Regression, followed by
Binary logistic regression. You will then be presented with the following window as before:













Note: the output will look the same as in the previous analysis, and can be interpreted in
the same way. Other options are also the same.


Unlike before, select the Response in
event/trial format section. Enter the
column containing the number who
recovered in the Number of events box,
and the column containing the number
of patients for an age group in the
Number of trials box. In the Model box
enter the explanatory variables, with
any factors also added to the Factors
box, as before.

Extra information on interpretation

- P-value: When interpreting the p-value for a test of the model coefficients, if the
value is less than 0.05 then the test is significant at the 5% level, and we would
usually say there is evidence to reject the null hypothesis (of the true parameter
value being 0). If the p-value is less than 0.1 but greater than 0.05 then there is weak
evidence in favour of the alternative hypothesis. Finally if the p-value is greater than
0.1 then we would usually say there is no evidence to reject the null hypothesis.
Note: NEVER ACCEPT the null hypothesis and conclude it to be true as this will be
incorrect; we always reject or do not reject the null.
- We should report p<0.001 if Minitab gives 0.000 in the output for a p-value.
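
The reporting rules above can be written down as a short Python sketch. The function names are illustrative, and the treatment of p-values exactly equal to 0.05 or 0.1 is an assumption, since the guidance does not say which side those boundaries fall on.

```python
def report_p_value(p):
    """Format a p-value for reporting; very small values become p<0.001."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "p<0.001" if p < 0.001 else f"p={p:.3f}"


def evidence_against_null(p):
    """Wording for the strength of evidence, following the rules above."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.05:
        return "evidence to reject the null hypothesis (5% level)"
    if p < 0.1:  # assumption: exactly 0.05 is grouped with weak evidence
        return "weak evidence in favour of the alternative hypothesis"
    return "no evidence to reject the null hypothesis"


# p-values for Cholesterol and Age from the worked example.
for label, p in [("Cholesterol", 0.924), ("Age", 0.056)]:
    print(f"{label}: {report_p_value(p)}, {evidence_against_null(p)}")
```

Neither wording ever says that the null hypothesis is accepted, in line with the note above.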

And finally ...

If you are unsure about the correct statistical methods to apply in the analysis of your data,
book an appointment for our Statistical Advisory Service (www.reading.ac.uk/Stats-Advisory).
We will be pleased to advise you on aspects of design, analysis and interpretation.


























Portions of the input and output contained in this publication are printed with permission of Minitab Inc. All
material remains the exclusive property and copyright of Minitab Inc. All rights reserved.
