SPSS Handout

Handout: Statistical Analysis using SPSS-Juliana Bahiense
Statistical Analysis Using SPSS: Practical Guide to Commands

Bahiense-Juliana de Sousa Guimarães. Salvador / BA
julianabahiense@gmail.com
Summary
1. Introduction
2. First Step
3. The Windows
4. Menus
4.1 Data Editor
4.2 Output
5. Data Analysis
6. Bibliography
1. Introduction
The Statistical Package for the Social Sciences for Windows (SPSS) is software for statistical analysis of data. It offers a friendly environment of menus and dialog windows that lets you perform complex calculations and display the results in a simple, self-explanatory way. According to Wikipedia, "SPSS is a scientific software application (computer program) whose name is an acronym for Statistical Package for the Social Sciences. It is a decision-support package that includes analytical applications, data mining, text mining and statistics that turn data into valuable information, lowering costs and increasing profitability. One of the important uses of this software is market research." The first version dates from 1968 and the latest is SPSS for Windows 16 (2007). To illustrate, we will use the databases 1991 U.S. General Social Survey.sav and anorectic.sav, which are in the SPSS directory.
To get the most out of the routines presented in this handout, prior knowledge of the statistical techniques of data mining is necessary.
2. First Step
When you start the program, the following screen appears:
There you can open an existing file (database, syntax or output), go to the tutorial, or create a new database.
3. The Windows
In SPSS there are seven types of windows:
SPSS - Data Editor: allows entry, modification and visualization of data.
Output - SPSS Viewer: the results window, with tables and graphs.
Syntax - SPSS Syntax Editor: the window where we keep SPSS commands for reuse at another time.
SPSS Pivot Table Object: edits and modifies tables.
SPSS Chart Object: edits and modifies charts.
Script Editor: creates and modifies scripts to automate tasks.
Text Output Editor: changes text output that is not displayed in the Pivot Table Editor.
However, we work primarily with the first three, which are presented in this handout. The initial appearance of each is shown in the following figures. In Figure 1 we have the Data View (Data Editor), in which columns are variables and rows are cases (or individuals). The cells can contain numeric or alphanumeric values, but cannot contain formulas.
Figure 1 - Data View - database anorectic.sav
In Figure 2 we have the Variable View (Data Editor), where we define the characteristics of the variables:
Name: variable name, maximum 64 characters; uppercase and lowercase letters are treated as equal.
Type: type of variable (numeric, date, currency, alphanumeric (string)).
Width: length of the variable, i.e. the number of digits it has.
Decimals: number of decimal places that the variable has.
Label: descriptive label of the variable.
Values: value labels of the variable (e.g., 1 = female and 2 = male).
Missing: indicates the coding of missing values, those that will not be considered in statistical calculations.
Columns: the number of characters that make up the column, i.e. the column width.
Align: alignment of the data.
Measure: selects the measurement scale of the variable (interval / ratio, ordinal or nominal).
Figure 2 - Variable View - database anorectic.sav
In Figure 3 we have the Output view, which shows all the requested output, such as graphs, tables and statistics. In Figure 4 we show the syntax of the "Frequencies" command, covered under Descriptive Statistics.
Figure 3 - Output view - database anorectic.sav
Figure 4 - Syntax view - database anorectic.sav
4. Menus
4.1 Data Editor
File - functions to create, open, read, print and save files, show recently used files, stop the processor, and exit the program.
Edit - commands to edit files: modify, copy, paste, cut, delete and find, and to set the default output format.
View - screen layout: toolbars, fonts, status bar, grid lines and variable labels.
Data - insert variables or cases, define the data format, sort the file by the values of a variable, swap variables and cases in a new file (Transpose), combine files (Merge Files), create a new file with aggregated values of the original variables, split a file according to a qualitative variable, and select cases that meet a given condition on the values of a variable.
Transform - change selected variables, compute new variables from existing ones, generate random samples, create a new variable by recoding existing ones, convert a variable into a categorical one, assign ranks to the values of a variable (optionally according to another), create lagged variables for time series, replace missing values, and run pending transformations.
Analyze - descriptive analysis and statistical procedures: frequency tables, ANOVA, correlation, regression, factor analysis, reliability analysis, multiple response analysis, nonparametric tests, survival analysis, etc.
Graphs - create bar charts, pie charts, boxplots, line charts, histograms, etc.
Utilities - obtain information about variables, change menus, run scripts, ...
Window - switch between the SPSS windows that are open.
Help - help topics, tutorials, SPSS home page.
4.2 Output
The menu bar of the Output window is similar to that of the Data Editor window, with the addition of the Insert and Format items.
5. Data Analysis
In SPSS we can create a new database in the program itself or import one from other software such as Excel, Access or dBase. After loading the database, SPSS is ready to be explored. Start with the simpler descriptive statistics procedures. For this analysis we will use the database 1991 U.S. General Social Survey.sav.
Frequency Distribution Table
To generate the frequency table, use the following commands in the menu bar of the Data Editor or Output window: Analyze > Descriptive Statistics > Frequencies
Or we can use the commands in the Syntax window, as follows:
FREQUENCIES VARIABLES = sex / ORDER = ANALYSIS.
For this example select the variable "sex" (sex of respondents), obtaining the following output:
Respondent's Sex
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Male         636       41.9        41.9               41.9
        Female       881       58.1        58.1              100.0
        Total       1517      100.0       100.0
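The Percent and Cumulative Percent columns follow directly from the frequencies. The sketch below is only an illustration in Python (it is not SPSS syntax and not part of the SPSS procedure); the function name frequency_table is hypothetical, and the counts are the ones printed in the output above.

```python
# Frequencies copied from the "Respondent's Sex" output above.
counts = {"Male": 636, "Female": 881}


def frequency_table(counts):
    """Return (rows, total); each row is (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, n in counts.items():
        running += n
        percent = round(100 * n / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, n, percent, cumulative))
    return rows, total


rows, total = frequency_table(counts)
for category, n, percent, cumulative in rows:
    print(f"{category:<8}{n:>6}{percent:>8.1f}{cumulative:>8.1f}")
print(f"{'Total':<8}{total:>6}{100.0:>8.1f}")
```

Because there are no missing cases in this variable, Percent and Valid Percent coincide.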
We can format the table, for example the number of decimal places, the % sign, the font, etc. To do this, in the Output window double-click the table with the left mouse button to open the editing "island", select the data you want to format, and right-click to open the list of menu options. You can also request frequency tables of several variables at once: simply select them in the dialog, or add them to the Syntax commands:
FREQUENCIES VARIABLES = sex sibs / ORDER = ANALYSIS.
Also under this item, through the Statistics and Charts buttons, we can request summary statistics and graphs to represent the variables.
When we need to describe quantitative variables using summary statistics we can use the command: Analyze > Descriptive Statistics > Descriptives
Or the command: Analyze > Descriptive Statistics > Explore
Through this Analyze menu item we can also obtain summary statistics, boxplots and stem-and-leaf plots, and the Kolmogorov-Smirnov and Shapiro-Wilk tests of normality. In these tests the null hypothesis, H0, states that the studied variable follows a Normal distribution, and the alternative hypothesis, Ha, states that it does not; the decision rule is: if p-value < α, we reject H0. A visual analysis is also available through the Q-Q and Detrended Q-Q plots (normality is indicated when the points are distributed randomly around the line). To analyze variable X according to the levels of variable Y, insert X in "Dependent List" and Y in "Factor List".
To analyze a quantitative variable according to a qualitative one, for example to find out whether sex (sex) may explain variations in years of schooling (educ), we can use:
I. Analyze > Descriptive Statistics > Explore
II. Analyze > Reports > Report Summaries in Rows
III. Analyze > Compare Means > Means
IV. Analyze > Compare Means > Independent-Samples T Test
V. Graphs > Boxplot
To apply Student's t test we must verify that the tested variable meets the assumptions of normality and homoscedasticity; the latter can be checked with Levene's test, whose null hypothesis is that there is no difference between the variances. Student's t test has as null hypothesis that there is no difference between the means of the variable across groups (factor). For both tests the decision rule is: if p-value < α, we reject H0.
Cross-tabulation of variables can be done through the command: Analyze > Descriptive Statistics > Crosstabs
Then we select the variables that will form the rows and columns. We can add percentages by clicking "Cells", which opens the "Cell Display" dialog.
We can also use one of the table commands, for example: Analyze > Tables > General Tables
Correlation analysis can be used to assess how the variables relate to each other. We can obtain Pearson's correlation coefficient and Spearman's correlation coefficient (for variables whose distribution is not Normal). Analyze > Correlate > Bivariate

Correlations (Spearman's rho)
                                                                     Number of   Highest Year of    Highest Year School
                                                                     Children    School Completed   Completed, Father
Number of Children                       Correlation Coefficient      1.000        -.262(**)          -.297(**)
                                         Sig. (2-tailed)                .            .000               .000
                                         N                             1509          1507               1064
Highest Year of School Completed         Correlation Coefficient      -.262(**)     1.000               .450(**)
                                         Sig. (2-tailed)               .000           .                 .000
                                         N                             1507          1510               1065
Highest Year School Completed, Father    Correlation Coefficient      -.297(**)      .450(**)          1.000
                                         Sig. (2-tailed)               .000          .000                .
                                         N                             1064          1065               1069
** Correlation is significant at the 0.01 level (two-tailed).

The null hypothesis tested is zero correlation (two-tailed test).
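To see why a coefficient as modest as -0.262 is still reported as significant, the correlation can be converted into a t statistic with n - 2 degrees of freedom. The Python sketch below uses this common large-sample approximation and replaces Student's t by the standard normal distribution for the p-value; both choices are assumptions made for illustration, since the handout does not state how SPSS computes the significance. The function names are hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def two_tailed_p_normal(t):
    """Two-tailed p-value, using the standard normal as a large-sample stand-in for Student's t."""
    return math.erfc(abs(t) / math.sqrt(2))


# Number of Children vs. Highest Year of School Completed, from the output above.
r, n = -0.262, 1507
t = correlation_t(r, n)
p = two_tailed_p_normal(t)
print(f"t = {t:.2f}, p = {p:.3f}")
```

With more than 1500 cases the statistic lies far in the tail, which is consistent with the Sig. value of .000 shown in the output.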
Regression analysis can be used to model one variable as a function of one or more others. Analyze > Regression > (select the model type)
The following is the output of a linear regression in which the dependent variable is "educ" and the independent variables are "sex", "paeduc" and "maeduc".
Variables Entered / Removed (b)
Model   Variables Entered                              Variables Removed   Method
1       Highest Year School Completed, Mother;         .                   Enter
        Respondent's Sex;
        Highest Year School Completed, Father (a)
a. All requested variables entered.
b. Dependent Variable: Highest Year of School Completed

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .486(a)    .236       .234                2.448
a. Predictors: (Constant), Highest Year School Completed, Mother; Respondent's Sex; Highest Year School Completed, Father
b. Dependent Variable: Highest Year of School Completed

Coefficient of determination: R2 = 23.6%. This model explains 23.6% of the variation in "educ".
ANOVA (b)
Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    1796.560           3   598.853       99.934   .000(a)
Residual      5806.745         969     5.993
Total         7603.305         972
a. Predictors: (Constant), Highest Year School Completed, Mother; Respondent's Sex; Highest Year School Completed, Father
b. Dependent Variable: Highest Year of School Completed

With p-value = 0.000 we reject H0, so educ can be modeled as a linear function of the selected predictors.
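The Model Summary and ANOVA figures are linked by simple arithmetic, which can be checked outside SPSS. The following Python sketch is an illustration only; the variable names are hypothetical, the sums of squares are copied from the ANOVA output above, and the degrees of freedom 3 and 969 are read from its df column.

```python
import math

# Values copied from the ANOVA output above.
ss_regression = 1796.560
ss_residual = 5806.745
df_regression = 3
df_residual = 969

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual
std_error_estimate = math.sqrt(ms_residual)

print(f"F = {f_statistic:.3f}")
print(f"R Square = {r_square:.3f}")
print(f"Adjusted R Square = {adjusted_r_square:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}")
```

The results agree with the output to the displayed precision: R Square .236, Adjusted R Square .234 and Std. Error of the Estimate 2.448.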
Coefficients (a)
                                              Unstandardized Coefficients   Standardized Coefficients
Model 1                                       B         Std. Error          Beta                        t         Sig.
(Constant)                                    9.902     .384                                            25.782    .000
Respondent's Sex                              -.380     .160                -.067                       -2.381    .017
Highest Year School Completed, Father          .196     .026                 .288                        7.574    .000
Highest Year School Completed, Mother          .189     .031                 .231                        6.085    .000
a. Dependent Variable: Highest Year of School Completed

The equation of the model is:
educ = 9.902 - 0.380 sex + 0.196 paeduc + 0.189 maeduc
All predictors are statistically significant.
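As an illustration of how the fitted equation is used, the Python sketch below evaluates it for one combination of predictor values. The function name predict_educ and the example inputs are hypothetical; the coefficients are the unstandardized B values from the Coefficients output, and the respondent's sex is coded 1 = male, 2 = female, as in the rest of the handout.

```python
def predict_educ(sex, paeduc, maeduc):
    """Predicted years of schooling from the fitted linear model.

    sex: respondent's sex code (1 = male, 2 = female)
    paeduc, maeduc: father's and mother's highest year of school completed
    """
    return 9.902 - 0.380 * sex + 0.196 * paeduc + 0.189 * maeduc


# Hypothetical respondent: male, both parents completed 12 years of school.
example = predict_educ(sex=1, paeduc=12, maeduc=12)
print(f"{example:.3f}")

# Holding the parents' schooling fixed, the predicted difference between
# female (2) and male (1) respondents equals the coefficient of sex.
difference = predict_educ(2, 12, 12) - predict_educ(1, 12, 12)
print(f"{difference:.3f}")
```

Since the model explains only 23.6% of the variation in educ, such predictions describe an average tendency and not individual values.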
Residuals Statistics (a)
                                     Minimum    Maximum    Mean     Std. Deviation   N
Predicted Value                        9.14      17.22     13.54        1.360        973
Std. Predicted Value                  -3.239      2.707      .000       1.000        973
Standard Error of Predicted Value       .104       .379      .151        .041        973
Adjusted Predicted Value               9.11      17.20     13.54        1.359        973
Residual                              -9.603      8.277      .000       2.444        973
Std. Residual                         -3.923      3.381      .000        .998        973
Stud. Residual                        -3.930      3.399      .000       1.001        973
Deleted Residual                      -9.636      8.365      .000       2.455        973
Stud. Deleted Residual                -3.959      3.418      .000       1.002        973
Mahal. Distance                         .744     22.354     2.997       2.499        973
Cook's Distance                         .000       .045      .001        .003        973
Centered Leverage Value                 .001       .023      .003        .003        973
a. Dependent Variable: Highest Year of School Completed
Figure: Normal P-P Plot of Regression Standardized Residual (Dependent Variable: Highest Year of School Completed; horizontal axis: Observed Cum Prob; vertical axis: Expected Cum Prob)
Visual analysis of the residuals to assess the quality of the fit. Indicates normality of the "educ" data.
The main objective of factor analysis is to describe the variability of a set of variables in terms of a smaller number of variables that are related to the original set through a linear model, with minimal loss of information. SPSS uses the following command: Analyze > Data Reduction > Factor
In this dialog we can request descriptive statistics and correlation coefficients.
We select the method of extracting factors.
The initial solution presents the communalities, the eigenvalues and the percentage of variance explained.
Correlation matrix: for variables on different scales. Covariance matrix: for multiple groups with different variances for each variable.
In the same dialog box we can also set the Rotation, which is applied to transform the coefficients of the principal components into a simplified structure. Methods:
Varimax: a few significant weights and the others close to zero.
Quartimax: heavy weights for a few components and near zero for the others.
Equamax: combination of Varimax and Quartimax.
Direct Oblimin and Promax: non-orthogonal methods; there is no assumption of independence of the components.
The method of calculating the scores is defined in Scores. In Options we can choose, for example, how missing values will be treated.
Tests of the validity of applying factor analysis. Interpretation of the KMO test:
< .50          Unacceptable
.50 to .60     Poor
.60 to .70     Fair
.70 to .80     Average
.80 to .90     Good
.90 to 1       Very Good
The null hypothesis of Bartlett's test of sphericity states that there is no correlation between the variables.
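The KMO interpretation bands can be written as a small lookup. This Python sketch is only an illustration: the function name interpret_kmo is hypothetical, and because the handout does not say which band a boundary value such as .60 belongs to, the sketch assumes each lower bound is inclusive.

```python
def interpret_kmo(kmo):
    """Map a KMO value to the verbal label used in the handout.

    Assumption: each band includes its lower bound (for example, .60 counts as "Fair").
    """
    if not 0 <= kmo <= 1:
        raise ValueError("KMO must lie between 0 and 1")
    bands = [
        (0.90, "Very Good"),
        (0.80, "Good"),
        (0.70, "Average"),
        (0.60, "Fair"),
        (0.50, "Poor"),
    ]
    for lower_bound, label in bands:
        if kmo >= lower_bound:
            return label
    return "Unacceptable"


for value in (0.45, 0.55, 0.75, 0.95):
    print(value, interpret_kmo(value))
```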
In SPSS several hypothesis tests are available: for example, parametric tests such as the t test and ANOVA, and nonparametric tests such as the Sign test, McNemar, Wilcoxon, Mann-Whitney, Kruskal-Wallis, Runs (randomness), Binomial and Chi-square.
The t test can be run through the command: Analyze > Compare Means > Independent-Samples T Test
The groups of the variable are defined in "Define Groups". These values correspond to the codes used in the variable, in this case "sex": 1 = male and 2 = female.
The output is shown below:

Group Statistics
                                   Respondent's Sex    N     Mean     Std. Deviation   Std. Error Mean
Highest Year of School Completed   Male               633    13.23    3.143            .125
                                   Female             877    12.63    2.839            .096

Independent Samples Test (Highest Year of School Completed)
                              Levene's Test for
                              Equality of Variances   t-test for Equality of Means
                                                                                                       Std. Error   95% Confidence Interval of the Difference
                              F         Sig.          t       df         Sig. (2-tailed)   Mean Difference   Difference   Lower    Upper
Equal variances assumed       11.226    .001          3.887   1508       .000              .602              .155         .298     .906
Equal variances not assumed                           3.824   1276.454   .000              .602              .157         .293     .911

Levene's Test: test of equality of variances. H0: equal variances.
Sig. (2-tailed): probability of observing a mean difference of this size if H0 is true.
Mean Difference: the mean years of schooling of the two samples (male and female) differed by 0.602 years.
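The two rows of the t-test output (equal variances assumed and not assumed) can be recomputed from the group statistics. The Python sketch below is an illustration under stated assumptions: it takes the mean difference 0.602 from the output instead of subtracting the displayed group means, because those are rounded to two decimals, and it uses the textbook pooled-variance and Welch-Satterthwaite formulas, which the handout itself does not spell out.

```python
import math

# Group sizes and standard deviations from the Group Statistics output above.
n_male, sd_male = 633, 3.143
n_female, sd_female = 877, 2.839
mean_difference = 0.602  # reported by SPSS; the displayed group means are rounded

var_male, var_female = sd_male ** 2, sd_female ** 2

# Equal variances assumed: pooled variance with n1 + n2 - 2 degrees of freedom.
df_pooled = n_male + n_female - 2
pooled_variance = ((n_male - 1) * var_male + (n_female - 1) * var_female) / df_pooled
se_pooled = math.sqrt(pooled_variance * (1 / n_male + 1 / n_female))
t_pooled = mean_difference / se_pooled

# Equal variances not assumed: Welch standard error and Satterthwaite degrees of freedom.
a, b = var_male / n_male, var_female / n_female
se_welch = math.sqrt(a + b)
t_welch = mean_difference / se_welch
df_welch = (a + b) ** 2 / (a ** 2 / (n_male - 1) + b ** 2 / (n_female - 1))

print(f"pooled: SE = {se_pooled:.3f}, t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  SE = {se_welch:.3f}, t = {t_welch:.3f}, df = {df_welch:.1f}")
```

Small discrepancies in the last digit with respect to the SPSS output are expected, since the inputs are rounded. Because Levene's test rejects equal variances here (Sig. = .001), the "Equal variances not assumed" row is the one to read.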
ANOVA can be run through the command: Analyze > Compare Means > One-Way ANOVA
In this dialog we can obtain summaries of the data and the Bonferroni multiple comparison test.
For nonparametric tests proceed as follows: Analyze > Nonparametric Tests
We have, in this order: Chi-square, Binomial, Runs (randomness), Kolmogorov-Smirnov, tests for two independent samples, tests for two related samples, and Kruskal-Wallis and Median (K Independent Samples).
To run a cluster analysis, use the following command: Analyze > Classify > Hierarchical Cluster
To put the variables on the same scale we standardize them using the transformation method found in the dialog box. To obtain dendrograms, use the Plots button.
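Standardizing to z scores is one way of putting variables on the same scale before clustering. The Python sketch below illustrates that transformation only; the handout does not say which standardization option it uses, so z scores are an assumption here, the function name z_scores is hypothetical, and the input values are made up for the illustration.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(value - mean) / sd for value in values]


# Hypothetical values, not taken from any SPSS database.
hypothetical_values = [2.0, 4.0, 6.0]
standardized = z_scores(hypothetical_values)
print(standardized)
```

After standardization every variable contributes to the distance measure on a comparable scale, regardless of its original unit.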
6. Bibliography
CAZORLA, Irene M. Course packages. UESC. Ilhéus. Aug 2003.
FERREIRA, Armando M. SPSS - Instruction Guide. Agrarian School of Castelo Branco. 1999.
PEREIRA, Alexandre. Practical Guide to Using SPSS: Data Analysis for Social Sciences and Psychology. 4th ed. Edições Sílabo. Lisbon. Mar 2003.
SANTANA, Cora; LISBOA, Grace. Basic Guide to SPSS for Windows. CPD / UFBA.
SPSS Inc. Statistical Analysis Using SPSS. Chicago. 2001.
Wikipedia. SPSS. Available at: <http://pt.wikipedia.org/wiki/SPSS>.
