This project is intended to provide students in STA 6127 with a guide to using SPSS. It covers the material presented up through the first exam, including multiple regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA). The topics in this paper are presented in approximately the same order in which they were introduced in class, with examples taken from the homework problems and the course web page to illustrate SPSS output. The instructions in this guide apply to SPSS versions 9 and 10. Students using earlier versions may wish to consult the guide to SPSS in the appendix of the Agresti and Finlay text. In addition to this guide, students should note that SPSS provides two built-in help functions: Topics and Tutorial. The Topics help function provides descriptions and instructions for most SPSS functions and can be accessed by selecting TOPICS from the HELP menu. The Tutorial help function guides users step by step through a limited set of SPSS functions, providing both the visual and text instructions needed to complete a task. It can be accessed by selecting TUTORIAL from the HELP menu.

**Reading Text File**: Several of the homework problems require the use of statistical software to analyze and interpret data found on the course web page (http://www.stat.ufl.edu/users/aa/social/data.html). Rather than reentering this data manually, students can use SPSS to read this data into the program. The first step in this procedure is to save the material on the web page in a text format. To do this, select the SAVE AS option in the FILE menu of your web browser.
In the

Third, you are now ready to read the text file into SPSS. Open the SPSS program and click on the READ TEXT DATA option in the FILE menu. Select the text file that you just saved and click *Open*; this opens the Text Import Wizard. The first frame asks whether you would like the data read according to a predefined format. Since you have not yet created such a format, select *No* (the default selection) and click *Next*. The second frame asks whether the data in your text file is delimited or fixed width. Delimited means that the values are separated by a common character, such as a comma, tab, or space. Fixed width, by contrast, means that each variable occupies the same columns in every row, with no common character separating the values. The web data is arranged in this latter format, so you should click on the *Fixed Width* option. If you kept the variable names at the top of your text file, you should also select *Yes* for the *Variable Names Include At Top of File* option; then click *Next*. In the third frame, the default options already selected are appropriate for reading the web data. This frame asks you to identify the line where the data begins (line 2 if you kept the variable names), the number of lines per case (in this instance, 1), and the number of cases you would like read into the SPSS editor. The fourth frame asks you to identify the column divisions between the variables. Move the divider lines so that the data for each variable fits between them. Make sure to screen your data before moving on to the next frame, as some of the values further down in a column might cross one of the lines (and thus would be treated as two variables).
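
What the fourth frame does can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's own import code: the function name, the divider positions, and the sample records below are all hypothetical, chosen so that the third record shows what happens when a value runs past one of the divider lines.

```python
def split_fixed_width(line, breaks):
    """Cut a line into fields at the given divider positions.

    breaks holds the character positions of the dividers, e.g. [4, 7]
    yields the fields line[0:4], line[4:7], and line[7:].
    """
    starts = [0] + breaks
    ends = breaks + [len(line)]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


# Hypothetical three-variable file: an ID, a state, and an income figure.
lines = [
    "001 FL  32",
    "002 GA  48",
    "003 TEXAS 55",  # this state value is wider than its column
]
breaks = [4, 7]  # hypothetical divider positions set in the fourth frame

for line in lines:
    print(split_fixed_width(line, breaks))
```

The first two records split cleanly, but the third yields "TEX" and "AS 55": the overlong value is cut in two and part of it lands in the next variable, which is exactly the kind of problem worth screening for before leaving the fourth frame.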

In the fifth frame, you have the option to change your variable names. Note that some of the variable names used in the web data sets are reserved by SPSS (that is, SPSS will not allow you to use them); in that case, the variable names assigned by SPSS are likely to be incomplete and may be in the wrong place. It is therefore a good idea to check all of your variables to ensure that they are properly named. You should also check that SPSS has made the correct choice in interpreting each variable as either string (i.e., text) or numeric. Both of these checks can be done either in the fifth frame of the wizard or in the SPSS editor itself; in my experience, the latter is easier. Finally, the sixth frame asks whether you would like to save this format for future use. Clicking *Finish* reads the data into the SPSS editor. Make sure to check over your data, as unintended spaces in your text file may result in an extra variable or case that needs to be deleted. You can also change variable names and formats by clicking on the VARIABLE VIEW tab at the bottom of the Data Editor window.
Now the real fun begins!!

**Multiple Linear Regression** (no interaction): For multiple linear regression in which both the response and explanatory variables are quantitative, choose REGRESSION from the ANALYZE menu and select the LINEAR suboption. Enter your response variable in the

An example of an SPSS regression printout is included in Appendix I. This example uses the data for question 5 in chapter 11 of the Agresti and Finlay text. Most of the output should appear familiar to students in this course; however, there are a few notable differences from a SAS printout. The ANOVA table presents the *F* and *p* values for the regression model. The Regression Sum of Squares in the SPSS printout is the same as the Model Sum of Squares in the SAS printout. Likewise, the Residual Sum of Squares in the SPSS printout is the same as the Error Sum of Squares in the SAS printout. In the Coefficients table, the B column gives the unstandardized estimates, and the Beta values are the standardized coefficients.
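
For reference, the quantities in these two tables are tied together by the standard textbook formulas below, written for a model with n cases and k explanatory variables (the notation is ours, not SPSS's): the *F* value in the ANOVA table, R-squared, and the Beta value for explanatory variable j, where b_j is its unstandardized estimate (the B column) and s denotes a sample standard deviation.

$$
F = \frac{\text{Regression SS}/k}{\text{Residual SS}/\bigl(n-(k+1)\bigr)}, \qquad
R^2 = \frac{\text{Regression SS}}{\text{Regression SS} + \text{Residual SS}}, \qquad
\text{Beta}_j = b_j \, \frac{s_{x_j}}{s_y}
$$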

Partial correlation values can be obtained either as part of the regression analysis (by checking the *Part and Partial Correlations* option in the *Statistics* option box) or separately by choosing CORRELATE from the ANALYZE menu and then selecting the PARTIAL suboption. Using the latter method, the student should enter the two variables whose partial correlation is sought in the *Variables* box and the control variable(s) in the *Controlling For* box.
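
With a single control variable, the partial correlation reported by SPSS should match the textbook formula built from the three pairwise correlations. The Python sketch below computes it by hand so that an SPSS result can be checked; the function names and data values are hypothetical, and the formula as written covers only one control variable.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_yx1, r_yx2, r_x1x2):
    """Partial correlation of y and x1, controlling for the single variable x2."""
    numerator = r_yx1 - r_yx2 * r_x1x2
    denominator = math.sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))
    return numerator / denominator


# Hypothetical data: response y, variable of interest x1, control variable x2.
y = [3, 5, 4, 6, 8, 7, 9, 10]
x1 = [1, 2, 2, 3, 4, 4, 5, 6]
x2 = [2, 1, 3, 2, 5, 4, 3, 6]

r = partial_r(pearson(y, x1), pearson(y, x2), pearson(x1, x2))
print(round(r, 3))
```

Entering y and x1 in the *Variables* box and x2 in the *Controlling For* box should produce the same value, up to rounding.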

**Multiple Linear Regression** (with interaction): SPSS is not as user-friendly as SAS in dealing with interaction between explanatory variables in a multiple regression model. Two approaches, however, can be employed within SPSS for addressing interaction. The first approach requires the user to construct the interaction variable(s), that is, the product of the explanatory variables involved, within the SPSS data editor. This can be accomplished by selecting the COMPUTE option within the TRANSFORM menu. The student should label the interaction term in the
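
The COMPUTE step creates a new column whose value for each case is the product of the two explanatory variables. The Python sketch below mirrors that step and also shows why the product term matters: in the model y = b0 + b1x1 + b2x2 + b3(x1x2), the slope for x1 depends on the value of x2. The function names, data, and coefficients are hypothetical, chosen only for illustration.

```python
def add_interaction(x1, x2):
    """Return the case-by-case product x1 * x2 (the new interaction column)."""
    return [a * b for a, b in zip(x1, x2)]


def conditional_slope(b1, b3, x2_value):
    """Slope of x1 at a fixed x2 in y = b0 + b1*x1 + b2*x2 + b3*(x1*x2)."""
    return b1 + b3 * x2_value


# Hypothetical values of two quantitative explanatory variables for four cases.
x1 = [12, 16, 10, 14]
x2 = [3, 5, 2, 4]
print(add_interaction(x1, x2))  # [36, 80, 20, 56]

# Hypothetical coefficients: with b1 = 2.0 and b3 = -0.5, the x1 slope
# shrinks as x2 grows.
for value in (0, 2, 4):
    print(value, conditional_slope(2.0, -0.5, value))
```

The new product column is then entered into the regression alongside the two original variables.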

A second method for analyzing interaction within a multiple regression model requires the student to use the general linear model function rather than the multiple linear regression function within SPSS. This second method is well suited for handling multiple interaction terms, but it presents output in a slightly different form than the multiple regression method and offers fewer options for data analysis. Click GENERAL LINEAR MODEL (GLM) in the ANALYZE menu and select the UNIVARIATE suboption. Enter the response variable into the *Dependent Variable* box and the explanatory variables into the *Covariate(s)* box. The student should note that the UNIVARIATE GLM function within SPSS can also be used for ANOVA and ANCOVA, for which quantitative explanatory variables are always entered in the *Covariate(s)* box and fixed, qualitative variables are entered in the *Fixed Factor(s)* box.

In order to test for interaction, click on the *Model* box and select the option labeled *Custom*. Under the *Build Terms* arrow, select the option labeled *Interaction*. Select the terms that you would like included in the model by highlighting them in the *Factors and Covariates* box and then using the arrow button to move them to the *Model* box. Students should note that interaction terms are added to the model by highlighting two or more variables in the *Factors and Covariates* box and then clicking the arrow. The *All Two Way*, *All Three Way*, etc. options can also be used to simplify the construction of interaction terms. Once you have finished entering the main effects and interaction terms into your model, click *Continue* to return to the main dialog box. To display parameter estimates for the model, click the *Options* box and check the *Parameter Estimates* option. Click *Continue* to return to the main dialog box and then *OK* to perform the regression analysis.
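
The *All Two Way* and *All Three Way* options save the student from listing every combination by hand. The short Python sketch below enumerates the same combinations for a hypothetical set of three explanatory variables, writing each term in an A*B style; it only illustrates the bookkeeping and is not how SPSS itself builds the model.

```python
from itertools import combinations


def interaction_terms(variables, order):
    """List every interaction of exactly `order` variables, written as A*B."""
    return ["*".join(combo) for combo in combinations(variables, order)]


# Hypothetical model with three explanatory variables.
main_effects = ["educ", "age", "race"]
print(interaction_terms(main_effects, 2))  # ['educ*age', 'educ*race', 'age*race']
print(interaction_terms(main_effects, 3))  # ['educ*age*race']
```

With three variables there are three two-way terms and one three-way term; with more variables the count grows quickly, which is where these options help most.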

An example of SPSS output for a multiple linear regression model with interaction is included in Appendix II. This example uses data from question 13 in chapter 11 of the Agresti and Finlay text. Again, the SPSS output differs somewhat from its SAS counterpart, though most of the information should be recognizable to students in this course. The Corrected Model Type III Sum of Squares is equivalent to the Regression Sum of Squares in the SPSS multiple regression output and the Model Sum of Squares in SAS. Likewise, the Error Type III Sum of Squares is equivalent to the Residual Sum of Squares in the SPSS multiple regression output and the Error Sum of Squares in SAS. The Type III sum of squares for the interaction term (and the *F* and *p* values calculated from it) are used to test the null hypothesis of no interaction. In the case of a single interaction term, the *t* value for the interaction term, reported in the Parameter Estimates table, can also be used to test the null hypothesis of no interaction.
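
The two tests for a single interaction term agree because, for any term with one degree of freedom, the *F* statistic is the square of its *t* statistic, so both lead to the same *p* value. Written out, with n cases and k explanatory terms in the model (counting the interaction term itself):

$$
F_{\text{interaction}} = t_{\text{interaction}}^{2}, \qquad
df_F = \bigl(1,\ n-(k+1)\bigr), \qquad
df_t = n-(k+1)
$$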

**One-Way ANOVA**: For a one-way ANOVA in which there is a quantitative response variable and a single qualitative explanatory variable, click COMPARE MEANS in the ANALYZE menu and select the ONE-WAY ANOVA suboption. Place the quantitative response variable in the
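
The *F* value in the ONE-WAY ANOVA table compares the variability between the group means with the variability within the groups. The Python sketch below computes it from scratch; the function name and the data are hypothetical, and the point is only to make the between/within calculation concrete so that a hand calculation can be checked against the SPSS output.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA, given a list of groups of responses."""
    all_values = [y for g in groups for y in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    # Between-groups SS: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared deviation of each value from its group mean.
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical responses for two groups of three cases each.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```

Here the between-groups sum of squares is 13.5 on 1 degree of freedom and the within-groups sum of squares is 4 on 4 degrees of freedom, giving F = 13.5 / 1 = 13.5.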

**Factorial ANOVA and ANCOVA**: SPSS uses the same GLM univariate procedure for handling both factorial ANOVA and ANCOVA. This is also the same procedure used for multiple linear regression with interaction, and thus the student may wish to review that section of this paper. Factorial ANOVA pertains to models with a quantitative response variable and two or more qualitative explanatory variables. ANCOVA is used with models that have a quantitative response variable, one or more quantitative explanatory variables, and one or more qualitative explanatory variables. To perform a factorial ANOVA or an ANCOVA, click on the GENERAL LINEAR MODEL option in the ANALYZE menu and select the UNIVARIATE suboption. Place the explanatory variable in the

As already described in the section on multiple regression with interaction, interaction effects can be evaluated by clicking on the *Model* option in the main dialog box, selecting the *Custom* suboption, and then adding the main effects and interaction terms to be tested to the model. By clicking on *Options* in the main dialog box, the student can select *Descriptive Statistics* and *Parameter Estimates*. The student can also create Bonferroni or LSD confidence intervals for the categorical variables. To do this, select the qualitative variable for which a confidence interval is desired and move it, using the arrow button, to the *Display Means For* box. Check the *Compare Main Effects* box and select the preferred confidence interval adjustment from the drop-down list. Indicate the desired level in the *Significance Level* box below (a significance level of .05 corresponds to 95% confidence). Click *Continue* to return to the main dialog box and then *OK* to perform the statistical analysis. An example of SPSS output for an ANCOVA without interaction is included in Appendix IV. The data for this output was taken from Table 13.1 of the course web page, with income (in thousands of dollars) as the response variable, education as the quantitative explanatory variable, and race as the qualitative explanatory variable.
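
The Bonferroni option guards against running many comparisons at once by splitting the overall significance level among them. A minimal Python sketch of that logic follows, assuming m pairwise comparisons (g(g-1)/2 of them for a qualitative variable with g categories) and hypothetical unadjusted *p* values: each comparison is tested at alpha/m, or equivalently each *p* value is multiplied by m (capped at 1) and compared with alpha. The function names are ours.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment for a family of m comparisons.

    Returns the per-comparison level alpha / m and the adjusted p-values
    min(1, m * p), which can be compared directly with alpha.
    """
    m = len(p_values)
    per_comparison_alpha = alpha / m
    adjusted = [min(1.0, m * p) for p in p_values]
    return per_comparison_alpha, adjusted


def pairwise_count(g):
    """Number of pairwise comparisons among g categories."""
    return g * (g - 1) // 2


# Hypothetical unadjusted p-values for the three pairwise comparisons
# among three race categories.
raw = [0.010, 0.030, 0.400]
print(pairwise_count(3))  # 3
level, adjusted = bonferroni(raw)
print(level, adjusted)
```

For intervals, the same idea means each pairwise interval is built at confidence level 1 - alpha/m, so with three comparisons and alpha = .05 each interval is roughly a 98.3% interval.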

**A Note on Dummies**: In dealing with qualitative, categorical data, the student should be careful to observe how the data has been entered into the SPSS editor. Categorical data can be entered into the SPSS editor using one of two methods. First, categorical data can be entered as a single variable. For example, the variable race might be entered into a single column within the SPSS editor, with the values black, hispanic, and white used in the cells of that column to indicate the race of each respondent. If categorical data has been entered in this manner, then the student should treat this variable as qualitative for the purposes of conducting statistical analyses with SPSS. Thus, the student conducting a one-way ANOVA should use the one-way ANOVA procedure outlined above rather than the linear regression procedure. Likewise, in using the general linear model procedure to conduct a factorial ANOVA or an ANCOVA, the student should enter the categorical data in the *Fixed Factor(s)* box.

Second, categorical data can be entered into SPSS using dummy variables. In this instance, the categories of the qualitative variable are divided into separate column variables (e.g., a separate column variable would be created for black and for hispanic), with a 1 or 0 used to indicate whether the respondent exhibits the column characteristic. If the data for a given qualitative variable has been entered into the SPSS editor using dummy variables, then the student should treat the set of dummy variables as quantitative data. Thus, the student can use the multiple linear regression procedure to examine the influence of the dummy set on the response variable. Or, if the general linear model procedure is used, the student should enter the dummy variable(s) in the *Covariate(s)* box. The student should recall that the final category of a qualitative variable is not needed when conducting statistical analyses with dummy variables, since it serves as the reference category (identified by a 0 on every dummy); thus, this category will likely be omitted in the SPSS data editor and should be omitted when entering categorical variables into the model using either the multiple regression or general linear model procedure.
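
The conversion from a single categorical column to a set of dummies can be sketched in Python as follows. The category names follow the race example above, but the function name and data values are hypothetical; as described, the last category listed (white) becomes the reference category and gets no column of its own.

```python
def make_dummies(values, categories):
    """Code a categorical column as 0/1 dummy columns.

    One dummy is created for every category except the last, which serves
    as the reference category (coded 0 on all dummies).
    """
    dummies = {}
    for cat in categories[:-1]:
        dummies[cat] = [1 if v == cat else 0 for v in values]
    return dummies


# Hypothetical single race column for five respondents.
race = ["black", "white", "hispanic", "white", "black"]
coded = make_dummies(race, ["black", "hispanic", "white"])
print(coded["black"])     # [1, 0, 0, 0, 1]  (plays the role of z1)
print(coded["hispanic"])  # [0, 0, 1, 0, 0]  (plays the role of z2)
```

The two white respondents score 0 on both dummies, which is how the reference category is identified.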

In order to better understand the different methods for entering categorical data into the SPSS data editor, it might be helpful to examine Table 13.1 on the course web page. In this example, data for the variable ethnic group has been entered twice. First, it was entered as a single variable under the column heading race. Second, it was also entered as separate dummy variables. One dummy variable (z1) was constructed to identify respondents who were black. A second dummy variable (z2) was constructed to identify respondents who were Hispanic. Respondents who scored a zero on both dummy variables are therefore classified as white. Appendix V presents an example of a multiple regression output using dummy variables. This example uses the same data set as Appendix IV but uses the race dummy variables rather than the single race variable. The student should note that the parameter estimates, *F* value, *p* value, and individual *t* values are the same for the two outputs.
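
One way to see what the dummy coefficients mean is to fit a model with only the two race dummies. The Python sketch below uses hypothetical incomes and a small least-squares solver written just for this illustration (it is not SPSS's algorithm). It shows that the intercept equals the mean for the reference category (white) and each dummy coefficient equals the difference between that group's mean and the white mean. It leaves out the education covariate used in Appendices IV and V, so the numbers are not those outputs; with education included, the dummy coefficients become differences adjusted for education.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1 for the intercept. The system is
    solved by Gauss-Jordan elimination with partial pivoting, which is adequate
    for small, well-conditioned examples like this one.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * c for v, c in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical incomes, two respondents per group; columns are [1, z1, z2].
X = [
    [1, 1, 0], [1, 1, 0],  # black: z1 = 1
    [1, 0, 1], [1, 0, 1],  # hispanic: z2 = 1
    [1, 0, 0], [1, 0, 0],  # white: reference category
]
income = [10, 14, 20, 22, 30, 34]

b0, b_z1, b_z2 = ols(X, income)
print(round(b0, 6), round(b_z1, 6), round(b_z2, 6))  # 32.0 -20.0 -11.0
```

The group means are 12 (black), 21 (hispanic), and 32 (white), so the estimates are 32, 12 - 32 = -20, and 21 - 32 = -11. A GLM run with race as a fixed factor describes the same model, which is why the two outputs agree when white is the reference category in both.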

**Repeated Measures**: In order to conduct a repeated measures ANOVA, click on the GENERAL LINEAR MODEL option in the ANALYZE menu and select the REPEATED MEASURES suboption. In the

**Resources**:

Agresti, Alan, and Barbara Finlay. 1997. *Statistical Methods for the Social Sciences*. 3rd Edition. Upper Saddle River, NJ: Prentice Hall. Pp. 658-666.

Martinez, Michael. 2000. Conversation with Author. Department of Political Science, University of Florida.

SPSS Inc. 1999. *SPSS Base 10.0 Applications Guide*. Chicago, IL: SPSS Inc. Pp. 117-213.