1. What is a reverse-scored variable, and what purpose does it serve?
A reverse-scored variable is a question where the response categories are flipped, or reversed, as compared to most of the other questions. This is done with a semantic differential scale in order to avoid the halo effect. With a Likert-type scale, the reversing is done to detect or avoid naysaying or yeasaying.
2. When you recode a variable using the "into the same variable" option, what happens to the original data values? What purpose do you think the "into different variable recode" option serves?
With "recode into the same variable," the original data values are renumbered according to the recode instruction, and the original variable name is unchanged. With a "recode into a different variable" option, a new variable is created, and it contains the recoded values. This option preserves the original variable in its original raw, unreversed form.
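The two recode options can be sketched in Python (this is an illustration, not SPSS syntax). It assumes a 7-point scale and made-up responses; the names `service`, `service_r`, and `reverse_score` are hypothetical. Reversing a 1-to-7 code is the same as computing 8 minus the old code.

```python
# Hypothetical raw responses on a 7-point scale (illustration only).
service = [1, 4, 7, 2, 6]

def reverse_score(value, low=1, high=7):
    """Flip a code on a low..high scale: 1 -> 7, 2 -> 6, ..., 7 -> 1."""
    return low + high - value

# "Recode into different variable": the original list is left untouched
# and the recoded values go into a new variable.
service_r = [reverse_score(v) for v in service]

# "Recode into same variable": the values are overwritten in place,
# so the raw codes are no longer available afterwards.
same = list(service)          # working copy standing in for the variable
for i, v in enumerate(same):
    same[i] = reverse_score(v)

print(service)    # raw codes preserved
print(service_r)  # reversed codes in the new variable
```

Keeping the raw variable makes it possible to check the recode later, which is the main argument for the "different variable" option.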
3. How does a computed variable differ from a recoded variable?
A computed variable is one that is created by a computation such as x = y + z where y and z are variables in the data set, and x is the newly created variable. A number of different computational operations are possible in the compute function of SPSS. A recoded variable is one where the codes have been altered by identifying the old number and specifying the new code number one number at a time in the recode window.
4. A researcher has coded all instances where respondents have indicated "do not know" or "refuse to answer" with the code, "99." How can you use the recode operation to cause SPSS for Windows to recognize that 99 stands for missing data?
There is a "system missing" option in the recode variable template. The researcher would engage the recode operation, specify the relevant variables, enter 99 in the Old Value box, and select "System-Missing" for the New Value. Click on Add and then Continue to complete the recoding operation.
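As a rough Python analogy (not SPSS syntax), recoding 99 to system-missing amounts to replacing the code with a marker that later calculations skip. The responses below are made up, and `None` stands in for the system-missing value.

```python
MISSING_CODE = 99  # code used for "do not know" / "refuse to answer"

# Hypothetical responses (illustration only).
responses = [3, 99, 5, 1, 99, 4]

# Recode 99 to None, standing in for SPSS's system-missing value.
recoded = [None if r == MISSING_CODE else r for r in responses]

# Statistics are then computed on the valid cases only.
valid = [r for r in recoded if r is not None]
mean_valid = sum(valid) / len(valid)

print(recoded)
print(len(valid), mean_valid)
```

Without the recode, the 99s would be averaged in as if they were real ratings and would inflate the mean.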
5. Below are the paired ratings for Red Lobster and Jake's Seafood Restaurant for 10 respondents for each of three different restaurant features. (For proper spacing, make your window full size.)
(Refer to Test for data.)
The 7-point ratings were obtained on a questionnaire that instructed respondents to rate each restaurant from low-to-high prices, fast-to-slow service, and wide-to-limited menu. Input these responses into an SPSS for Windows data set.
a. Recode the service and menu ratings so they are negative-to-positive.
Students should use the recode into the same variable operation.
b. Compute average overall ratings for Red Lobster and Jake's Seafood Restaurant using the three restaurant attributes of: prices, service, and menu.
c. Compute the difference between the average for each restaurant for each respondent.
Inspect your computed differences across all ten respondents, and indicate which restaurant has a better overall perceived image.
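Here is the data set with the recoded variables, totals, and difference (diff = redlob - jake). Because each total sums the same three ratings, comparing totals is equivalent to comparing averages. The differences are all positive or zero: Red Lobster is rated higher by nine of the ten respondents and ties with Jake's for respondent 4, so Red Lobster has the better overall perceived image.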
IDNUM | RLPRICE | JPRICE | RLSERVE | JSERVE | RLMENU | JMENU | REDLOB | JAKE | DIFF
1 | 2 | 3 | 5 | 4 | 4 | 3 | 11.00 | 10.00 | 1.00
2 | 3 | 5 | 4 | 1 | 4 | 2 | 11.00 | 8.00 | 3.00
3 | 4 | 5 | 3 | 2 | 6 | 4 | 13.00 | 11.00 | 2.00
4 | 5 | 4 | 2 | 4 | 5 | 4 | 12.00 | 12.00 | .00
5 | 4 | 6 | 5 | 2 | 5 | 2 | 14.00 | 10.00 | 4.00
6 | 5 | 4 | 6 | 4 | 4 | 3 | 15.00 | 11.00 | 4.00
7 | 6 | 2 | 4 | 3 | 2 | 5 | 12.00 | 10.00 | 2.00
8 | 5 | 5 | 5 | 3 | 5 | 6 | 15.00 | 14.00 | 1.00
9 | 6 | 2 | 2 | 5 | 5 | 3 | 13.00 | 10.00 | 3.00
10 | 3 | 5 | 5 | 2 | 6 | 3 | 14.00 | 10.00 | 4.00
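The compute step can be checked with a short Python sketch (not SPSS syntax) that uses the recoded ratings from the table above. The tuple layout and variable names are choices made for this illustration. It computes totals, as the table does; dividing each total by 3 would give the averages without changing the sign of any difference.

```python
# Recoded ratings from the table above, one tuple per respondent:
# (idnum, rlprice, jprice, rlserve, jserve, rlmenu, jmenu)
ratings = [
    (1, 2, 3, 5, 4, 4, 3),
    (2, 3, 5, 4, 1, 4, 2),
    (3, 4, 5, 3, 2, 6, 4),
    (4, 5, 4, 2, 4, 5, 4),
    (5, 4, 6, 5, 2, 5, 2),
    (6, 5, 4, 6, 4, 4, 3),
    (7, 6, 2, 4, 3, 2, 5),
    (8, 5, 5, 5, 3, 5, 6),
    (9, 6, 2, 2, 5, 5, 3),
    (10, 3, 5, 5, 2, 6, 3),
]

rows = []
for idnum, rlprice, jprice, rlserve, jserve, rlmenu, jmenu in ratings:
    redlob = rlprice + rlserve + rlmenu   # Red Lobster total
    jake = jprice + jserve + jmenu        # Jake's total
    rows.append((idnum, redlob, jake, redlob - jake))

diffs = [diff for _, _, _, diff in rows]
for row in rows:
    print(row)
print("Red Lobster higher:", sum(d > 0 for d in diffs), "tied:", diffs.count(0))
```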
1. If you do not use the Select Cases command, what cases in your SPSS data set are used in any subsequent analysis or graphs?
The default is that all cases will be used unless specific types of cases are specified by the Select Cases command.
2. How many different types of Select Cases options are there, and what are they?
There are 5: (1) all cases, (2) if condition is satisfied, (3) random sample of cases, (4) based on time or case range, and (5) use filter variable.
3. How would you indicate that you wanted to select all cases where AGE was equal to or greater than 25 years?
Use the "if condition is satisfied" option, and click on If… Next, in the Select Cases If window, select age as the target variable and write the expression age >= 25 in the target window. Click on Continue and OK to return to the data editor.
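The same selection can be written as a filter in Python (an illustration, not SPSS syntax). The cases are made up, and the condition uses "greater than or equal to" so that respondents who are exactly 25 are kept.

```python
# Hypothetical cases (illustration only); each dict is one respondent.
cases = [
    {"id": 1, "age": 19},
    {"id": 2, "age": 25},
    {"id": 3, "age": 42},
    {"id": 4, "age": 24},
]

# Equivalent of the condition age >= 25: note ">=" rather than ">",
# so that a respondent aged exactly 25 is selected.
selected = [c for c in cases if c["age"] >= 25]

print([c["id"] for c in selected])
```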
4. Is it possible to select cases satisfying multiple conditions without keystroking the word, "and" between the conditions?
The word "or" could be used, but the point is that a word has to be inserted in the condition.
5. Use the NFO panel data set to select only those cases from REGION number 2 and owners of JCPENNY (=1) credit cards. How many cases satisfy these conditions?
The way to do this is to select with region = 2 AND jcpenny =1, and delete the unselected cases. Scroll down the data editor to the end of the rows. There will be 25 cases.
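A compound condition of this kind can be sketched in Python (not SPSS syntax). The records below are made up and only mimic the variable names in the exercise; they are not the NFO panel data, so the counts they produce say nothing about the actual answer.

```python
# Made-up records that only mimic the variable names in the exercise;
# they are NOT the NFO panel data.
panel = [
    {"region": 2, "jcpenny": 1, "mktsize": 1},
    {"region": 2, "jcpenny": 0, "mktsize": 3},
    {"region": 1, "jcpenny": 1, "mktsize": 2},
    {"region": 2, "jcpenny": 1, "mktsize": 4},
]

# Both conditions must hold, which is what AND expresses.
both = [r for r in panel if r["region"] == 2 and r["jcpenny"] == 1]

# Counting the selected cases directly avoids scrolling to the last row.
n_selected = len(both)

# A further condition can be counted within the selected cases.
n_mktsize_1 = sum(1 for r in both if r["mktsize"] == 1)

print(n_selected, n_mktsize_1)
```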
6. Of those identified in question 5, how many satisfy the condition of a market size (MKTSIZE) of 1?
Sort by mktsize in ascending order, and scroll up to see how many 1’s there are. There are 2.
1. How many different help subcommands are there under the SPSS Help command?
There are 7: topics, tutorial, SPSS home page, technical support, statistics coach, Ask Me, and About (SPSS).
2. Indicate how to search for help using the "Ask Me" function.
In the Help Topics SPSS for Windows window, type in the topic or question, and click on "search." Next, select a subtopic from the list that is generated by the search operation, and click on "display." Read information in the resulting display window.
3. What is the purpose of the "How To" button in a help display window?
"How To" lists the specific steps, keystrokes, and other operations necessary or associated with the execution or accomplishment of the topic being researched.
4. Indicate how SPSS tutorials operate.
Select Help-Tutorials. Click on the tutorial topic you wish to activate and view the tutorial on your computer screen using the Previous and Next buttons to advance or go back. Tutorials are static screen captures of SPSS windows with annotations and large red arrows to point out features.
1. What are the "central tendency" statistics options available in SPSS under the Frequencies command?
Mean, Median, Mode, Sum
2. When you open the statistics options panel under the Frequencies command, all statistics check boxes are blank. Why do you think this is the default?
Frequencies can be performed on any level of scale. It is the user’s responsibility to understand the scaling assumptions and proper central tendency, variability, and other statistics of the variable being analyzed.
3. The mode was determined to be what number for mlpurch and bestvalu? Verify that these are correct by indicating the valid percent in each case.
In both cases, the mode was found to be a 1, and it corresponds to a valid percent of 50.4 for bestvalu and 67.0 for mlpurch.
4. For the age of the respondents in the Cellular One survey, the median was found by SPSS to be 3. Verify that this is the correct median.
The median is identified in the output table as a 3. The third age category is the first one whose cumulative percent (54.8%) reaches 50%, so the median falls in that category.
5. With the descriptives command, the mean, standard deviation, minimum, and maximum check boxes are checked when the options window opens. Why do you think this is the default?
Descriptives are intended for interval or ratio scales, and these descriptive statistics are appropriate with these higher level scales.
6. The mode and the median are not provided as Descriptives Options. How can you determine the mode and median of the ages of the respondents in the Cellular One survey?
Either use the Frequencies option with the mode and median feature, or use a frequency distribution table to find the highest response category valid percent for the mode and the location of the 50% split point with the cumulative percent.
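Reading the mode and median off a frequency table can be sketched in Python (not SPSS syntax). The category counts are made up for illustration and are not the Cellular One results.

```python
# Hypothetical frequency counts per response category (illustration only).
counts = {1: 35, 2: 10, 3: 30, 4: 15, 5: 10}

total = sum(counts.values())

# Mode: the category with the highest count (equivalently, the highest valid percent).
mode = max(counts, key=counts.get)

# Median: the first category whose cumulative percent reaches 50%.
cumulative = 0.0
median = None
for category in sorted(counts):
    cumulative += 100.0 * counts[category] / total
    if cumulative >= 50.0:
        median = category
        break

print(mode, median)
```

In this made-up table the mode and the median fall in different categories, which is why both need to be checked rather than inferred from one another.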
7. Indicate whether you should use Frequencies or Descriptives to calculate the appropriate descriptive statistics for each of the following variables.
a. Gender
Nominal scale, so use Frequencies
b. Type of dwelling
Nominal scale, so use Frequencies
c. Take home pay (exact amount)
Ratio scale, so use Descriptives
d. Favorite brand of diet cola
Nominal scale, so use Frequencies
e. First, second, and third choice of pain reliever
Ordinal scale, so use Frequencies
f. Satisfaction with customer service on a 7-point scale
Interval scale, so use Descriptives
1. What is a one-sample t-test all about?
It is a procedure in SPSS for Windows that either computes confidence intervals for a mean or tests a hypothesis about a mean.
2. When the test value in a one-sample t-test is set to zero, what type of statistical test will result?
By default, a 95% confidence interval will be computed for the mean.
3. If you wanted to determine the 95% confidence intervals for 10 different variables in a data set all in the same SPSS operation, how would you go about doing this?
Use the Compare Means-One-Sample T Test command sequence to open up the One Sample T Test Window. Select all 10 variables and identify them in the Test Variable(s) location. Leave the Test Value at 0 and click on OK.
4. What four pieces of statistical information are contained in the SPSS One Sample Statistics output table?
N, Mean, Standard deviation, and Standard error of the mean
5. When obtaining a confidence interval for a mean with SPSS, what is the function of the significance level reported in the output?
With the test value set to 0, the mean is tested against the hypothesis that it is zero. The reported significance level is meaningless unless a hypothesis test is called for.
6. When performing a hypothesis test for a mean, what is the test value?
It is the hypothesized mean.
7. What is the "Mean Difference" in a one-sample t-test?
It is the Mean minus the Test Value.
8. When performing a hypothesis test for a mean with SPSS, what is the function of the significance level reported in the output?
The significance level is an indication of the support for the hypothesis. If it is very low, say less than .05, the hypothesis is rejected. If larger than .05, the hypothesis is not rejected. (This explanation assumes 95% confidence level, of course.)
9. If you tested a mean hypothesis that was supported by your sample data, what value would necessarily be included in the 95% confidence interval of the difference?
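Zero would necessarily be included: if the hypothesis is not rejected at the .05 level, the 95% confidence interval of the difference must contain 0. The Mean Difference is also included, because it always lies exactly in the middle of the Lower and Upper limits. With a test value of 0, the Mean Difference equals the actual mean, so the interval is centered on the mean itself.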
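The quantities in the one-sample output can be reproduced in a minimal Python sketch (not SPSS syntax). The sample and the test value are made up, and the critical t of 2.262 for 9 degrees of freedom is taken from a standard t table instead of being computed.

```python
import math
import statistics

# Hypothetical sample of 10 ratings (illustration only).
sample = [4, 5, 3, 6, 5, 4, 5, 6, 4, 5]
test_value = 5.0          # hypothesized mean (assumed for this example)
T_CRIT_DF9 = 2.262        # two-tailed .05 critical t for df = 9, from a t table

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)            # sample standard deviation (n - 1)
se = sd / math.sqrt(n)                   # standard error of the mean

mean_diff = mean - test_value            # "Mean Difference" = Mean - Test Value
t_stat = mean_diff / se

# 95% confidence interval of the difference, centered on the mean difference.
lower = mean_diff - T_CRIT_DF9 * se
upper = mean_diff + T_CRIT_DF9 * se

# The hypothesis is not rejected at .05 exactly when the interval contains 0.
supported = lower <= 0.0 <= upper
print(round(mean_diff, 3), round(t_stat, 3), (round(lower, 3), round(upper, 3)), supported)
```

Changing `test_value` shifts the whole interval but not its width, which is why the interval test and the t test always agree.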
1. There are two different situations where 2 means can be examined for statistically significant differences. Identify and describe each case.
One instance is with two independent groups or samples such as men versus women, buyers versus nonbuyers, young versus old, etc. The other case is when the comparison is between the two variables or questions in the same group such as how friendly versus how fast the service is at a grocery store.
2. With an Independent Samples T-test, how many groups are involved, and why are the groups considered "independent?"
Two groups are involved, and they are "independent" because they are mutually exclusive or distinctly different groups such as married people versus single people.
3. How do you designate the group identities in an Independent Samples T-test?
Put the variable name in the Grouping Variable location of the Independent Samples T Test window and click on Define Groups. Place the identification code number for group 1 and group 2 in the appropriate boxes, and click on Continue.
4. If you had an SPSS data set with a variable, USERTYPE, measured as NONUSER (=1), LIGHT USER (=2), and HEAVY USER (=3), how would you go about comparing the means of each user type to other user types for the variable, HOWMUCH?
Use the Independent Samples T Test procedure to identify howmuch as the Test Variable, and identify usertype as the Grouping Variable. Do three separate analyses with groups defined as 1 and 2, 1 and 3, and 2 and 3.
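The three analyses are simply every distinct pair of group codes. A small Python illustration (not SPSS syntax) lists them:

```python
from itertools import combinations

# Group codes for USERTYPE: 1 = nonuser, 2 = light user, 3 = heavy user.
user_types = [1, 2, 3]

# Every distinct pair of groups needs its own two-group comparison.
pairs = list(combinations(user_types, 2))
print(pairs)
```

With k groups there are k(k - 1)/2 such pairs, so the number of separate t-tests grows quickly as groups are added.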
5. Indicate how to determine if the variances of two compared means are equal or unequal.
On the Independent Samples Test output, identify the column labeled "Levene's Test for Equality of Variances" and look immediately under it for the "Sig." column. This number indicates the amount of support for the null hypothesis of equal variances. Using 95% level of confidence, if the Sig. is less than or equal to .05, the hypothesis of equal variances is not supported. The variances are therefore unequal.
6. What statistic on the output table indicates the level of statistical significance that has been found in an Independent Samples T-Test?
First, determine if the variances are equal or unequal. This determination identifies which row to use. Then, under the "Sig. (2-tailed)" heading, find the significance level in that row. Using 95% level of confidence, if the Sig level is less than or equal to .05, the hypothesis of equal means is not supported. The means of the two groups are significantly different.
7. How many groups are involved in a Paired Samples T-test? Explain.
Only one group is involved. There are two variables compared in each test, but the same group is used for each mean. That is, the same respondents must answer both target questions.
8. If a data set had the following variables, and you wanted to compare the means of the pairings listed below, indicate the steps you would take with SPSS for Windows to accomplish this end.
FRIENDLY - SPEEDY
ACCURATE - CHEAP
SPEEDY - ACCURATE
Use the Paired Samples T Test procedure, and identify the pairings above. For each pairing, select the two variables and click on the pointer triangle to move the pair into the Paired Variables box. Click on OK when done to obtain the statistical test results.
9. What is found in the first output table of a paired samples t-test?
For each variable involved, the variable label, mean, sample size, standard deviation, and standard error of the mean are all found in this first table.
10. Describe how you would determine if the means of two variables in the same data set are significantly different. Specifically, what output table would you inspect, and how would you determine the significance level?
Examine the Paired Samples Test table and look at the "Sig. (2-tailed)" occupying the last column. Using 95% level of confidence, if the value is less than or equal to .05, there is no support for the null hypothesis of no difference between the means of the two variables. That is, the means are significantly different.
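A paired comparison works on the per-respondent differences, which can be sketched in Python (not SPSS syntax). The ratings are made up, and the variable names are borrowed from question 8 only for illustration. The sketch stops at the t statistic; the significance level would come from a t table or from the SPSS output.

```python
import math
import statistics

# Hypothetical ratings from the same six respondents (illustration only).
friendly = [6, 5, 7, 4, 6, 5]
speedy = [4, 5, 5, 3, 4, 3]

# A paired test works on the per-respondent differences.
diffs = [f - s for f, s in zip(friendly, speedy)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se                       # compare with a t table at df = n - 1

print(diffs, round(mean_diff, 3), round(t_stat, 3), "df =", n - 1)
```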
1. What is the purpose of an ANOVA test?
Analysis of Variance allows for the simultaneous test of differences between means for instances where there are 3 or more groups.
2. When you perform a Means procedure in SPSS, what statistics are provided, and can you determine statistical significance?
For each group, the mean, sample size, and standard deviation are provided, but no statistical significance tests are provided.
3. With ANOVA, differentiate a dependent variable from a factor. Indicate the scaling assumptions of each.
A dependent variable is an interval or ratio scaled variable, while a factor is a grouping variable, typically a nominal scaled variable. The mean of the dependent variable will be determined for each level or group of the factor.
4. What is a Post Hoc Test, and how many are available with SPSS for Windows?
A Post Hoc Test refers to a presentation of the group means to be used after (or post hoc) the ANOVA results indicate that significant difference(s) exist. There are 18 different Post Hoc tests identified in the Post Hoc Multiple Comparisons window.
5. How many output tables are provided with One-Way ANOVA?
If Post Hoc test(s) is identified, there are 2 output tables. One is the ANOVA table that flags statistical significance, and the other is the presentation of the Post Hoc output. (Of course, if several dependent variables are identified and several Post Hoc tests selected, the number of tables will increase accordingly.)
6. What is contained in an ANOVA table?
It contains 3 sums of squares, 3 degrees of freedom (df) numbers, 2 mean squares values, one F value, and one significance level value.
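The entries of the ANOVA table can be computed by hand in a short Python sketch (not SPSS syntax). The three groups and their values are made up. The significance level is left out because it requires the F distribution, which is not in Python's standard library.

```python
import statistics

# Hypothetical dependent-variable values for three factor levels (illustration only).
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [9, 10, 11],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = statistics.mean(all_values)

# Three sums of squares: between groups, within groups, and total.
ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - statistics.mean(v)) ** 2 for v in groups.values() for x in v)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Three degrees of freedom.
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
df_total = len(all_values) - 1

# Two mean squares and one F value.
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

print(ss_between, ss_within, ss_total)
print(df_between, df_within, df_total)
print(ms_between, ms_within, f_value)
```

The between and within sums of squares add up to the total, and the same holds for the degrees of freedom, which is a quick check on any ANOVA table.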
7. How does one interpret the significance level reported in an ANOVA table?
The significance level is found under the "Sig" heading. Using 95% level of confidence, if the Sig value is found to be less than or equal to .05, the null hypothesis of no differences among the group means is not supported. That is, at least one pair of group means is significantly different.
8. The RCA Heavy Metal CD sales ANOVA resulted in a significance level of .000. Does this number mean that there is absolutely no support for the null hypothesis?
No, SPSS only reports three decimal places here. If more decimal places were reported, a nonzero digit would eventually appear. There is almost no support for the null hypothesis, not absolutely no support.
9. Describe how a Duncan's test presents significantly different group means.
The Duncan test table arranges the means from lowest to highest and groups them in columns indicating where the means are not significantly different. Means that are found in different columns identify groups that are significantly different.
10. What is the default level of statistical significance in One-Way ANOVA Post Hoc tests? How can you change that level?
The default is .05 (95% level of confidence), and it can be changed as the Significance Level identified in the Post Hoc Multiple Comparisons window.
1. What are the appropriate scaling assumptions of crosstabulated variables?
Both are nominal variables.
2. Where is the "crosstabs" command found in SPSS for Windows?
Under the Analyze-Descriptive Statistics command.
3. What is a crosstabulation, and why is it used?
It is a simultaneous presentation of the frequencies of two nominal variables indicating how they occur together or apart. It is used to determine if a statistically significant nonmonotonic association exists between the two nominal variables.
4. Explain the following:
a. Row variable
The variable whose levels or categories constitute the rows in a crosstabulation.
b. Column variable
The variable whose levels or categories constitute the columns in a crosstabulation.
c. Row percentages
Percentages found in a crosstabulation table's cells such that they add up to 100% across a row, computed as the cell observed value divided by the row observed total.
d. Column percentages
Percentages found in a crosstabulation table's cells such that they add up to 100% down a column, computed as the cell observed value divided by the column observed total.
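Row and column percentages can be computed from the observed counts as in this Python sketch (not SPSS syntax). The 2 x 2 counts are made up for illustration.

```python
# Hypothetical observed counts (illustration only): rows are one nominal
# variable, columns the other.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Row percentage: cell count divided by its row total.
row_pct = [[100.0 * cell / row_totals[i] for cell in row]
           for i, row in enumerate(observed)]

# Column percentage: cell count divided by its column total.
col_pct = [[100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
           for row in observed]

print(row_pct)
print(col_pct)
```

The same cell gets a different percentage depending on which total it is divided by, so a table should always state which kind it reports.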
5. What are the "observed" counts in a crosstabulation table?
These are the raw frequencies or counts that indicate, for instance, how many times the row variable value of 1 and the column variable value of 1 were found in the sample of respondents.
6. What is reported in the Case Processing Summary table of a crosstabulation procedure in SPSS for Windows?
This table provides the number and percent of the sample size for valid, missing, and total cases used in the crosstabulation table.
7. Describe and explain the following in a crosstabulation table.
a. The column labeled "Total"
It designates the row totals and the grand total.
b. The row labeled "Total"
It identifies the column totals and grand total.
8. In the Chi-Square Tests table, how many statistics are reported if you select "Chi-Square" in the statistics dialog box of the crosstabs routine? Which one(s) are relevant to crosstabulation analysis (as it is described by your textbook)?
This table provides 3 different statistics plus the number of valid cases. The relevant one is the Pearson Chi-Square value and significance level.
9. How can you determine if a crosstabulation result is statistically significant? Explain what is meant by a statistically significant crosstabulation finding.
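Using 95% level of confidence, if the "Asymp Sig. (2 tailed)" value in the Pearson Chi-Square row of the Chi-Square Tests table is less than or equal to .05, there is no support for the null hypothesis of no association between the two variables. A statistically significant finding therefore means that the association seen in the sample is taken to exist in the population as well.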
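The Pearson Chi-Square value itself can be computed from the observed counts, as in this Python sketch (not SPSS syntax). The counts are made up, and the critical value of 3.841 for 1 degree of freedom is taken from a standard chi-square table in place of computing the significance level.

```python
# Hypothetical 2x2 observed counts (illustration only).
observed = [
    [30, 10],
    [20, 40],
]
CHI2_CRIT_DF1 = 3.841   # .05 critical value for df = 1, from a chi-square table

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under no association: row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Exceeding the critical value corresponds to a Sig. value below .05.
significant = chi_square > CHI2_CRIT_DF1
print(round(chi_square, 3), df, significant)
```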
10. Describe a procedure in SPSS that will help you to "see" a crosstabulation association.
Use a clustered bar graph that presents the two associated variables graphically.
1. What are the appropriate scaling assumptions of variables analyzed using Pearson Product Moment correlations?
They must be either interval or ratio scaled – essentially metric.
2. Indicate the appearance of a scatter diagram illustrating each of the following correlational relationships:
a. strong negative correlation
An ellipse that falls down to the right of an xy graph.
b. weak positive correlation
A barely identifiable ellipse that goes up to the right of an xy graph.
c. moderate negative correlation
An identifiable ellipse that falls down to the right of an xy graph.
d. almost no correlation at all
A formless mass of points across the xy graph.
3. How many different types of correlations are available under the SPSS for Windows bivariate correlation procedure? Name each type. Which one is the default?
There are 3 types: Pearson’s, Kendall’s tau-b, and Spearman. Pearson’s is the default.
4. Explain what is meant by a "correlation matrix" and explain why it is a symmetric matrix.
A correlation matrix is the form of the output of the SPSS correlations procedure (and many other statistical programs as well). The matrix is set up with each variable as both a column and a row, so the diagonal holds the correlation of each variable with itself and is always 1.0. The matrix is symmetric because the correlation at position i,j is identical to the correlation at position j,i: the correlation of X with Y is the same as the correlation of Y with X.
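The symmetry and the diagonal of 1.0 can be seen by building a small correlation matrix in Python (not SPSS syntax). The three variables and their values are made up; they are not the Burroughs data, and `pearson` is a helper written for this illustration.

```python
import math

# Hypothetical interval-scaled variables (illustration only).
data = {
    "sales": [10, 12, 15, 11, 18],
    "price": [9, 8, 6, 9, 5],
    "staff": [3, 4, 5, 3, 6],
}

def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

names = list(data)
# Each variable appears as both a row and a column.
matrix = [[pearson(data[r], data[c]) for c in names] for r in names]

for name, row in zip(names, matrix):
    print(name, [round(v, 3) for v in row])
```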
5. What three types of information are present in Pearson Product Moment correlation output?
The output provides: (1) correlation(s), (2) significance, and (3) sample size.
6. What is the null hypothesis in correlation analysis, and how do you determine the degree of support for the null hypothesis?
The null hypothesis is that there is no correlation between the two variables under analysis (null hypothesis of zero correlation), and the significance level is identified in the Sig (2-tailed) portion of the Correlations output table. The sig value indicates the degree of support for the null hypothesis. Using a 95% level of confidence, if the sig value is less than or equal to .05, there is no support for the null hypothesis of zero correlation between the two variables.
7. If you were inspecting an SPSS correlation matrix, and you found a correlation coefficient without any asterisk(s) beside it, what would this signify?
It signifies that the correlation is not significant at the .05 level or less. If the "Flag significant correlations" box is checked (the default) in the variables selection window of the correlations procedure, those correlations that are significant at the .01 level or less (2-tailed) will have two asterisks, while those significant at the .05 level or less will have one asterisk.
8. Why does SPSS for Windows report the sample size for every pair of variables for which it computes a correlation? That is, why not just report the total sample size?
SPSS uses only those cases where there is no missing data for either variable. Since each correlated variable pair is unique, the sample size is unique, and SPSS reports how many cases were found with no missing data. In the Burroughs dataset, there is no missing data, so all of the sample sizes are 20.
9. Suppose you found a correlation of .546 between sales and sales force size with a significance level of .95. How would you interpret this finding?
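Why the sample size can differ from pair to pair is easy to see in a small Python sketch (not SPSS syntax). The data are made up, with `None` standing in for a missing value, and `pairwise_n` is a helper written for this illustration.

```python
# Hypothetical data with missing values marked as None (illustration only).
data = {
    "x": [1, 2, None, 4, 5],
    "y": [2, None, 3, 5, 6],
    "z": [5, 4, 3, 2, 1],
}

def pairwise_n(a, b):
    """Number of cases with valid (non-missing) values on both variables."""
    return sum(1 for u, v in zip(a, b) if u is not None and v is not None)

for first, second in [("x", "y"), ("x", "z"), ("y", "z")]:
    print(first, second, pairwise_n(data[first], data[second]))
```

A case is dropped from a pair only if it is missing on one of that pair's two variables, so the x-y pair loses two cases while the pairs involving z lose one each.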
A significance level of .95 is far above .05, so there is strong support for the null hypothesis of zero correlation. The correlation of .546 is not statistically significant, and the population correlation should be treated as 0.0. (One must always assess the level of support before interpreting the correlation coefficient reported in the output.)
10. Is it correct to claim that its higher prices in some territories caused Burroughs' sales to be lower in those territories? Why or why not?
No, it is not correct. Correlation does not equal causality.