#Create dummy variables in SPSS 16.0 code#
Does ethnicity influence total GCSE score?
We’ve learned that variables with just two categories are called binary variables and are simple to use in regression. However, many variables have more than two categories. Suppose you want to see how the total GCSE score of the respondents is related to their ethnic group. In the YCS dataset, the variable s1eth2 has five categories (1=White, 2=Black, 3=Asian, 4=Mixed, and 5=Other). As with sex, discussed above, the codes 1, 2, 3, 4, and 5 assigned to each ethnicity do not represent quantities; the order is arbitrary. However, because linear regression treats all independent variables as numerical, if we were to enter s1eth2 into a linear regression model, the coded values of the five categories would be interpreted as numerical values of each category. As we found with s1gender, using s1eth2 in a linear regression without changing the coded values would give us results that do not make sense.
To avoid this error, we’re going to create dummy variables for s1eth2. This is done in much the same way that we created the dummies for sex. However, before we begin, let’s check the value labels for s1eth2 to make sure this variable is ready for analysis. Click on the Values cell in the s1eth2 row to view the values assigned to our variable. You should see that, in addition to the five ethnicity categories, there is a value of -9.00 labelled “Not answered.” Because we are interested in the influence of ethnicity on GCSE score, these unanswered cases are effectively missing data, so we can code -9.00 as missing.
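For readers who prefer the Syntax window to the menus, the two steps described in this section (coding -9 as missing, then building dummies for s1eth2) might be sketched as follows. This is a minimal sketch, not necessarily the procedure this tutorial goes on to use. The dummy names eth_black, eth_asian, eth_mixed and eth_other are hypothetical, and treating White (code 1) as the reference category is an assumption made for illustration.

```spss
* Check the categories and their codes before recoding.
FREQUENCIES VARIABLES=s1eth2.

* Treat the Not answered code (-9) as user-missing.
MISSING VALUES s1eth2 (-9).

* One 0/1 dummy per non-reference category (assumed reference: White, code 1).
* Missing cases stay missing rather than being counted as 0.
RECODE s1eth2 (2=1) (MISSING=SYSMIS) (ELSE=0) INTO eth_black.
RECODE s1eth2 (3=1) (MISSING=SYSMIS) (ELSE=0) INTO eth_asian.
RECODE s1eth2 (4=1) (MISSING=SYSMIS) (ELSE=0) INTO eth_mixed.
RECODE s1eth2 (5=1) (MISSING=SYSMIS) (ELSE=0) INTO eth_other.
EXECUTE.
```

With four dummies for five categories, each coefficient in a regression would be read as a difference from the omitted reference group. Which group to omit is an analytic choice, so the reference category used here can be swapped for another.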