A variable creation question

6/20/2022

I recently received a question from a former student that is relevant to much of the way variables are created in education and developmental science, so I wanted to share it here.

Question:
"In my dissertation study, participants were asked four questions (indicators) where the response options were yes = 1 and no = 0. For data analysis, I summed them together to create a measure ranging from 0 to 4. For ease, I've included the four questions here: 'Was there a time in the past 3 months when you or someone in your household:'

Did not pay the full amount of the rent or mortgage because you could not afford it?
Skipped paying a bill or paid a bill late due to not having enough money?
Needed to see a doctor or the hospital but did not go because you could not afford it?
Could not fill or postponed filling a prescription because you could not afford it?

Some members of my dissertation committee are pushing back on the idea of creating a sum score. Is it OK to use this variable in a regression analysis?"

Answer:
There is no hard-and-fast rule about what you can and can’t do to variables, because data don't know where they come from. However, you do need to consider what you have done to these variables and how the result represents or distorts reality. Essentially, by summing the indicators you’re assuming a few interrelated things: 1) that each of these indicators has equal weight or is equally important, 2) that “one more” endorsed indicator means the same thing regardless of which indicator it is, 3) that the difference between someone having endorsed 0 and 1 indicators is the same as the difference between having endorsed 2 and 3 indicators, and 4) that they all qualitatively represent the same construct.
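To make the equal-weight assumption concrete, here is a minimal sketch in Python with NumPy. It uses simulated data, not the student's data. The variable names, the sample size, and the "true" weights are all made up for illustration. A regression on the sum score is the same as a regression on the four separate indicators with every slope forced to be equal, so comparing the two fits gives a rough sense of how much that constraint costs.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: 500 respondents, four yes/no indicators (1 = yes, 0 = no).
n = 500
X = rng.integers(0, 2, size=(n, 4)).astype(float)

# Made-up outcome in which the indicators do NOT matter equally (an assumption
# chosen only to illustrate the point).
true_weights = np.array([0.2, 0.2, 1.0, 1.5])
y = X @ true_weights + rng.normal(0.0, 1.0, size=n)


def ols(design, outcome):
    """Return least-squares coefficients and the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    resid = outcome - design @ coef
    return coef, float(resid @ resid)


intercept = np.ones((n, 1))

# Model A: sum score (0 to 4), one slope shared by all four indicators.
sum_score = X.sum(axis=1, keepdims=True)
coef_sum, rss_sum = ols(np.hstack([intercept, sum_score]), y)

# Model B: separate indicators, each with its own slope.
coef_sep, rss_sep = ols(np.hstack([intercept, X]), y)

print("sum-score slope:", coef_sum[1].round(2))
print("separate slopes:", coef_sep[1:].round(2))
print("RSS (sum score, separate):", round(rss_sum, 1), round(rss_sep, 1))
```

Because Model A is a constrained version of Model B, the separate-indicator fit can never have a larger residual sum of squares. If the four slopes in Model B look similar and the two fits are close, the sum score loses little. If they differ a lot, that is worth discussing when you interpret the sum score.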

There are many reasons why any of those four assumptions may not hold. However, any time we take a measurement of something, we simplify reality into some sort of single dimension. Your job as the expert is to figure out how that measurement translates back to reality. What I suggest here is that you write a section into your discussion about how the variable, created in this way, may differ from the reality you are trying to represent. Acknowledging that this is the case, and thinking deeply about what it means for your research findings, helps to fulfill your responsibility as a data analyst and expert in the field.

1 Comment

Norm Matloff

6/22/2022 12:08:21 am

This is a really good writeup. Should be required reading in any applied stat course. One point I would add is that that "single dimension" may be worthwhile even if the precepts behind it are not so valid, as it may play a much-needed role in dimension reduction.
