JESSICA A. R. LOGAN

On My Mind

Power Struggles

11/11/2019

I've been listening to the excellent podcast "Quantitude" with Greg Hancock and Patrick Curran, which, if you haven't already, you should check out. In Episode 2 ("Power Struggles"), Patrick made the (hyperbolic) statement that all power analysis is useless. Lest you think I'm exaggerating, his exact quote is: "I think all power analysis is useless."

I was listening while washing dishes and ran across the house with wet hands to find a pen to write this quote down, because I am a power analysis believer. It's true that power analysis can sometimes be fuzzy, and maybe more effort goes into it than is necessary, but ultimately I believe it is a good tool. I've broken Greg and Patrick's arguments against a priori power analysis down into three basic parts:

1) Some models are so complex that there is no one pathway that represents the "effect" you need to power

They argue, essentially, that if you're running a complex model, Cohen's ideas about what an effect size represents don't even really apply. If you have a complex multi-indicator latent factor model, there are too many pathways to consider. "Take a simple growth model with five indicators... what is the power of that?? Are you interested in the intercept, the slope, the correlation between the two..."

When I'm running power analyses, I am often planning an analysis with complex, multi-indicator latent factor models, with students nested in classrooms. Sometimes there is a planned missing data design also housed within that latent model. These are extremely complicated. But in the end, my research questions, my hypotheses, can almost all be confirmed or refuted based on the statistical significance of a particular pathway (or pathways). If a hypothesis can't be, then I need to go back and re-state and re-think my question.
2) Power is specific to the analysis you want to run

Yes. This is why the specificity is even more important. Do you want to know if third graders grow in their language skills less than second graders do? Then you need to fit a growth model, and you need to estimate the power you have to detect the pathway that represents that difference. Maybe that's a predictor of the latent slope. You can power for that. Maybe it's a multiple group model, and you'll test whether the slope factor should be constrained to be equal across your groups or allowed to vary. You can power for that too.

I do this with simulations, where the values of those simulations are seeded by pilot work or other large-scale studies that have used the same or similar measures. I use the variance almanac to determine the intra-class correlation due to schools (when that is relevant), and I use this delightful article to determine whether my effect sizes are meaningful or important or at all interesting.
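To make that a little more concrete, here is a minimal sketch of what one of those simulations can look like, stripped all the way down to a cluster-randomized two-group comparison rather than a full latent growth model. The effect size, ICC, classroom count, and class size below are hypothetical placeholders of the kind you would seed from pilot work (Python, using only numpy and scipy):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2019)

def simulate_once(n_clusters_per_arm=20, n_students=15, d=0.25, icc=0.15):
    # One simulated trial: classrooms assigned to treatment (1) or control (0),
    # students nested within classrooms, total outcome variance fixed at 1.
    n_clusters = 2 * n_clusters_per_arm
    treat = np.repeat([0, 1], n_clusters_per_arm)
    cluster_eff = rng.normal(0.0, np.sqrt(icc), n_clusters)
    classroom_means = np.empty(n_clusters)
    for j in range(n_clusters):
        scores = d * treat[j] + cluster_eff[j] + rng.normal(0.0, np.sqrt(1 - icc), n_students)
        classroom_means[j] = scores.mean()
    # Test the treatment effect at the classroom level (reasonable with equal cluster sizes)
    _, p = stats.ttest_ind(classroom_means[treat == 1], classroom_means[treat == 0])
    return p < 0.05

def estimate_power(n_reps=2000, **kwargs):
    # Power = proportion of simulated trials in which the effect is detected
    return float(np.mean([simulate_once(**kwargs) for _ in range(n_reps)]))

print(estimate_power())

In practice you would simulate and then fit the model you actually plan to run (the growth model, the multiple-group model) and read power off the specific pathway tied to your hypothesis; testing classroom means here just keeps the sketch short.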

3) Power is specific to features of the data: "Power depends on the communality estimates; the multiple R-squared. It increases the higher your R-squared is for your indicator" 

Agreed, and this is why I always present a power analysis under multiple scenarios. I will always estimate my power to detect the critical pathway for a given hypothesis both with and without covariates included, for different levels of attrition, for different key variables of that particular construct. 
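With the hypothetical simulation sketched above, "multiple scenarios" can be as simple as looping the same power estimate over a grid of assumed values; here, ICC and attrition (a real grid would also toggle covariates and vary the reliability of the key measures):

import itertools

# Hypothetical scenario grid: re-run the estimate above under different assumed
# ICCs and attrition rates (attrition handled crudely by shrinking class size).
for icc, attrition in itertools.product([0.10, 0.15, 0.20], [0.00, 0.10, 0.20]):
    n_kept = int(round(15 * (1 - attrition)))
    power = estimate_power(n_reps=1000, n_students=n_kept, icc=icc)
    print(f"ICC = {icc:.2f}, attrition = {attrition:.0%}: power ~ {power:.2f}")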

And so... 
To sum up, I do agree that running a power analysis for an entire model is useless.  But it is useless because if the only power analysis you can think to run is one for the model itself, then you probably don't have a very well defined question. For any one study, I will report two to three power analyses for each and every hypothesis. 

I think our differences in opinion can be boiled down to differences in funding mechanisms. The NIH gives you one paragraph, while IES, for which I write most of my grants, expects something closer to a page with a table dedicated to the power analysis. It's also different because I have the luxury of frequently working with randomized controlled trials, where usually a primary aim of the study is to determine whether a treatment group is different from a control group.

Don't get me wrong. A lot of my time is spent on power analysis. If Patrick and Greg can convince the federal funders to drop power, or to switch to an emoji-based system, I could learn to knit or something with all of the extra time I would have on my hands. But for now, if you're in need of a power analysis, try mapping your research question onto the actual equation or latent path model you intend to run. Where in that equation or on that diagram could your hypothesis be disproven? If you don't know, try writing a better, more specific question.
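For example (a hypothetical linear growth model, not any particular study's analysis), the second-versus-third-grade question from earlier maps onto a single level-2 coefficient:

Level 1: y_{ti} = \pi_{0i} + \pi_{1i}\, time_{ti} + e_{ti}
Level 2: \pi_{1i} = \beta_{10} + \beta_{11}\, grade_i + r_{1i}

With grade coded 0 for second graders and 1 for third graders, the hypothesis that third graders grow less lives entirely in \beta_{11}; that one parameter is what the power analysis needs to target.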

Comments
Gregory R. Hancock
11/12/2019 05:56:49 am

A priori power analysis, even for the simplest designs, is still a house of cards
------------

That a priori power analysis is uncertain is understood and generally acceptable, albeit uncomfortable; how uncertain it actually is, however, can be downright foundation rattling.

Here’s a way to think about it. Imagine that your world is one where everything is answered by a two-sample t-test. Indeed many RCTs boil down to the equivalent, or a variation, of this. So you set your Cohen’s d and you’re on your way to determining the necessary sample size. But…

* There is actually a range of possible true d-values so maybe put an interval around those. More completely, there is a range of possible unstandardized mean differences so put an interval around them, and each population has a range of variances so put separate intervals around each of those.

* There is a range of possible distributional shapes so maybe put an “interval” around those, and for each population separately.

* There is a range of possible ICCs (for such designs) so put an interval around those.

* There is a range of possible reliabilities of your outcome variable so put an interval around those.

* Got covariates? There is a range of values for the influence of each covariate so put intervals around those. They might covary to different degrees, so put intervals around those. The assumption of parallelism could be violated to different degrees so put intervals around those; also, the assumption of linearity of the relation of each covariate to the outcome might be violated to different degrees, so put intervals around those.

* There is a range of possible degrees of missingness so put an interval around that, as well as around the assumptions of the missingness processes.

* And so on. Lather, rinse, repeat.

Now imagine a simulation that chooses values from each of the above intervals, determines the necessary sample size, over and over and over, until the whole space is mapped out. This underscores how the necessary sample size isn’t a value, but rather a space, and a space that is typically *highly* varied. Alternatively, pick a fixed sample size (e.g., one that you can afford) and estimate the power throughout this space; again, there is generally *quite* a range. It’s downright unsettling. Practically, then, this means you should increase sample size until you’re covered across the space (and not out of money); we think that’s a great idea, and encourage folks to do so.

The above is about the simplest case scenario. So now imagine a world where multiple research questions are embedded within a larger model, as they virtually always are in the grant applications that I routinely review for IES, NIH, and NSF. While each question might attach to a specific parameter, those parameters are typically intertwined and their power asymptotically dependent as a result. This larger model contains focal parameters (those associated with your research questions) and peripheral parameters (those not associated with research questions but setting an important context, e.g., factor loadings in a model where latent structural relations are of interest). But it also has a lot of other stuff defining the environment: nonlinearity possibilities, distributional possibilities, data structure possibilities (e.g., through ICCs), missingness possibilities, etc. In the end, the idea of a single power or sample size value is wrong both because there are multiple focal parameters and because of the whole space of possibilities around those parameters.

In the end, I land on the position that power is at best a very coarse reckoning. That does not mean one should avoid thinking about it, or avoid walking the space, or throw up one’s hands and make no effort. It’s about understanding and potentially mapping what reality could be, how far it could be from where you think you are standing, and working within that space as you plan.

I hope that helps formalize the position I intended to convey, which perhaps had a larger interval around it than was intended.

Good luck!
:)
Greg

Dan
11/12/2019 11:20:06 am

This was a really great post. And your point: "But it is useless because if the only power analysis you can think to run is one for the model itself, then you probably don't have a very well defined question" was particularly insightful.

Do you have a paper that you could share that shows an example of this: "For any one study, I will report two to three power analyses for each and every hypothesis"?

Best,
Dan




