FYI: Beware the misuse of p-values
Published by 劉正山
And yet, wonder of wonders, the American Statistical Association has finally taken a position against p-values. I never thought this would happen in my lifetime, or in anyone else’s, for that matter, but I say, Hooray for the ASA!
To illustrate the problem, consider one of the MovieLens data sets, consisting of user ratings of movies. There are 949 users. Here is an analysis in which I regress average rating per user against user age and gender:
```r
> head(uu)
  userid age gender      occup   zip  avg_rat
1      1  24      0 technician 85711 3.610294
2      2  53      0      other 94043 3.709677
3      3  23      0     writer 32067 2.796296
4      4  24      0 technician 43537 4.333333
5      5  33      0      other 15213 2.874286
6      6  42      0  executive 98101 3.635071
> q <- lm(avg_rat ~ age + gender, data = uu)
> summary(q)
...
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.4725821  0.0482655  71.947  < 2e-16 ***
age         0.0033891  0.0011860   2.858  0.00436 **
gender      0.0002862  0.0318670   0.009  0.99284
...
Multiple R-squared:  0.008615, Adjusted R-squared:  0.006505
```
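The "t value" column is simply each estimate divided by its standard error, and "Pr(>|t|)" is a two-sided tail probability from a t distribution. The R sketch below recomputes those columns from the printed numbers alone, which makes clear that a small coefficient earns a big t whenever its standard error is even smaller. The residual degrees of freedom (949 users minus 3 coefficients) are an assumption here, since the elided part of the summary would state them.

```r
# Rows copied from the summary(q) output above
coefs <- data.frame(
  term      = c("(Intercept)", "age", "gender"),
  estimate  = c(3.4725821, 0.0033891, 0.0002862),
  std_error = c(0.0482655, 0.0011860, 0.0318670)
)

# t statistic for H0: beta = 0 is just estimate / standard error
coefs$t_value <- coefs$estimate / coefs$std_error

# Two-sided p-value from a t distribution.
# Assumption: residual df = 949 - 3 (no users dropped from the fit).
resid_df <- 949 - 3
coefs$p_value <- 2 * pt(abs(coefs$t_value), df = resid_df, lower.tail = FALSE)

print(coefs, digits = 4)
```

Small differences in the last printed digit can appear because the inputs are the rounded values shown by `summary(q)`.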
Woohoo! Double-star significance on age! P-value of only 0.004! Age is a highly significant predictor of movie ratings! Older people give higher ratings!
Well, no. The age coefficient is about 0.0034 per year, so a 10-year age difference corresponds to only about a 0.03 difference in average rating. That is minuscule, given that ratings take values between 1 and 5.
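Because the model is linear in age, the 0.03 figure is just the age coefficient times the age difference. A minimal R sketch of that arithmetic follows; the helper name `predicted_rating_gap` is invented for illustration, and gender is held fixed.

```r
age_coef   <- 0.0033891   # fitted age coefficient from summary(q)
rating_rng <- c(1, 5)     # ratings take values between 1 and 5

# Hypothetical helper: fitted difference in average rating between
# two users age_gap years apart, same gender (model is linear in age)
predicted_rating_gap <- function(age_gap) age_coef * age_gap

gaps  <- predicted_rating_gap(c(10, 40))
share <- gaps / diff(rating_rng)   # fraction of the full 1-to-5 scale

print(data.frame(age_gap = c(10, 40), rating_gap = gaps, share_of_scale = share))
```

Even a 40-year gap, say a 20-year-old versus a 60-year-old, moves the fitted average rating by only about 0.14 on the 1-to-5 scale.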
The problem is that with large samples, significance tests pounce on tiny, unimportant departures from the null hypothesis, in this case H0: β_age = 0, and ironically declare this unimportant result "significant." We have the opposite problem with small samples: The power of the test is low, and we will announce that there is "no significant effect" when in fact we may have too little data to know whether the effect is important.
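Both halves of this argument can be made concrete with a what-if calculation (not a reanalysis of the data): keep the fitted age slope fixed, assume its standard error shrinks like 1/√n, and watch how the p-value and the power of the test move with the number of users. The helpers `se_at`, `p_value_at` and `power_at` are hypothetical, and using the normal distribution instead of the t distribution is a simplifying assumption.

```r
beta_age  <- 0.0033891      # fitted age coefficient, from summary(q)
se_at_949 <- 0.0011860      # its standard error with n = 949 users
n_obs     <- 949
z_crit    <- qnorm(0.975)   # two-sided test at alpha = 0.05

# Assumption: standard error scales as 1/sqrt(n), everything else fixed
se_at <- function(n) se_at_949 * sqrt(n_obs / n)

# Normal-approximation p-value if the same estimate were seen with n users
p_value_at <- function(n, beta = beta_age) {
  2 * pnorm(abs(beta) / se_at(n), lower.tail = FALSE)
}

# Approximate power to reject H0: beta_age = 0 when the true slope is beta
power_at <- function(n, beta = beta_age) {
  shift <- beta / se_at(n)
  pnorm(shift - z_crit) + pnorm(-shift - z_crit)
}

n_grid <- c(100, 949, 10000, 100000)
print(data.frame(
  n               = n_grid,
  ten_year_effect = 10 * beta_age,            # identical in every row
  p_value         = signif(p_value_at(n_grid), 3),
  power           = round(power_at(n_grid), 3)
))
```

Under these assumptions the same slope of about 0.03 per decade would be "not significant" with 100 users (power roughly 15%), yet would earn an astronomically small p-value with 100,000 users, even though its practical importance is the same in every row.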
In addition, there is the hypocrisy aspect. Almost no null hypotheses are true in the real world, so performing a significance test on them is absurd and bizarre.
Speaking of hypocrisy: As noted above, instructors of statistics courses all know of the above problems, and yet teach testing anyway, with little or (likely) no warning about this dangerous method. Those instructors also do testing in their own work.