Saturday, October 14, 2006

Chi-Square/Be sure to use logic in any statistical application.

With the recent furor over another Lancet study of deaths in Iraq, there is some interest in Chi Square as a statistical tool. Since I have not seen the complete report with the raw data, and no one will be able to confirm the raw data, we may never be able to fully look into this 'study'. But I was recently shown a page about Chi-Square (the title link above).

A quick look at the resume of the lesson plan's author shows an M. Ed. listed as "in progress, to be conferred May 1997" from U. of Illinois, Urbana-Champaign, with a major in Curriculum and Instruction.

Most of the lesson seems straightforward and somewhat interesting, but I wanted to address his choice and presentation of "Accidents at Irongate". First, let me present his question and data:
The Irongate Foundry, Ltd., has kept records of on-the-job accidents for many years. Accidents are reported according to which hour of an 8-hour shift they happen. The following table shows their accident report.


The union at the foundry wants to know whether accidents are more likely to take place during one hour of the shift rather than another. They are asking you what you think.

Do you think that more accidents are likely to take place during one hour of a shift over another?
If you said yes, you would be hard pressed to prove yourself at this point, but continue on and be amazed!

The first question that comes to mind is: why use Chi Square at all (at least on this experiment)? Well, the obvious reason not to use Chi Square is that hours worked during the day are not an "independent, normally distributed random variable" (Chi Square). It is clearly not an IRV (independent random variable): you cannot assume that one roll (hour) is not influenced by another roll (hour).
Well, let us go on.
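For reference, here are the standard definitions behind the quoted phrase. A chi-square variable is built from independent standard normal variables, and the statistic the lesson computes only approximately follows that distribution, and only when the counts come from independent observations and the expected counts are not too small. These are textbook results, not something taken from the lesson itself:

```latex
% Definition: a chi-square variable with k degrees of freedom
Q = \sum_{i=1}^{k} Z_i^2, \qquad Z_i \overset{\text{iid}}{\sim} N(0,1)
\;\Longrightarrow\; Q \sim \chi^2_k

% Pearson goodness-of-fit statistic over k categories (here k = 8 hours),
% O_i = observed count, E_i = expected count in category i
X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \;\dot\sim\; \chi^2_{k-1}
```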

Does this histogram help in discerning when more accidents tend to happen?

Actually, it does. If you draw a line through the majority of the points, there does appear to be a slight upward slope to the right. This would mean that more accidents tend to happen in the later part of a shift than in the earlier part. And again we have an easy theory to explain it: as people become fatigued, they tend to have more accidents.
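The "line through the points" idea can be made concrete with an ordinary least-squares fit. This is only a sketch: the Irongate table is not reproduced here, so the counts below are hypothetical placeholders, and a positive slope describes a trend without testing whether it is real.

```python
# Ordinary least-squares line through (hour, accident count) points.
# The counts are HYPOTHETICAL placeholders, not the Irongate data.


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


hours = list(range(1, 9))                 # hours 1..8 of the shift
accidents = [10, 12, 11, 13, 14, 13, 15, 16]  # hypothetical counts

slope, intercept = least_squares_line(hours, accidents)
print(f"slope = {slope:.3f} accidents per hour into the shift")
```

A positive slope is consistent with the fatigue theory, but by itself it says nothing about whether the trend would hold out of sample.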

Now of course this question comes from an educational text: the lesson is based on Ch. 6 of Using Statistics by Travers, Stout, Swift, and Sextro.

Another class also used the question (see Cognitive Science 14 Practice Final Answers), and I believe it may also appear in Howell, D. C. (2002). Statistical Methods for Psychology (5th ed.). Duxbury.
This brings up the expected number of accidents for a given hour, for which they use a simple average across the hours. What if some people do not work the full 8 hours (some do get hurt, right)? Is it then an IRV?
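To see what the "simple average" assumption does, here is a sketch of the lesson's goodness-of-fit calculation next to a variant whose expected counts are proportional to exposure (worker-hours on the floor in each hour). The exposure weighting is my own illustration, not the lesson's method, and every number is hypothetical. The critical value 14.067 is the standard table value for 7 degrees of freedom at the 5% level.

```python
# Chi-square goodness-of-fit: simple-average expected counts vs exposure-weighted ones.
# All numbers are HYPOTHETICAL placeholders, not the Irongate table.

CRITICAL_05_DF7 = 14.067  # upper 5% point of chi-square with 7 df (table value)


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over the categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def uniform_expected(observed):
    """Expected counts if every hour is equally likely: the simple average."""
    avg = sum(observed) / len(observed)
    return [avg] * len(observed)


def exposure_expected(observed, exposure):
    """Expected counts proportional to exposure (e.g. worker-hours in each hour)."""
    total = sum(observed)
    total_exposure = sum(exposure)
    return [total * w / total_exposure for w in exposure]


accidents = [10, 12, 11, 13, 14, 13, 15, 16]       # hypothetical accidents, hours 1..8
worker_hours = [100, 100, 98, 97, 95, 94, 92, 90]  # hypothetical: head count falls during the shift

x2_uniform = pearson_chi_square(accidents, uniform_expected(accidents))
x2_exposure = pearson_chi_square(accidents, exposure_expected(accidents, worker_hours))

for label, x2 in (("simple average", x2_uniform), ("exposure-weighted", x2_exposure)):
    verdict = "reject" if x2 > CRITICAL_05_DF7 else "do not reject"
    print(f"{label}: X^2 = {x2:.2f}, {verdict} H0 at alpha = 0.05 (df = 7)")
```

With these made-up numbers neither version rejects H0. The point is only that the expected counts, and therefore the statistic, change once you stop assuming the same number of people are exposed in every hour.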

Yes, the results match what we would expect, as long as we suspend logic and use an expected value that cannot be a real estimate of the true population value. Thus the study may be useful for in-sample testing, but out of sample it becomes useless, as someone in the field also noted in A Rebuttal to the Key Problem Answer.

Lastly, another instructor used this question on a test, and the answer key stated:
"You cannot compute a chi-square test without knowing the number of nonoccurrences for this example." (Midterm II Answer Key)

I wish the instructor had explained this more fully, because I read it as saying that anything other than 21 is not a valid nonoccurrence event.
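The answer key does not spell out what it means, but one common reading (my interpretation, not the key's stated method) is that comparing accident rates needs, for each hour, both the accidents and the exposure without an accident, arranged as a 2 × 8 contingency table. Here is a sketch of that test; the worker-hour figures and the "at most one accident per worker-hour" rule are hypothetical assumptions.

```python
# Chi-square test of independence on a 2 x 8 table of accidents vs non-accidents per hour.
# All numbers are HYPOTHETICAL placeholders.

CRITICAL_05_DF7 = 14.067  # upper 5% point of chi-square with 7 df (table value)


def chi_square_independence(table):
    """Pearson chi-square test of independence for a table given as a list of rows.

    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


accidents = [10, 12, 11, 13, 14, 13, 15, 16]               # hypothetical
worker_hours = [1000, 1000, 980, 970, 950, 940, 920, 900]  # hypothetical exposure
# Assumption: at most one accident per worker-hour, so the remaining
# worker-hours count as nonoccurrences.
non_accidents = [w - a for w, a in zip(worker_hours, accidents)]

stat, df = chi_square_independence([accidents, non_accidents])
verdict = "reject" if stat > CRITICAL_05_DF7 else "do not reject"
print(f"X^2 = {stat:.2f}, df = {df}: {verdict} H0 (same accident rate every hour)")
```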

This post has run longer than expected, so next up will be how to interpret the data using a more logical approach.


Links:
Cognitive Science 14 Practice Final Answers
Midterm II Answer Key-PDF
Chi Square/Wiki
A Little Primer on Multicollinearity
