Sunday, October 15, 2006

Accidents at Irongate

Now to review from the last post about the Chi Square Lesson. Question:
The Irongate Foundry, Ltd., has kept records of on-the-job accidents for many years. Accidents are reported according to which hour of an 8-hour shift they happen. The following table shows their accident report.

Data:

The union at the foundry wants to know whether accidents are more likely to take place during one hour of the shift rather than another. They are asking you what you think.

Do you think that more accidents are likely to take place during one hour of a shift over another?

But why is the union concerned about whether one hour is more accident-prone than another? A better question to ask is whether fatigue has an influence on accidents as the shift goes along. If it does, we should see an increase in accidents as the hours pass by.
For the Microfit output for the Ordinary Least Squares Estimation (results), I used the following Equation:
NUMACC=B(1)+B(2)Shifthour+u(i)
And thus the Equation with the regression coefficients:
NUMACC=15.8571+1.1429Shifthour+u(i)
Like most regression equations, the intercept coefficient has little relevance in this analysis. Taken strictly, it would mean that during the zero hour, before work began, there would be nearly 16 accidents. But the slope coefficient predicts that for every hour the shift drags on, a little over 1 more accident (about 1.14) would occur.
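To make the slope concrete, here is a minimal sketch that plugs each shift hour into the fitted equation above. The only inputs are the two reported coefficients; the function name is my own, and I am assuming the hours are coded 1 through 8.

```python
# Coefficients as reported in the Microfit output above.
INTERCEPT = 15.8571
SLOPE = 1.1429


def fitted_accidents(shift_hour):
    """Fitted number of accidents for a given hour of the shift."""
    return INTERCEPT + SLOPE * shift_hour


# Assumption: the eight shift hours are coded 1, 2, ..., 8.
hours = list(range(1, 9))
fitted = [fitted_accidents(h) for h in hours]

for h, f in zip(hours, fitted):
    print(f"hour {h}: {f:.2f} fitted accidents")

# An OLS line with an intercept passes through the sample means, so the
# average fitted value is also the average number of accidents per hour.
mean_fitted = sum(fitted) / len(fitted)
print(f"mean fitted value: {mean_fitted:.2f}")
```

The gap between the first and last hour is seven slopes, or roughly 8 accidents, which is the fatigue effect the model is claiming.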

And now to test how good this model is...
1. First, let me test the null hypothesis that the slope coefficient is equal to 0 (H0: B(2)=0) against the alternative hypothesis that it is not equal to 0. Testing at the 5% level of significance (using the p-value), we reject the null hypothesis that the slope coefficient is equal to 0, since [.047]<.05.
2. R^2=.50794, which signifies that just over 50% of the variation in NUMACC is attributed to the hour of the shift. And since there is only one slope coefficient, the test of the null hypothesis that R^2 is zero is the same test as the one for the slope coefficient, with the same p-value [.047] against the .05 level.
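The link between R^2 and the slope test in item 2 can be checked by hand. With one regressor and an intercept, F = (R^2/(1-R^2))*(n-2) and the slope's |t| is the square root of F. The sketch below uses only the reported R^2 and n=8; the critical value 2.447 is the standard two-sided 5% t-table value for 6 degrees of freedom, not something from the Microfit output.

```python
import math

R_SQUARED = 0.50794  # reported above
N_OBS = 8            # one observation per shift hour


def f_from_r_squared(r2, n):
    """F statistic for a regression with an intercept and one slope."""
    return (r2 / (1.0 - r2)) * (n - 2)


f_stat = f_from_r_squared(R_SQUARED, N_OBS)
t_stat = math.sqrt(f_stat)  # absolute t statistic of the slope

# Standard t-table value: two-sided 5% test, n - 2 = 6 degrees of freedom.
T_CRIT_5PCT_6DF = 2.447

reject_null = t_stat > T_CRIT_5PCT_6DF
print(f"F = {f_stat:.3f}, |t| = {t_stat:.3f}, reject H0 at 5%: {reject_null}")
```

The t statistic lands just above the critical value, which agrees with a p-value sitting just under .05.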
And now to test whether our model passes the variety of diagnostic tests:
3. The Durbin-Watson d statistic is reported as 3.0376. The table bounds are dL=0.497 and dU=1.003 with n=8 and k=1. Since the d statistic is on the high side, we need to compute 4-dU=2.997 and 4-dL=3.503. The d statistic falls between these two values, which means it is in the zone of indecision as to whether there is evidence of negative autocorrelation.
4. For the null hypothesis of no autocorrelation, we use Diagnostic Test A, and we do not reject the null hypothesis at the .05 level of significance (.05<[.116] or [.195]). So even though the d statistic was indecisive, this test shows no problem with autocorrelation.
5. For the Ramsey RESET Test/Diagnostic Test B, the null hypothesis is that the model is correctly specified, which we do not reject at the 5% level of significance. Using the p-values, we have .05<[.905] or [.928] by a wide margin.
6. We now test for normality of the disturbance terms using Diagnostic Test C (Jarque-Bera test), with the null hypothesis that the population disturbance term is normally distributed. And here we would not reject the null hypothesis at the 5% level of significance (.05<[.801]).
7. And lastly we test for heteroscedasticity with diagnostic test D. The null hypothesis of homoscedasticity cannot be rejected on the basis of the test at the .05 level of significance (.05<[.470] or [.541]).
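The Durbin-Watson reading in item 3 follows a fixed set of zones, which can be sketched as a small function. The function name and the zone labels are my own; the numbers passed in are the d, dL and dU quoted in item 3.

```python
def durbin_watson_zone(d, d_lower, d_upper):
    """Classify a Durbin-Watson d statistic using table bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive (positive side)"
    if d < 4 - d_upper:
        return "no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive (negative side)"
    return "negative autocorrelation"


# Values quoted in item 3: d = 3.0376, dL = 0.497, dU = 1.003.
zone = durbin_watson_zone(3.0376, 0.497, 1.003)
print(zone)
```

With these inputs d sits between 4-dU and 4-dL, so the function returns the indecisive zone on the negative side.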
So in conclusion, the Durbin-Watson d test is indecisive as to negative autocorrelation. But the diagnostic tests let us conclude, at the 5% level of significance, that there is no evidence of problems with first order autocorrelation (AR(1)), functional form misspecification, non-normal population disturbances, or heteroscedasticity.
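The decision rule used for Diagnostic Tests A through D is the same each time: reject the null only if a p-value falls below .05. A minimal sketch of that bookkeeping follows; the p-values are the bracketed ones quoted above, while the dictionary labels and the helper name are just my own shorthand.

```python
ALPHA = 0.05

# Bracketed p-values as quoted in items 4 to 7. Where two versions of a
# test were reported, both are listed in the order given in the text.
diagnostics = {
    "A: serial correlation": (0.116, 0.195),
    "B: functional form (RESET)": (0.905, 0.928),
    "C: normality (Jarque-Bera)": (0.801,),
    "D: heteroscedasticity": (0.470, 0.541),
}


def rejects_null(p_values, alpha=ALPHA):
    """True if any reported version of the test rejects its null."""
    return any(p < alpha for p in p_values)


failed = [name for name, ps in diagnostics.items() if rejects_null(ps)]
print("tests rejecting their null at 5%:", failed)
```

Every p-value is above .05, so the list of failed diagnostics comes back empty.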

But it still would have been better to get all the raw data to do a complete analysis of the Accidents at Irongate.
And that is the way it is done.

Saturday, October 14, 2006

Chi-Square/Be sure to use logic in any statistic application.

With the recent furor over another Lancet study about the deaths in Iraq, there is some interest in Chi Square as a statistical tool. Since I have not seen the complete report with the raw data, and no one will be able to confirm the raw data, we may never be able to fully look into this 'study'. But I was recently shown a link about Chi-Square, which is the title link above.

A quick look at the resume of the person who created this lesson plan shows an M. Ed. in progress, to be conferred May 1997, from U. of Illinois, Urbana-Champaign. Major: Curriculum and Instruction.

Most of the lesson seems straightforward and somewhat interesting, but I wanted to address his choice and presentation of "Accidents at Irongate". First let me present his question and data:
The Irongate Foundry, Ltd., has kept records of on-the-job accidents for many years. Accidents are reported according to which hour of an 8-hour shift they happen. The following table shows their accident report.


The union at the foundry wants to know whether accidents are more likely to take place during one hour of the shift rather than another. They are asking you what you think.

Do you think that more accidents are likely to take place during one hour of a shift over another?
If you said yes you would be hard pressed to prove yourself at this point, but continue on and be amazed!

The first question that comes to mind is why use Chi Square (at least on this experiment)? Well, the obvious reason not to use Chi Square is that hours worked during the day is not an "independent, normally distributed random variable" (Chi Square). It clearly is not an IRV (Independent Random Variable): you cannot assume that one roll (hour) is not influenced by another roll (hour).
Well let us go on.

Does this histogram help in discerning when more accidents tend to happen?

Actually it does. If you plot a line through the majority of the points, it shows a slight upward slope to the right. This would mean that more accidents tend to happen in the later part of a shift than in the earlier part of a shift. And again we have an easy theory: as people become fatigued they tend to have more accidents.

Now of course this question is located in an educational book:
based on Ch 6 of Using Statistics
by Travers, Stout, Swift, and Sextro.

And we see that another class used the question also at Cognitive Science 14 Practice Final Answers. And I believe the question may also be in Howell, D. C. (2002). Statistical Methods for Psychology (5th ed). Duxbury.
And then this brings up the expected accidents for a given hour, for which they are using a simple average across the hours. What if some people do not work the full 8 hours (some do get hurt, right)? Is it then an IRV?
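For reference, the goodness-of-fit calculation that uses a simple average as the expected count can be sketched as follows. The counts below are hypothetical die rolls that I made up for illustration, not the Irongate table, and the critical value 11.07 is the standard 5% chi-square table value for 5 degrees of freedom.

```python
def chi_square_uniform(observed):
    """Chi-square statistic when every category has the same expected count.

    The expected count is the simple average of the observed counts, which
    is only sensible if the categories really are interchangeable.
    """
    expected = sum(observed) / len(observed)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, expected


# Hypothetical data: 60 rolls of a six-sided die (NOT the Irongate counts).
hypothetical_rolls = [8, 12, 9, 11, 10, 10]
stat, expected = chi_square_uniform(hypothetical_rolls)

# Standard chi-square table value: 5% level, 6 - 1 = 5 degrees of freedom.
CHI2_CRIT_5PCT_5DF = 11.07

print(f"expected per face = {expected}, chi-square = {stat:.2f}")
print("reject uniformity at 5%:", stat > CHI2_CRIT_5PCT_5DF)
```

Die rolls fit this setup because one roll does not influence the next. The objection in this post is that shift hours do not behave that way.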

Yes, the results are the same as what we would expect, as long as we suspend logic and use an expected value that cannot be a real estimate of the true population value. Thus the study may be useful for in-sample testing, but out of sample it becomes useless, as someone in the field also noted at A Rebuttal to the Key Problem Answer.

And lastly another person used this question in a test and the answer stated was:
You cannot compute a chi-square test without knowing the number
of nonoccurrences for this example. Midterm II Answer Key

I wish the instructor had explained this more fully because I read this as he rejects the idea of anything not being 21 as not being a valid nonoccurrence event.

This post seems longer than expected so next up will be how to interpret the data under a more logical approach.


Links:
Cognitive Science 14 Practice Final Answers
Midterm II Answer Key-PDF
Chi Square/Wiki
A Little Primer on Multicollinearity

Sunday, October 08, 2006

What A Deal!

Dear Senator Sarbanes:

As a native Californian and excellent customer of the Internal Revenue Service, I am writing to ask for your assistance. I have contacted the Department of Homeland Security in an effort to determine the process for becoming an illegal alien and they referred me to you.

My primary reason for wishing to change my status from U.S. Citizen to illegal alien stems from the bill which was recently passed by the Senate and for which you voted. If my understanding of this bill's provisions is accurate, as an illegal alien who has been in the United States for five years, all I need to do to become a citizen is to pay a $2,000 fine and income taxes for three of the last five years. I know a good deal when I see one and I am anxious to get the process started before everyone figures it out.

Simply put, those of us who have been here legally have had to pay taxes every year so I'm excited about the prospect of avoiding two years of taxes in return for paying a $2,000 fine. Is there any way that I can apply to be illegal retroactively? This would yield an excellent result for me and my family because we paid heavy taxes in 2004 and 2005.

Additionally, as an illegal alien I could begin using the local emergency room as my primary health care provider.

Once I have stopped paying premiums for medical insurance, my accountant figures I could save almost $10,000 a year.

Another benefit in gaining illegal status would be that my daughter would receive preferential treatment relative to her law school applications, as well as "in-state" tuition rates for many colleges throughout the United States for my son.

Lastly, I understand that illegal status would relieve me of the burden of renewing my driver's license and making those burdensome car insurance premiums. This is very important to me given that I still have college age children driving my car.

If you would provide me with an outline of the process to become illegal (retroactively if possible) and copies of the necessary forms, I would be most appreciative.

Thank you for your assistance.

Your Loyal Constituent,

Gene Baker

I know, just a cut and paste today, but it was too funny.
It also shows that when we create incentives for doing the 'right' thing, we can also inadvertently create incentives to do the 'wrong' thing.