Examining statistics for student mathematics
...of the entire population.

                  Sample    Population
Min Value         1         1
Lower Quartile    9         9
Median            16.5      16
Upper Quartile    24.75     24
Max Value         36        37

Box plots

The box plots below also compare the sample with the population. They show that the sample is a fair representation of the population: the two plots are extremely similar, with very little variation between them. Both are drawn to the same scale, so the comparison is fair.

Hypothesis 1

My first hypothesis is that “more lessons taken mean fewer mistakes”. The scatter graph below shows the number of mistakes made and lessons taken for the sample. The R value is called the correlation coefficient, and it measures the strength and direction of the correlation. When R is 0 there is no correlation. At -1 there is a perfect negative correlation, and at 1 there is a perfect positive correlation.

My sample supports this hypothesis. The trend line has a fairly negative slope and the R value is 0.28 in magnitude, which is a weak correlation, so in general the more lessons an individual takes, the fewer mistakes they make. The theory “More lessons mean fewer mistakes” therefore holds for the sample, and I accept hypothesis 1.

Unfortunately, this hypothesis is only really relevant to the sample as a whole and not to individual instructors. It probably applies more to some instructors than to others, so I will now investigate the following:

Hypothesis 2

“More lessons mean fewer mistakes, depending on the instructor.”

I will make a scatter graph, similar to the previous one, for each of the 4 instructors, and discuss any visible results from it.

Instructor A:

Instructor A has a slight positive correlation and a high number of outliers. The correlation coefficient is positive but very low. I will accept hypothesis 1 for instructor A.
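As an illustration of how the R value behind these scatter graphs is obtained, here is a minimal Python sketch of the Pearson correlation coefficient. The function name `pearson_r` and the `lessons` and `mistakes` lists are assumptions made up for this example; they are not the coursework data.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustrative data, NOT taken from the coursework sample.
lessons = [5, 10, 15, 20, 25, 30]
mistakes = [30, 27, 22, 20, 14, 12]

r = pearson_r(lessons, mistakes)
# A negative r means mistakes tend to fall as lessons rise.
print("r =", round(r, 2))
```

The sign of R always matches the direction of the trend line, so a downward-sloping trend line goes with a negative R, and the closer R is to -1 or 1 the stronger the relationship.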
The outliers for instructor A could have been caused by traffic, bad weather, the time of day, or a number of other things. The same applies to all of the instructors.

Instructor B:

Instructor B has no real correlation at all and a very low correlation coefficient. Because of this lack of correlation I cannot predict the missing values for instructor B in the sample. I reject hypothesis 1 for instructor B.

Instructor C:

Instructor C has a clear and very strong negative correlation, with a very high correlation coefficient. There are only a small number of outliers, and I can accept hypothesis 1 for instructor C.

Instructor D:

Instructor D has a fairly strong negative correlation, but also a lot of outliers. The correlation coefficient is reasonably high as well. I can accept hypothesis 1 for instructor D.

Conclusion for hypothesis 1 and 2:

I can accept hypothesis 1 for instructors A, C, and D, as they all show that the more lessons the pupils have taken, the fewer mistakes they have made in the test. However, I cannot accept it for B. In addition, I will accept hypothesis 2, as the instructor does have an effect on the number of mistakes that pupils make. Instructor B, for example, had very little influence on how many mistakes their pupils made, whereas Instructor C's lessons helped their pupils a lot to make fewer mistakes on the test.

On the other hand, the fact that instructors C and D have quite strong correlations between number of lessons and number of mistakes does not necessarily mean that they are the best instructors, as their pupils may have needed a lot of lessons before they actually passed. Also, very few of instructor D’s pupils actually passed. On top of this, although instructor A had less of an effect on his pupils, roughly 2/3 of them passed the test.

As hypothesis 1 does not hold for instructor B, and holds only weakly for the full sample, I will look into something else, and for this I have chosen gender.
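The instructor-by-instructor comparison under hypothesis 2 could be reproduced with a short script along the following lines. This is only a sketch: the `records` list holds made-up numbers (not the coursework data), and the names `pearson_r` and `correlation_by_group` are assumptions introduced for this example.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (instructor, lessons taken, mistakes made).
# These values are invented for illustration only.
records = [
    ("A", 10, 20), ("A", 20, 22), ("A", 30, 19), ("A", 40, 24),
    ("B", 10, 15), ("B", 20, 25), ("B", 30, 14), ("B", 40, 24),
    ("C", 10, 30), ("C", 20, 24), ("C", 30, 17), ("C", 40, 10),
    ("D", 10, 28), ("D", 20, 25), ("D", 30, 15), ("D", 40, 16),
]


def correlation_by_group(rows):
    """Group rows by instructor and return {instructor: r}."""
    groups = defaultdict(lambda: ([], []))
    for name, lessons, mistakes in rows:
        groups[name][0].append(lessons)
        groups[name][1].append(mistakes)
    return {name: pearson_r(xs, ys) for name, (xs, ys) in groups.items()}


by_instructor = correlation_by_group(records)
for name in sorted(by_instructor):
    # A clearly negative r supports "more lessons mean fewer mistakes".
    print(name, round(by_instructor[name], 2))
```

Calculating R separately for each instructor, rather than once for the whole sample, is what allows hypothesis 1 to be accepted for some instructors and rejected for others.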
Hypothesis 3

“Women are better drivers than men”

This scatter graph of mistakes against lessons for females has a slightly positive correlation and a very slight positive ...