Sunday, December 16, 2018

Statistics Coursework

1st Hypothesis: For my first hypothesis I will investigate the relationship between the number of TV hours watched per week by the pupils and their IQ. I am going to use the columns "IQ" and "Average number of hours TV watched per week" taken from the Mayfield High datasheet. I think that there will be a relationship between them and will attempt to reveal it.

2nd Hypothesis: For my second hypothesis I will investigate the relationship between "Average number of TV hours watched per week" and "weight (kg)". I think that there will not be any major relationship between them, as they will not affect each other greatly.

I will present my analysis and the results in graphs and tables, and explain the results using the correlation of the graphs and the arrangement of the figures.

I will select a number of pupils to base my data on, using stratified sampling to work out the correct number of male and female pupils needed to make the investigation fair and random sampling to pick them.

Stratified Sampling

I do not want to use all of the data in the database for my analysis, so I will need to take a sample of the pupils in the school. I would like to take about 10% of the overall figure.
I will also need to use stratified sampling so that the sample has the same proportion of males and females as the school, to make it fair.

The total number of pupils at the school is 813, so I will need to take 10% as my sample size: 81.3, rounded down to 81.

The overall ratio of boys to girls in the school is 414:399.

Now I will need to do my sampling:

Males = (414 × 81) ÷ 813 = 41.2, rounded to 41
Females = (399 × 81) ÷ 813 = 39.8, rounded to 40

Random Sampling

Now that I have the sample sizes, I need to select the pupils for the sample. To do this I will use random sampling. I will take random samples until I have 81. I can do this in Excel using the following formula: =ROUND(RAND()*120, 0).

Once I have gathered the samples I am ready to start analyzing them.

Analysis

Hypothesis 1 Males

The first thing I need to do in my analysis is to analyze my graphs, which are the source of the investigation. I have created scatter graphs to show the relationship between the two data sources for my first hypothesis. I have separated them into male and female graphs as there is a separation in the numbers.

First male scatter graph:

This first graph presented a bit of a problem. There was an anomalous result that affected the trend line and the scale of the graph. I decided to create a new graph that didn't include that 1 piece of data. This way it would help me to analyze the rest of the data.

Second male scatter graph:

This graph showed the data much more clearly and I could then start analyzing it. There is no correlation between the 2 sets of data. This means that it is unlikely that there is a relationship between IQ and Average number of TV hours watched per week.
In this case it may be that my hypothesis is incorrect. There is only a very slight gradient on the trend line that leans towards a negative correlation, but the gradient is not steep enough to draw any conclusions about the relationship between the two sets of data. I will have to use the cumulative frequency graphs and box plots to see if any conclusions can be made.

Cumulative frequency graphs for IQ and Average number of TV hours watched per week:

From these graphs I could create box plots and compare the two sets of data. Before that I analyzed the cumulative frequency graphs to draw initial conclusions. The majority of the IQs for males are between 90 and 105; this shows that the data is quite concentrated, as this section covers only a small area of the graph. For the TV hours graph, again the data is concentrated in 1 main area; in this case it is between 5 and 25. There is almost a straight line near the top of the graph; this shows that there are likely to be some anomalous results, with 0 pupils in between those results and the main bulk. Now I will create box plots so I can compare the two graphs.

Box plots for cumulative frequency graphs of IQ and average number of TV hours watched per week: (for interquartile ranges look at copies of graphs at the back)

From the box plots I can see that the spread of the data is relatively the same apart from a possible anomalous result in the TV hours data. This similarity is the reason why the scatter graph had no correlation and therefore no relationship. This means that my hypothesis is wrong.

Hypothesis 1 Females

Again I will start with the scatter graphs.
As with the male graph I had an anomalous result that spread out the data and scaled down the graph so most of the relevant data couldn't be analyzed. I then did another graph without that specific piece of data.

Scatter Graphs 1 and 2 to show the relationship between IQ and average number of TV hours watched per week for Females:

As you can see on both the graphs there is no correlation between the two sets of data. This again means that my first hypothesis is unlikely to be correct. There is only a slight gradient on the trend line, which is not steep enough to draw any conclusions from it. There is another anomalous result on the graph but it doesn't affect the trend line and my conclusions so I left it on the graph. I will now create cumulative frequency graphs to see if they can help me to draw conclusions.

Cumulative frequency graphs for the IQ and number of TV hours watched per week:

I will now analyze the graphs before drawing box plots to compare the graphs. The IQ graph is much more erratic, which means that the data is spread over a larger range, although there is 1 area where the data is concentrated and the gradient very steep, between 95 and 105. The TV hours graph is much smoother and the data less spread. The number of hours increases steadily to a certain point, then the graph goes flat until the end. This means that there is an anomalous result somewhere. I know that there can only be 1 or 2 anomalous results because the point where it goes flat is at about 38 and there are only 39 sets of data in the graph. I will now look at the box plots to compare the two cumulative frequency graphs.

Box plots for cumulative frequency graphs of IQ and number of TV hours watched for females:

The box plots for these graphs show me that the IQ data has a much larger range and that it is quite evenly spread.
I can see this because the interquartile range is quite large and the median is evenly placed. There may be a few exceptions, as 1 pupil is likely to have a very low IQ, which is why the lowest value is so low. The TV hours data seems to be much more concentrated and the data is generally lower. This shows that there can't be any relationship between them as they are each grouped in certain areas. Also the box plot for TV hours shows that there is likely to be an anomalous result as the highest value is so far from the upper quartile.

Hypothesis 2 Males

In this hypothesis I will be comparing the Average number of TV hours watched per week and Weight, to see if there is any relationship between them. I will again start with Males and the scatter graphs.

Scatter graphs 1 and 2 to show the relationship between Weight and the Average number of TV hours watched per week for males:

In these scatter graphs there is a slight negative correlation. This means that as the number of TV hours goes up, Weight goes down. This may not be an accurate graph as there are a few anomalous results that may have caused the trend line to have that gradient. If this is so, my hypothesis would be correct; if it is not, the gradient of the trend line still isn't steep enough to be 100% certain that the correlation is accurate. I will need to use the cumulative frequency graphs to draw complete conclusions.

Cumulative frequency graphs for the number of TV hours watched and Weights of males:

These two graphs look quite different; the weights graph has most of its data concentrated in the middle of the range, between 30 and 50, and looks like a normal cumulative frequency curve, whereas the number of TV hours has most of its data concentrated at the beginning, between 0 and 30, showing that there is likely to be an anomalous result at the end of the range.
These anomalous results on the TV hours graph are what caused the slight negative correlation on the trend line. I will be able to make complete conclusions after looking at the female sample and seeing if that graph follows suit. The box plots for these graphs will look quite different and will make it easy to make a simple comparison.

Box plots for cumulative frequency graphs of Number of TV hours watched and Weight for males:

From the box plots I can see that the two sets of data are almost identical in range, which would cause a straight line on the scatter graph; it is the anomalous results on the TV hours that caused the slight negative correlation. The weights box plot shows me that the data is quite evenly spread in the middle of the range apart from a very heavy person at the end, which is why the highest figure is so far from the upper quartile. Overall the box plots show me that the similarity in the data means there is no relationship and my hypothesis was correct.

Hypothesis 2 Females

Again I will start with the scatter graphs to show the relationship between Number of TV hours watched and Weight. The graphs should be similar to the males and the conclusions the same. Again I had an anomalous result and had to create a second scatter graph without it.

Scatter graphs 1 and 2 to show the relationship between the Number of TV hours watched per week and Weight:

The second scatter graph in this section, without the anomalous result, completely changed the trend line. The first graph looks a lot more like the male graph whereas the second follows my hypothesis a lot better. In graph 1 there is a slight gradient on the graph which points towards a negative correlation, like that of the male sample. On the graph without the anomalous result there is clearly no correlation whatsoever as the line is nearly horizontal.
I will take the results of the male sample to be wrong because, as I said earlier, there are a few anomalous results which caused the trend line to be at that gradient. Now I will look at the cumulative frequency graphs to see what results I get from them.

Cumulative frequency graphs for Average number of TV hours watched per week and Weight for Females:

As on the males graph, the TV hours for females have a lot of anomalous results. But for the scatter graphs I took them all out, which gave no correlation. If the line at the top of the TV hours graph is blanked out the two graphs look almost identical. This is why the scatter graph got a near horizontal trend line. The box plots for these two graphs will look alike, except that there will be a much longer line at the end of the TV hours graph because of the anomalous results.

Box plots of cumulative frequency graphs for Number of TV hours watched and weights of females:

These box plots show me the same as the males did, that the data is almost identical if placed 1 on top of the other. This is what caused the horizontal line in my scatter graphs and proves my hypothesis.

Conclusion

Hypothesis 1: My first hypothesis has been proved incorrect. The scatter graphs show that there is no correlation between the two sets of data. For my hypothesis to have been correct there would have needed to be a strong positive correlation. The cumulative frequency graphs and box plots again proved my hypothesis incorrect: the similarities in the two data sets' box plots showed that there was no relationship and showed why the scatter graphs showed a straight line. Both the male and female samples showed that my hypothesis was incorrect; although some anomalous results created a slight negative correlation in both, it was obvious that it was still wrong.

Hypothesis 2: My second hypothesis was proved correct.
The scatter graphs showed that there was absolutely no correlation on the graphs, which means no relationship. Although the male graphs did show a negative correlation, it was shown to be caused by a few anomalous results, first by the cumulative frequency graphs and later by the inconsistency with the female sample. The female scatter graph showed a near horizontal trend line, which was what I needed to prove my hypothesis. The similarities on the cumulative frequency graphs and box plots further proved my hypothesis was correct.

Evaluation

The investigation went quite well. Although my first hypothesis was incorrect, it showed that careful analysis of data is needed before drawing conclusions. When I next do an investigation into data I will use histograms to aid me in my analysis, as they come in useful when looking for relationships in two sets of data, as the cumulative frequency graphs do. I could have made the cumulative frequency graphs a little better, as the program I used did not put a scale on the x axis but only the length of the range.
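The arithmetic in the Stratified Sampling and Random Sampling sections can be checked with a short script. This is only a minimal Python sketch and not the Excel method used in the coursework. The function name stratified_sizes, the fixed seed, and the idea that pupils are numbered 1 to N within each group are assumptions made for illustration. It also swaps in random.sample for the Excel formula, so that no pupil can be picked twice.

```python
import random


def stratified_sizes(strata, sample_size):
    """Share sample_size between the groups in proportion to their counts."""
    total = sum(strata.values())
    return {name: round(count * sample_size / total)
            for name, count in strata.items()}


# Pupil counts taken from the coursework: 414 boys and 399 girls.
school = {"male": 414, "female": 399}

total_pupils = sum(school.values())   # 813 pupils in the school
sample_size = total_pupils // 10      # 10% of 813 is 81.3, rounded down to 81

sizes = stratified_sizes(school, sample_size)

# Hypothetical numbering: pupils in each group are rows 1..count.
rng = random.Random(2018)             # fixed seed so the picks can be repeated
picks = {name: sorted(rng.sample(range(1, school[name] + 1), sizes[name]))
         for name in school}

print(sizes)
print(picks["male"][:5], picks["female"][:5])
```

The sizes come out as 41 males and 40 females, matching the figures worked out by hand. The row numbers in picks would then be looked up in the datasheet to get each pupil's IQ, TV hours and weight.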
