I wanted to know which factors help predict finish times in the Boston Marathon, so that finish times could be predicted for the 2013 Boston Marathon, which was halted by a bomb attack. The project has three parts: (1) I used exploratory data analysis to examine the relationships between variables. (2) I used cluster analysis, together with the findings from part 1, to construct indices that might contribute to finish time. (3) I used multivariate regression to test whether these factors actually make a difference. I received 49 out of 50 points on this project from my professor and ranked in the top 1% of my class.
Data Source: 5,000 randomly selected runners who completed the 2010 Boston Marathon
Code: click here.
Report: download here.
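The three-part workflow described above (exploratory analysis, clustering, then regression) can be illustrated with a minimal sketch. This is not the project's actual code or data: the runners below are synthetic, and the variables (age, half-marathon split, a "fast group" flag) and the formula that generates finish times are assumptions chosen only to make the example run with the Python standard library.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in data (NOT the 2010 Boston Marathon sample).
# Columns and the generating formula are assumptions for illustration.
runners = []
for _ in range(200):
    age = random.uniform(20, 60)
    half = random.gauss(105, 15)  # hypothetical half split, minutes
    finish = 2.1 * half + 0.3 * age + random.gauss(0, 5)
    runners.append((age, half, finish))

ages = [r[0] for r in runners]
halves = [r[1] for r in runners]
finishes = [r[2] for r in runners]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on one variable, started from evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        centers = [statistics.fmean(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Part 1: exploratory check of how strongly one variable tracks finish time.
r_half = pearson(halves, finishes)

# Part 2: cluster runners on half split and derive a simple index (fast group flag).
centers = kmeans_1d(halves, k=2)
fast_flag = [1.0 if abs(h - centers[0]) <= abs(h - centers[1]) else 0.0
             for h in halves]

# Part 3: regress finish time on intercept, age, half split, and the index.
X = [[1.0, a, h, f] for a, h, f in zip(ages, halves, fast_flag)]
coefs = ols(X, finishes)

print("correlation(half, finish):", round(r_half, 3))
print("cluster centers:", [round(c, 1) for c in centers])
print("coefficients [intercept, age, half, fast_flag]:",
      [round(c, 3) for c in coefs])
```

In practice a statistics package would also report standard errors and p-values for each coefficient, which is what deciding whether a factor "makes a difference" relies on. The sketch shows only the point estimates.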