Brief Introduction:

We wanted to use wavelength information of substances to predict whether they are benign or harmful. Each substance is measured at 13,701 wavelengths, but the training set contains only 144 substances (36 of them harmful), so I first used Principal Component Analysis to reduce the dimensionality to 3 principal components, which served as predictors. I then used multivariate regression to select 2 elements as additional predictors. Finally, I trained a Support Vector Machine on these predictors and evaluated it on the testing set, where the error rate was 2%. For this project, I ranked in the top 1% of the class and received an Honor grade at the end of the semester.
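The PCA and SVM steps described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: it assumes Python with scikit-learn, uses randomly generated stand-in data (the helper `make_synthetic_spectra` is hypothetical), shrinks the number of wavelengths to 2,000 to keep it quick, picks an RBF kernel arbitrarily, and leaves out the regression-based selection of the 2 additional predictors.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def make_synthetic_spectra(n_benign, n_harmful, n_wavelengths, rng):
    """Hypothetical stand-in data, NOT the EPA measurements.

    Every sample is Gaussian noise; harmful samples (label 1) get a
    constant offset in one band of wavelengths so the classes differ.
    """
    X = rng.normal(size=(n_benign + n_harmful, n_wavelengths))
    y = np.array([0] * n_benign + [1] * n_harmful)
    X[y == 1, 100:200] += 2.0
    return X, y


rng = np.random.default_rng(0)

# Assumption: 144 training substances (36 harmful), as in the text.
# The test-set size and the 2,000 wavelengths are made up for this sketch.
X_train, y_train = make_synthetic_spectra(108, 36, 2000, rng)
X_test, y_test = make_synthetic_spectra(30, 10, 2000, rng)

# Reduce each spectrum to 3 principal components, then fit an SVM on them.
# The kernel choice is an assumption; the text does not say which was used.
model = make_pipeline(PCA(n_components=3), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# Error rate on the held-out set = 1 - accuracy.
error_rate = 1.0 - model.score(X_test, y_test)
print(f"test error rate: {error_rate:.3f}")
```

Putting PCA inside the pipeline means the components are estimated from the training set only and then applied unchanged to the testing set, which keeps the reported error rate from being influenced by the test data.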

Data Source: Dr. Emily Snyder at the US Environmental Protection Agency, Office of Research and Development, National Homeland Security Research Center, Research Triangle Park, NC 27711.

Code: click here.

Report: download here.