Predictive Modeling of the Outcome of Cancer Patients

Dan Day
Departments of Computer Science and Biology
Augustana College

Predicting a cancer patient's likely survival is of great interest to many people in the world. With recent strides in applying computers to biological data, well-established data mining techniques can be brought to bear on this problem, with strong potential to produce a useful predictive model. In this experiment, we have a set of cancer patients, certain SNPs (single nucleotide polymorphisms) in their DNA, and their eventual survival. Using this information with a cutting-edge data mining algorithm called random forests, we can build a model to predict the outcomes of future patients. Moreover, this algorithm allows us to identify which SNPs are significant, guiding future research into the genetic components involved in this type of cancer.
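To make the approach concrete, the following is a minimal sketch of a random forest in plain Python. It is an illustration under stated assumptions, not the implementation used in this experiment. The data are synthetic: genotypes are assumed to be coded 0, 1, or 2 (copies of the minor allele), survival is a binary label, and SNP 3 is arbitrarily made the informative one. All function names, the tree depth, and the number of trees are hypothetical choices.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Best (gain, feature, threshold) among candidate features, or None."""
    best, parent, n = None, gini(y), len(y)
    for f in features:
        for t in (0, 1):  # genotypes are 0/1/2, so test "value <= t"
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(X, y, n_try, depth, rng, importance):
    """Grow one tree; each node considers a random subset of n_try SNPs."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority  # leaf: predict the majority label
    split = best_split(X, y, rng.sample(range(len(X[0])), n_try))
    if split is None or split[0] <= 0:
        return majority
    gain, f, t = split
    importance[f] += gain * len(y)  # impurity decrease weighted by node size
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li],
                      n_try, depth - 1, rng, importance)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri],
                       n_try, depth - 1, rng, importance)
    return (f, t, left, right)

def fit_forest(X, y, n_trees=50, depth=4, seed=0):
    """Train trees on bootstrap samples; return trees and SNP importances."""
    rng = random.Random(seed)
    n_features = len(X[0])
    n_try = max(1, int(n_features ** 0.5))
    importance = [0.0] * n_features
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_try, depth, rng, importance))
    total = sum(importance) or 1.0
    return trees, [v / total for v in importance]

def predict_forest(trees, row):
    """Majority vote of all trees for one patient's genotype row."""
    votes = []
    for node in trees:
        while isinstance(node, tuple):  # internal nodes are tuples
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]

# Synthetic toy data (NOT real patient data): 200 patients, 10 SNPs.
# Survival follows SNP 3, with roughly 10% of labels flipped as noise.
data_rng = random.Random(42)
X = [[data_rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(200)]
y = [int((row[3] >= 1) != (data_rng.random() < 0.1)) for row in X]

trees, importances = fit_forest(X, y)
top_snp = importances.index(max(importances))
accuracy = sum(predict_forest(trees, r) == lab for r, lab in zip(X, y)) / len(y)
print("most important SNP index:", top_snp)
print("training accuracy:", round(accuracy, 3))
```

The importance scores here are the summed Gini impurity decreases attributed to each SNP, normalized to sum to one. This is one common way to rank variables in a random forest; the accuracy printed is measured on the training data only, so it says nothing about performance on new patients, which would require held-out or out-of-bag evaluation.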