Description as a Tweet:

One-size-fits-all chemotherapy has long proved ineffective for many patients. If genomic sequencing is the new chemotherapy, how much better could it get with Artificial Intelligence and Machine Learning?


Computer Science for saving human lives? Yes, this is what is cooking these days. Computer scientists are helping medical researchers analyze data and devise software and diagnostic tools to understand diseases better and thus save lives. In the coming years, computer scientists are going to play major roles in diseases like cancer, where oncologists will be supported by data-crunching methods that help them understand disease progression.

People were dying of cancer 100 years ago, and although survival rates have increased with research and medicine, “we are still dying of cancer”. It is the second leading cause of death globally, accounting for an estimated 9.6 million deaths, more than half of which were after receiving cancer treatment.

The death of my grandfather, even after he received the best line of treatment available for Carcinoma Duodenum, was the trigger for me to look into the intersection of Computer Science and Medicine.
My research turned up two of the many reasons for cancer deaths that only computer scientists can solve. The first is the sheer excess of research data. Oncologists have to manually sort through millions of research papers to find the genomic research that might apply to an individual diagnosis and point to a possible line of treatment. Projects such as “Project Hanover” and “Literome” are developing cloud-based systems that narrow down research results, so that oncologists can suggest individualized treatment for the patient rather than the generalized chemotherapy that all patients receive. The second is prediction: a group of computer scientists is working to create powerful algorithms that will help oncologists understand how a cancer developed and predict, using digital simulations, the course its cells might take in future.

Just think: what if we could overpower deadly diseases with technology? What if we could solve a difficult puzzle about diseases like cancer, where cell behaviour is so unpredictable that even experienced oncologists cannot determine the course the disease will take?

Cancer biology and clinical care are undergoing transformational advances in which research and discovery are powered by the quantitative sciences and modern computing. Dramatic progress in computational methods, software development practices and cloud computing, together with profoundly rich cancer datasets, is defining a new field of science, computational medicine, which is exactly where I am heading.

What it does:

Gene Classification:
Once sequenced, a cancer tumor can have thousands of genetic mutations. The challenge is distinguishing the mutations that contribute to tumor growth (drivers) from the neutral mutations (passengers).

Currently this interpretation of genetic mutations is done by hand. It is a very time-consuming task in which a clinical pathologist has to review and classify every single genetic mutation based on evidence from text-based clinical literature.

We used the dataset provided by Memorial Sloan Kettering Cancer Center in a Kaggle open-source competition.

We started developing a machine learning model, specifically a linear Support Vector Machine (SVM), that uses this knowledge base as a baseline to classify genetic variations automatically. As we anticipated, this wasn't a project we could finish in 24 hours, but we are very proud of the progress we made in that time. In the original competition, participants had 4 months to complete their models.
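To make the idea concrete, here is a minimal, dependency-free sketch of a linear SVM text classifier. It is an illustration under stated assumptions rather than the code from the hackathon: the evidence snippets, the three toy classes and every function name are made up, training is plain sub-gradient descent on the regularised hinge loss without a bias term, and multiple classes are handled one-vs-rest. In practice a library implementation (for example scikit-learn's LinearSVC over TF-IDF features) would be the more usual route.

```python
import random
from collections import Counter

# Hypothetical toy evidence snippets; the real clinical text is far longer.
TRAIN = [
    ("truncating mutation loss of function", 1),
    ("loss of function truncating deletion", 1),
    ("activating mutation gain of function", 2),
    ("gain of function activating amplification", 2),
    ("neutral polymorphism no measurable effect", 3),
    ("no effect neutral substitution", 3),
]

def build_vocab(texts):
    """Map each distinct lowercase token to a feature index."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """L2-normalised bag-of-words as a sparse {index: value} dict."""
    counts = Counter(t for t in text.lower().split() if t in vocab)
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {vocab[t]: c / norm for t, c in counts.items()}

def dot(w, x):
    """Dot product of a dense weight list and a sparse feature dict."""
    return sum(w[j] * v for j, v in x.items())

def train_binary(X, y, dim, lam=0.01, eta=0.1, epochs=50, seed=0):
    """Sub-gradient descent on the L2-regularised hinge loss; y is +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * dot(w, X[i]) < 1      # example inside the margin?
            w = [wj * (1 - eta * lam) for wj in w]  # regularisation shrink
            if violated:
                for j, v in X[i].items():
                    w[j] += eta * y[i] * v
    return w

def train_one_vs_rest(data):
    """Train one binary SVM per class (that class vs. all the others)."""
    texts, labels = zip(*data)
    vocab = build_vocab(texts)
    X = [vectorize(t, vocab) for t in texts]
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_binary(X, y, len(vocab))
    return vocab, models

def predict(text, vocab, models):
    """Pick the class whose binary SVM gives the highest score."""
    x = vectorize(text, vocab)
    return max(models, key=lambda cls: dot(models[cls], x))

vocab, models = train_one_vs_rest(TRAIN)
print(predict("truncating loss", vocab, models))
```

Words the model has never seen are simply ignored at prediction time, which is one reason a real classifier needs far more training text than this toy set.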

How we built it:

We used the dataset provided by Memorial Sloan Kettering Cancer Center in a Kaggle open-source competition.

Technologies we used:

  • Python
  • AI/Machine Learning

Challenges we ran into:

Choice of model: there are so many to choose from, including SVM, k-nearest neighbors, logistic regression, Naive Bayes and stacking classifiers.
Learning curve: we are just sophomores and have never taken an ML class before. We spent a huge amount of time understanding data splitting, one-hot encoding, the theory behind support vector machines, etc.
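One-hot encoding was one of the ideas that took us a while to grasp, so here is a small sketch of it in plain Python. The gene symbols and the helper names `fit_categories` and `one_hot` are illustrative assumptions, not taken from our project code: each distinct category gets its own column, and a value is represented by a vector with a single 1 in that column.

```python
def fit_categories(values):
    """Assign each distinct category a column index, in sorted order."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the category's column.

    Categories not seen during fitting map to the all-zero vector.
    """
    row = [0] * len(categories)
    if value in categories:
        row[categories[value]] = 1
    return row

genes = ["BRCA1", "TP53", "EGFR", "TP53"]  # illustrative gene symbols
categories = fit_categories(genes)         # {'BRCA1': 0, 'EGFR': 1, 'TP53': 2}
encoded = [one_hot(g, categories) for g in genes]
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```

The point of the encoding is that the model never sees an artificial ordering between genes, which it would if we simply numbered them 0, 1, 2.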

Accomplishments we're proud of:

The fact that we got so far. For a project that took experts 4 months to build fully, in 24 hours we were able to load all 4 datasets, complete pre-processing, and split the data into training, cross-validation and testing sets.
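The three-way split can be sketched as follows. This is a simplified illustration, not our exact code: the 64/16/20 percentages, the toy label list and the function name `stratified_split` are assumptions. The split is done class by class (stratified) so that rare classes still show up in every subset.

```python
import random
from collections import defaultdict

def stratified_split(labels, percents=(64, 16, 20), seed=42):
    """Return (train, cv, test) lists of row indices.

    Each class is shuffled and split separately so that every subset keeps
    roughly the original class proportions. Integer percentages avoid
    floating-point rounding surprises in the subset sizes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train, cv, test = [], [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        n_train = len(idxs) * percents[0] // 100
        n_cv = len(idxs) * percents[1] // 100
        train += idxs[:n_train]
        cv += idxs[n_train:n_train + n_cv]
        test += idxs[n_train + n_cv:]  # whatever remains goes to testing
    return train, cv, test

labels = [1] * 50 + [2] * 30 + [7] * 20  # toy class labels
train, cv, test = stratified_split(labels)
print(len(train), len(cv), len(test))  # 63 15 22
```

The model is fitted on the training rows, tuned against the cross-validation rows, and only scored once on the testing rows at the very end.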

What we've learned:

Literally exponential growth in 24 hours! I am so glad I participated in HackHer 413. It was the first official and MEANINGFUL CS project for both of us. It got us started, gave us a clear direction and pushed us to keep working on this project even after the competition ends.

What's next:

A few things:
1) Complete the model. It should be able to classify genes and variations into the correct classes. (Some of classes 1-9 are cancerous and others aren't, hence the importance of correct classification.)
2) Find ways to improve the accuracy of the model, which will probably take us months. All the models we saw among the Kaggle competition submissions had 35-40% misclassification.
3) Finally, take this a step further. I envision an API where oncologists enter the genomic sequence of an incoming patient; the ML model classifies the gene and variations, searches a database of previous cancer patients throughout the world for similar variations, and presents the oncologist with the results. Results include the parameters of previous patients (Hb, symptoms, stage), the treatment followed (chemo, radiation, specific drug, trials), and their outcomes (improvement, death, how many months lived).

This allows the oncologist to make an informed decision based on previous knowledge. This can be made possible only by the power of CS (AI/ML) and Medicine combined.
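For the accuracy goal in item 2, the misclassification rate is simply the share of test rows whose predicted class differs from the true class. Below is a small sketch of how we could track it; the label lists are made-up toy values and the function names are our own illustrative choices. The confusion counts show which classes get mistaken for which, which matters here because confusing a cancerous class with a harmless one is the costly kind of error.

```python
from collections import Counter

def misclassification_rate(y_true, y_pred):
    """Fraction of examples whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))

# Toy labels for illustration only (classes are numbered 1-9 in the dataset).
y_true = [1, 1, 2, 4, 7, 7, 7, 2, 4, 1]
y_pred = [1, 4, 2, 4, 7, 2, 7, 7, 4, 1]

rate = misclassification_rate(y_true, y_pred)
confusion = confusion_counts(y_true, y_pred)
print(rate)               # 0.3
print(confusion[(7, 7)])  # 2 (class 7 predicted correctly twice)
```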

Built with:

Python, Kaggle dataset, IBM Data Science Certification, help from Towards Data Science, Medium and other helpful websites.

Prizes we're going for:

  • Best Pitch/Demo
  • Best Use of AI Model
  • Best Sustainability Hack
  • Mass State Lottery New Game Challenge
  • Best Software Hack
  • Cutest Hack
  • Best Hack for a Healthier Happier World
  • Best User Experience / Product Design Hack
  • Best Use of Data Hack!
  • Best Hardware Hack

Team Members

Alexander Ruse
Suhani Chawla

Table Number

Table 17