Last semester, a cover story in Nature detailed the Dark Reactions Project (DRP), which demonstrated how to mine failed chemistry experiments to predict materials reactions using machine learning algorithms. That paper synthesized the work of two chemistry professors (Alex Norquist and Josh Schrier) and a computer science professor (Sorelle Friedler) and proved both the value of wider dissemination of unsuccessful syntheses and the possibility of using machine learning to arrive at potential synthetic compounds faster than by traditional means.
During his first year at Haverford, Geoffrey Martin-Noble ’16 contributed to that project by doing some data entry to transfer reaction records from lab notebooks to the new DRP database.
“The data entry was boring, but I got to go to Alex’s group meetings, which made me feel like the coolest frosh in town,” he says.
Later, for his senior thesis, “Optimizing a Machine Learning System for Materials Discovery,” the computer science and chemistry major returned to the project to further the work started in the Nature paper. He helped Haverford Postdoctoral Research Fellow in Cheminformatics Philip Adler (another of the Nature paper’s co-authors) build the DRP computer database into a more robust system, developing the model-building pipeline and database.
“As a double major, I wanted to write one thesis, so I could commit myself to a single project, and the DRP presented a perfect interdisciplinary project,” says Martin-Noble, who graduated summa cum laude and earned the Lyman Beecher Hall Prize in Chemistry in recognition of, among other things, his outstanding research. “It was also in need of someone with one foot in chemistry and one in computer science, and there I was.”
Martin-Noble is returning to his hometown of Seattle this fall to begin work as a software engineer at Google, where he hopes to continue his work on machine learning. But his thesis not only prepared him for his future career, it also marked a life-changing part of his education.
“Research has been the most fulfilling and transformative part of my academic experience at Haverford,” he says. “I have reveled in the opportunity to work with brilliant and engaged faculty, tackling real problems of import.”
What did you learn working on your thesis?
Because the DRP needed to be a system that chemists could use, whereas many computer science theses are just about the conceptual framework, I ended up doing a lot of software engineering. Luckily, I love software engineering! This was my first time working on a real software-engineering project that had actual users (the Norquist lab), a project lead (Phil), a large number of collaborators, security concerns (we regularly fend off hacking attempts), an audience (the open-source repository has gotten about 300 unique visits), coding style standards, and real implications (a paper about the DRP prior to my thesis work recently appeared on the cover of Nature). I also learned a ton about machine learning, which I’m excited to put to use everywhere. I see applications for it everywhere now; it’s difficult for me to look at a problem and not start considering how to go about building a machine learning model.
What are the implications of your thesis research?
The DRP allows more informed exploratory syntheses, accelerating the pace of novel materials discovery. At a more fundamental level, it serves as a prototype application for other areas of materials synthesis and illuminates the best practices for integrating computation and data science into the chemistry laboratory. I was able to significantly improve the DRP models by more thoroughly examining the different ways to characterize a chemical reaction and adjusting the training procedure to more accurately reflect our use case.
Photo by Patrick Montero
“What They Learned” is a blog series exploring the thesis work of recent graduates.