Matthew Raymond recognized for research using ML techniques to design new types of medicine

Doctoral student Matthew Raymond wants to facilitate the development of new and groundbreaking nanomedicines.

Matthew Raymond received a poster award at the 2024 Midwest Machine Learning Symposium for his research, “Joint optimization significantly improves gradient boosting.” Raymond is a doctoral student in ECE, working with Prof. Clayton Scott and Prof. Angela Violi (Mechanical Engineering).

Raymond’s fundamental research in machine learning is closely tied to the development of new drugs by the medical community. 

“In machine learning,” explained Raymond, “we use mathematical models that learn how to make predictions based on a set of features (i.e., numbers) that describe the task at hand. In drug design, we also want to learn what features of a molecule are most important for its drug-like behavior so we can reverse engineer new drugs. However, sometimes there is not enough data in one dataset to reliably find these features, so we want a model that can improve its performance by simultaneously analyzing multiple datasets (called ‘tasks’). Current methods select features sharing linear relationships, find suboptimal solutions, or require unreasonable amounts of computing power. In this work, our goal was to develop a new method to select the most important features across multiple datasets without having any of these shortcomings. We call this approach Joint Optimization of Piecewise Linear Ensembles (JOPLEn).”
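To make the idea of selecting features jointly across several datasets concrete, here is a minimal sketch in Python. It is not JOPLEn and is not taken from Raymond's code. It illustrates the simpler linear setting that the quote lists among current methods: a group-sparse penalty fitted by proximal gradient descent, in which a feature is kept or dropped for all tasks at once. The function names, the synthetic data, and the penalty strength `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def group_soft_threshold(W, thresh):
    """Shrink each row of W (one row per feature) toward zero by its L2 norm."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return W * scale


def multitask_feature_selection(Xs, ys, lam=0.2, n_iter=500):
    """Linear multitask feature selection (illustrative, not JOPLEn).

    Minimizes sum_t ||X_t w_t - y_t||^2 / (2 n_t) + lam * sum_j ||W[j, :]||_2
    so that each feature's weights are zeroed out for every task together.
    """
    n_features = Xs[0].shape[1]
    W = np.zeros((n_features, len(Xs)))  # one column of weights per task
    # Step size from the largest per-task Lipschitz constant of the squared loss.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = np.column_stack([
            X.T @ (X @ W[:, t] - y) / X.shape[0]
            for t, (X, y) in enumerate(zip(Xs, ys))
        ])
        W = group_soft_threshold(W - step * grad, step * lam)
    return W


# Hypothetical data: a large task and a small task that share 3 relevant features.
rng = np.random.default_rng(0)
n_features = 10
true_weights = [np.array([1.5, -2.0, 1.0]), np.array([1.0, -1.5, 2.0])]
Xs, ys = [], []
for n_samples, w in zip([200, 20], true_weights):
    X = rng.normal(size=(n_samples, n_features))
    y = X[:, :3] @ w + 0.1 * rng.normal(size=n_samples)
    Xs.append(X)
    ys.append(y)

W = multitask_feature_selection(Xs, ys)
row_norms = np.linalg.norm(W, axis=1)
print("Per-feature weight norms across tasks:", np.round(row_norms, 3))
```

In this toy setup the small dataset borrows strength from the large one, because a feature survives only if it is useful across the tasks taken together. The features with the largest row norms are the ones selected.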

Raymond’s work will increase our understanding of nanoparticles, which have a unique chemistry that makes them ideal for drugs like antibiotics and anticancer medicine.

“Researchers have tried reverse-engineering nanoparticles using feature selection,” explained Raymond, “but there is not enough data to draw fundamental insights. However, there is much data on small molecules and proteins (and they are comparatively well understood), so we would like to use these datasets to improve our ability to reverse-engineer nanoparticles. This is where our model, JOPLEn, comes in. It is able to select features from multiple datasets, which stabilizes the feature selection on small datasets and improves our ability to reverse-engineer nanoparticles by allowing us to compare nanoparticles to proteins and small molecules.”

Raymond says that the biggest challenge in this work was determining how much of the mathematical model to retrain, and what constraints should be put on the model. 

“Surprisingly,” said Raymond, “we found that partially retraining an existing model was easier and faster than developing a new method from the ground up, and even this simple approach dramatically improved performance. Another challenge was that the original optimization program was unreasonably slow.”

He says he combined the optimization techniques taught in ECE 559 (Optimization Methods in Signal Processing and Machine Learning) with GPU acceleration to reduce the runtime by 3–4 orders of magnitude.

“Our main contribution is in the way we constrain the model,” explained Raymond. “By making the model more expressive, it is more likely to accidentally make wildly inaccurate predictions for some inputs. However, we can force the model to make smooth predictions, which reduces the chances that any individual input will have less accurate predictions than other, similar samples. Additionally, our approach means that new, previously unused constraints can be used, which opens the door to new applications.”

Further developments in this area of research were presented at the IEEE International Workshop on Machine Learning for Signal Processing in the paper, “Joint Optimization of Piecewise Linear Ensembles.” The code is available on PyPI.

“There are countless groundbreaking nanomedicines just waiting to be discovered,” said Raymond, who is hopeful this research will facilitate the process.