Reaction Modeling

Machine Learning powered computational chemistry methods to predict reaction yield
Context

Time-consuming experiments and the difficulty of estimating reaction yields are slowing down drug development

Suzuki Coupling Full Mechanism showing Reductive Elimination, Oxidative Addition, and Transmetalation
Suzuki Coupling Mechanism - An illustration by Organic Chemist - 19, Own work, CC BY-SA 3.0

Yield optimization is a key priority in drug synthesis process development. Chemical and pharma companies must run many reactions under different laboratory conditions to find the optimal synthesis path. They also need to determine the actual yield of each reaction (e.g., a Suzuki reaction), because it affects how complex reaction paths are evaluated and which synthesis path is selected. A single step with a low percent yield can cause a large drop in overall yield, because the overall yield of a linear multi-step route is the product of its step yields.
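
The effect of one weak step can be seen in a short calculation. This is a minimal sketch for a linear route; the step yields are illustrative placeholders, not measurements from any real synthesis.

```python
from math import prod


def overall_yield(step_yields):
    """Overall fractional yield of a linear route: the product of its step yields."""
    return prod(step_yields)


# Hypothetical five-step routes (illustrative numbers only).
route_with_weak_step = [0.90, 0.90, 0.30, 0.90, 0.90]
route_all_good = [0.90] * 5

print(f"with one 30% step: {overall_yield(route_with_weak_step):.1%}")
print(f"all steps at 90%:  {overall_yield(route_all_good):.1%}")
```

Under these assumed numbers, a single 30% step cuts the overall yield to about a third of what the route would otherwise deliver.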

Key challenges chemical and pharma companies face when determining reaction yield:

  • Long lead times to run reactions in a laboratory and measure the actual yield
  • Lack of information about the yield of reactions that have not been reported previously in the literature
  • Difficulties in optimizing synthesis paths from commercially available raw materials to form a significant ‘drug-like’ molecule
Our Solution

An AI-powered drug discovery solution that predicts reaction yields faster and with higher accuracy

Aganitha’s ‘Reaction Modeling’ solution combines descriptors (structural, energy, ECFP, reaction-mechanism based, etc.) with model architectures (Random Forest, XGBoost, GCNs, etc.) to classify reactions as high vs. low yield.

The solution features a data model that captures all the reaction information required for machine learning modeling. In addition, its DFT computational pipelines speed up the estimation of reaction-mechanism based parameters such as energy, charge, and bond length.
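
As an illustration of what such a data model might hold, here is a minimal sketch. The class name `ReactionRecord`, its field names, and the example values are all assumptions made for this example; they are not Aganitha's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReactionRecord:
    """Hypothetical record holding what a yield model would need for one reaction."""
    reactants: list[str]                      # SMILES strings
    products: list[str]                       # SMILES strings
    solvents: list[str] = field(default_factory=list)
    catalysts: list[str] = field(default_factory=list)
    reaction_class: Optional[str] = None      # e.g. "Suzuki coupling"
    yield_percent: Optional[float] = None     # None when no yield is reported
    descriptors: dict[str, float] = field(default_factory=dict)

    def is_labeled(self) -> bool:
        """True when the record has a reported yield and can be used for training."""
        return self.yield_percent is not None


# Illustrative record: bromobenzene + phenylboronic acid -> biphenyl.
# The yield value is a placeholder, not a literature figure.
record = ReactionRecord(
    reactants=["Brc1ccccc1", "OB(O)c1ccccc1"],
    products=["c1ccc(-c2ccccc2)cc1"],
    reaction_class="Suzuki coupling",
    yield_percent=85.0,
)
print(record.is_labeled())
```

Keeping computed descriptors in the same record as the reaction entities means one object carries both the inputs and the label for modeling.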

The solution features:

  • Scalable HPC pipelines for calculation of reaction-mechanism based DFT descriptors
  • NLP techniques to extract information about reactants, products, solvents, catalysts, etc. from public datasets such as USPTO
  • Data processing pipelines to filter reactions by class, e.g., Suzuki coupling reactions
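
To make the class-filtering step concrete, the sketch below flags candidate Suzuki couplings from reaction SMILES. It is a crude text heuristic written for illustration (one boron-containing reactant plus one halide-containing reactant), not the method the solution uses; a real pipeline would more likely rely on substructure matching in a cheminformatics toolkit. The function name `looks_like_suzuki` is hypothetical.

```python
import re

# A 'B' that is not the start of Br, Ba, Be, Bi, Bk, or Bh is treated as boron.
BORON = re.compile(r"B(?![raeikh])")
# Cl, Br, or I (excluding the element symbols Ir and In).
HALIDE = re.compile(r"Br|Cl|I(?![rn])")


def looks_like_suzuki(reaction_smiles: str) -> bool:
    """Heuristic: some reactant carries boron and another carries a halide."""
    reactants = reaction_smiles.split(">")[0].split(".")
    has_boron = any(BORON.search(r) for r in reactants)
    has_halide = any(
        HALIDE.search(r) and not BORON.search(r) for r in reactants
    )
    return has_boron and has_halide


reactions = [
    "Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1",  # aryl bromide + boronic acid
    "CC(=O)O.OCC>>CC(=O)OCC",                          # esterification
]
suzuki_candidates = [r for r in reactions if looks_like_suzuki(r)]
print(suzuki_candidates)
```

A text check like this will produce false positives (for example, a halogenated solvent listed among the reactants), so it is best read as a first-pass filter.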
Given a reaction, the reaction modeling pipeline starts with the DFT computation pipeline, which involves density functional descriptors, physicochemical descriptors, and extended-connectivity fingerprints. The featurized data is split into a training set and a test set. The models are then cataloged and documented in a Model Catalog. Next comes Model Sustenance, which involves updating the models with new data and evaluating them for accuracy and reliability. In parallel, exploratory data analysis generates correlation heat maps, descriptor distributions, and box plots.
AI/ML based Reaction Modeling
Highlights

Key components & strengths

Reactions Classifier

ML models classify reactions as high- vs. low-yielding and provide a list of predicted yields for all reactions
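
The high vs. low labeling can be sketched as a threshold on predicted yields. The 50% cutoff, the reaction identifiers, and the predicted values below are assumptions chosen for illustration; the source does not state which threshold the classifier uses.

```python
def classify_by_yield(predicted_yields, threshold=50.0):
    """Rank reactions by predicted yield and label each as 'high' or 'low'.

    `threshold` is an assumed cutoff in percent, not a value from the solution.
    """
    ranked = sorted(predicted_yields.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (rxn_id, y, "high" if y >= threshold else "low") for rxn_id, y in ranked
    ]


# Placeholder predictions keyed by hypothetical reaction IDs.
predictions = {"rxn-001": 82.5, "rxn-002": 12.0, "rxn-003": 50.0}
report = classify_by_yield(predictions)
for rxn_id, y, label in report:
    print(f"{rxn_id}: {y:5.1f}%  {label}")
```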

Reactions Explorer

Visualize reaction yield distributions across a given chemical space, along with correlations between descriptors and reaction yield
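
One common way to quantify a descriptor-yield correlation is the Pearson coefficient, which is what a correlation heat map typically displays. The sketch below computes it in plain Python; the descriptor names and all numbers are placeholders, and the source does not say which correlation measure the explorer uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder data: one row of values per hypothetical descriptor.
yields = [20.0, 40.0, 60.0, 80.0]
descriptor_table = {
    "barrier_height": [30.0, 25.0, 20.0, 15.0],   # falls as yield rises
    "partial_charge": [0.10, 0.20, 0.30, 0.40],   # rises with yield
}
correlations = {
    name: pearson(values, yields) for name, values in descriptor_table.items()
}
print(correlations)
```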

Molecule Previewer

Uses the SMILES specification to preview a molecule or a reaction
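
In reaction SMILES, `>` separates reactants, agents, and products, and `.` separates individual molecules. The sketch below splits a reaction string into those parts, which is the step a previewer would need before drawing anything. The function name is hypothetical, and rendering itself (which needs a cheminformatics toolkit) is not shown.

```python
def split_reaction_smiles(reaction_smiles: str) -> dict:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    reactants, agents, products = (
        [smiles for smiles in part.split(".") if smiles] for part in parts
    )
    return {"reactants": reactants, "agents": agents, "products": products}


# Bromobenzene + phenylboronic acid -> biphenyl, with no agents listed.
parsed = split_reaction_smiles("Brc1ccccc1.OB(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1")
print(parsed)
```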

Model Catalog

Provides results from machine learning models across multiple combinations of datasets, descriptors, and algorithms
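
A catalog of this kind can be thought of as a grid over datasets, descriptors, and algorithms. The sketch below enumerates such a grid; the dataset name, the entry layout, and the empty `metrics` slot are assumptions for illustration, while the descriptor and algorithm names are taken from the solution description.

```python
from itertools import product

datasets = ["uspto_suzuki_subset"]  # hypothetical dataset name
descriptors = ["structural", "energy", "ecfp", "reaction_mechanism"]
algorithms = ["random_forest", "xgboost", "gcn"]

# One catalog entry per combination; metrics would be filled in after training.
catalog = [
    {"dataset": d, "descriptor": f, "algorithm": a, "metrics": None}
    for d, f, a in product(datasets, descriptors, algorithms)
]
print(len(catalog), "combinations")
```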

DFT Pipeline

Speeds up the estimation of reaction-mechanism based parameters such as energy, charge, bond length, etc.

Scalable HPC Pipelines

Leverages HPC schedulers such as SLURM and SGE to scale the calculation of reaction-mechanism descriptors
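
One way to fan descriptor calculations out on SLURM is a job array with one task per reaction. The sketch below only generates the batch script text; the wrapper `compute_descriptors.sh`, the resource values, and the function name are hypothetical, and the source does not describe how the actual pipelines are submitted.

```python
def slurm_array_script(job_name, n_reactions, cpus_per_task=4, time_limit="02:00:00"):
    """Build an sbatch script that runs one array task per reaction index."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{n_reactions - 1}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=%x_%A_%a.out",
        "",
        "# compute_descriptors.sh is a hypothetical wrapper around the DFT code.",
        'bash compute_descriptors.sh "${SLURM_ARRAY_TASK_ID}"',
        "",
    ]
    return "\n".join(lines)


script = slurm_array_script("dft_descriptors", n_reactions=100)
print(script)
```

The generated text could be saved to a file and submitted with `sbatch`; SLURM sets `SLURM_ARRAY_TASK_ID` for each task, which the wrapper would map to a reaction.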
Outcomes

Accelerated drug development through faster and more accurate reaction yield prediction

Cost savings

By screening out reactions with insignificant predicted yield before they are run in the wet lab

Quick information retrieval

By codifying the publicly available reaction information and the computed descriptors in the reactions database

Visibility into expected yield

Through reinforcement learning (RL) methods that are more efficient than design of experiments (DOE) and more powerful than relying on chemical synthesis reactors alone

Faster and more accurate

By using NLP methods to filter reactions of specific classes and extract information about reaction entities, and machine learning to predict impurities and side reactions
