Crash Course on Machine Learning for Molecular Design

Course description

The design of new molecules has applications across many industrial sectors, including pharmaceuticals and materials. However, finding molecules with the desired properties is difficult, because it means locating specific candidates within a vast and structurally complex chemical space.

Mathematically, this task can be framed as a combinatorial optimization problem, often stochastic and multi-objective, with black-box objective functions and constraints. Finding approximate solutions to this problem involves two interrelated steps:

Developing predictive models that can forecast the properties of interest from the chemical structure of the molecules.
Creating algorithms for the automatic generation of molecules (de-novo generation) that meet specific structural constraints and optimize the properties predicted in the first step.
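
As a rough sketch of the formulation above (the notation is an assumption introduced here, not taken from the course material): let $\mathcal{M}$ be the chemical space, $f_1, \dots, f_k$ the black-box property objectives, and $g_1, \dots, g_J$ the constraints. The expectation accounts for the stochastic case.

```latex
\begin{aligned}
\max_{m \in \mathcal{M}} \quad & \bigl( \mathbb{E}[f_1(m)], \dots, \mathbb{E}[f_k(m)] \bigr) \\
\text{s.t.} \quad & g_j(m) \le 0, \qquad j = 1, \dots, J
\end{aligned}
```

In this reading, the first step replaces each $f_i$ with a learned surrogate $\hat{f}_i$ that is cheap to evaluate, and the second step searches $\mathcal{M}$ for molecules that score well under the surrogates while satisfying the constraints.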

In this course, we will explore machine learning strategies for navigating the chemical space effectively.

Course contents

Session 1 - ML for Molecular Property Prediction

During the first session, we will cover how to fit predictive models that forecast the properties of molecules from their structure. We will pay special attention to the small-data regime, which is a major obstacle in this field.

The session will be structured as follows:

1. Computational representations of molecules

Features-based representations
String-based representations
Graph-based representations
3D representations
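
To make the first three representation families concrete, here is a minimal sketch for ethanol using only the Python standard library. It assumes SMILES as the string format and uses hand-written atom and bond lists; the helper names `adjacency_matrix` and `count_features` are hypothetical, and a cheminformatics toolkit would normally derive these from the structure. A 3D representation would additionally attach coordinates to each atom, which is omitted here because no conformer data is given.

```python
from collections import Counter

# Ethanol, heavy atoms only (hydrogens are left implicit).
smiles = "CCO"            # string-based representation (SMILES)

atoms = ["C", "C", "O"]   # node labels, indexed 0..2
bonds = [(0, 1), (1, 2)]  # undirected single bonds between atom indices


def adjacency_matrix(n_atoms, bonds):
    """Graph-based representation: symmetric 0/1 adjacency matrix."""
    adj = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj


def count_features(atoms, bonds, elements=("C", "N", "O")):
    """Features-based representation: fixed-length vector of simple counts.

    Layout: one count per element in `elements`, then the number of bonds.
    """
    counts = Counter(atoms)
    return [counts[e] for e in elements] + [len(bonds)]


adj = adjacency_matrix(len(atoms), bonds)
features = count_features(atoms, bonds)

print(smiles)    # CCO
print(adj)       # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(features)  # [2, 0, 1, 2]
```

The feature vector has a fixed length regardless of molecule size, whereas the string and graph forms grow with the molecule, which is one reason the model families listed in the next part differ by representation.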

2. An overview of predictive models for molecular properties

Probabilistic vs. deterministic machine learning
Basic models using features-based molecular representations
Basic models using string-based molecular representations
Basic models using graph-based molecular representations
Basic models using 3D molecular representations

3. Evaluating model performance

Basics
Evaluating the quality of probabilistic predictions
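
As an illustration of the difference between basic and probabilistic evaluation, the sketch below computes RMSE on point predictions, and two common checks for Gaussian predictive distributions: average negative log-likelihood and empirical coverage of a central interval. The choice of these particular metrics and the function names are assumptions, and the numbers are made up for the example rather than taken from any dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of point predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def gaussian_nll(y_true, mean, std):
    """Average negative log-likelihood under independent Gaussian predictions."""
    total = 0.0
    for t, m, s in zip(y_true, mean, std):
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (t - m) ** 2 / (2 * s ** 2)
    return total / len(y_true)


def interval_coverage(y_true, mean, std, z=1.96):
    """Fraction of targets inside mean +/- z*std (about 95% nominal for z=1.96)."""
    hits = sum(abs(t - m) <= z * s for t, m, s in zip(y_true, mean, std))
    return hits / len(y_true)


# Made-up targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
mean = [1.1, 1.9, 3.5, 4.0]
std = [0.2, 0.2, 0.2, 0.2]

print("RMSE:", rmse(y_true, mean))
print("NLL:", gaussian_nll(y_true, mean, std))
print("Coverage:", interval_coverage(y_true, mean, std))
```

RMSE ignores the predicted uncertainty entirely. The negative log-likelihood penalizes predictions that are confidently wrong, and a coverage well below the nominal level suggests the model is overconfident.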

Session 2 - De-novo Molecular Design

In this second session, building on the concepts from the first, we will introduce models that perform de-novo molecular design. We will use existing data to design novel molecules that differ from those already in our database. In most cases, the goal of this design phase is to produce new compounds that optimize one or more desired target properties. We will use the different molecular representations throughout the session as needed.

The session will be structured as follows:

1. Generative models for molecules

Conventional drug-design process
Review of the models from the previous session
Discriminative models vs. generative models
Unconstrained vs. targeted molecule generation

2. Overview of generative models for molecules

Gradient-free vs. gradient-based models
Atom-, fragment- and reaction-based approaches
Basic gradient-free models
Basic gradient-based models
Latest generative models
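
To give a feel for the gradient-free family, here is a toy evolutionary search over fixed-length token strings. Everything in it is a stand-in: the alphabet, the `toy_score` objective (which has no chemical meaning), and the function names are hypothetical, and the strings are not checked for chemical validity. A real pipeline would mutate valid molecules and score them with a property predictor such as those from Session 1.

```python
import random

ALPHABET = "CNO"  # toy token set, not a valid molecular grammar


def toy_score(candidate):
    """Hypothetical black-box objective standing in for a property predictor.

    Counts adjacent C-O or O-C pairs; a made-up target for illustration.
    """
    return sum(1 for a, b in zip(candidate, candidate[1:]) if {a, b} == {"C", "O"})


def mutate(candidate, rng):
    """Replace one randomly chosen token with a random token from ALPHABET."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def evolve(score, length=8, population_size=20, generations=30, seed=0):
    """Keep the better half each generation and refill with mutated survivors."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in range(length))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return max(population, key=score)


best = evolve(toy_score)
print(best, toy_score(best))
```

The search only ever calls `score` on candidates and never needs its gradient, which is what makes this style of method applicable to black-box objectives. Because the best candidate always survives to the next generation, the best score cannot decrease over time.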

Slides

Notebooks

Session 1 Notebooks
- Descriptors Based Predictive Models
- Graph Based Predictive Models
Session 2 Notebooks
Case Study Notebooks