De-novo Molecular Design

Roi Naveiro (CUNEF)
Simón Rodríguez Santana (ICMAT)

Molecular design problem

  • Design of new molecules is a time- and resource-intensive task
  • Generating promising candidates is one of the main bottlenecks
  • Old approach: Experts propose + synthesize + measure candidates in vitro
  • Soon-to-be-old way: High-throughput virtual screening (HTVS)

Traditional molecular design

Virtual Screening (VS): Brute-force evaluation of huge libraries of compounds to identify structures that improve desired properties (e.g. drug-likeness)

  • Structures known a priori
  • Although databases are huge, they represent a small portion of the total chemical space
  • Concerns about predictive validity and redundancy (Scannell et al., 2016)

De-novo molecular design

Main goal: Traverse the chemical space more effectively (better molecules in fewer evaluations)

  • AI assisted de-novo design \(\rightarrow\) Process of automatically proposing novel chemical structures that optimally satisfy desired properties

De-novo molecular design

Generate compounds in a directed manner

  • Reach optimal chemical solutions in fewer steps than VS
  • Explore different acceptable regions of chemical space for a given objective (exploration vs. exploitation)

Recap - molecule encoding


  • Molecules are 3D QM objects
  • The encoding determines which information is captured
  • Trade-off: information loss vs. complexity

Recap - property prediction

Select the model depending on the encoding information

  • 1D: SMILES, one-hot, descriptors, etc.
    • Deterministic and Bayesian models
    • Deep NNs (+ Bayesian version)
  • 2D: Graphs (GNNs)
  • 3D: Point clouds (Geometric NNs)

Performance measures (RMSE, \(R^2\), etc.) + assessment of probabilistic predictions
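As a quick illustration of the 1D route, here is a minimal sketch (assumed setup: RDKit + scikit-learn, hypothetical toy data) that encodes SMILES as Morgan fingerprints and fits a simple property regressor:

```python
# Minimal sketch: 1D encoding (Morgan fingerprint) + property regressor.
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor
import numpy as np

def featurize(smiles, n_bits=2048):
    """Encode a SMILES string as a Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(fp)

# Hypothetical toy data: SMILES strings and a measured property
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCCC", "c1ccncc1"]
y = np.array([0.2, 1.3, 0.5, 0.4, 0.7, 1.1])

X = np.stack([featurize(s) for s in smiles])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict(featurize("CCC").reshape(1, -1)))  # predict an unseen molecule
```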

Generative and discriminative models

De-novo design is also referred to as generative chemistry

  • Discriminative models learn decision boundaries

  • Generative models model the probability distribution of each class

    \(\rightarrow\) Can be instantiated to generate new examples (!)


Not the only way to obtain new compounds…

Requirements

  • Validity: Adherence to chemical principles (e.g. valency)
  • Uniqueness: Rate of duplicates produced by the model
  • Diversity: Scope of the chemotypes generated
  • Novelty: Fraction of generated molecules not present in known databases
  • Similarity: Similarity between generated molecules and the training data
  • Synthetic feasibility: Ease of synthesis in the lab

Untargeted vs. targeted generation (extra metric to optimize, e.g. QED, PlogP and many more)
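A minimal sketch of how validity, uniqueness and novelty can be computed for a batch of generated SMILES with RDKit (helper names and example molecules are illustrative):

```python
# Minimal sketch: validity, uniqueness and novelty of generated SMILES.
from rdkit import Chem

def canonical(smiles):
    """Canonical SMILES, or None if the string is not chemically valid."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

def generation_metrics(generated, training_set):
    canon = [canonical(s) for s in generated]
    valid = [c for c in canon if c is not None]
    validity = len(valid) / len(generated)                  # chemically parseable
    uniqueness = len(set(valid)) / max(len(valid), 1)       # non-duplicated
    train_canon = {canonical(s) for s in training_set}
    novelty = len(set(valid) - train_canon) / max(len(set(valid)), 1)  # unseen in data
    return validity, uniqueness, novelty

print(generation_metrics(["CCO", "CCO", "C1CC1", "not_a_smiles"], ["CCO"]))
```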

Generative models


Targeted generation depends on having a proper characterization of the property of interest

  • Property prediction models serve to define an objective function (Session 1)
    • Navigate a complex search space
  • Gradient-based vs. gradient-free methods

Gradient-based vs. gradient-free

Gradient-based: Models that use the gradient of the objective function to perform optimization

  • Training requires fitting parameters using data corpora
  • Usually require lots of data
  • E.g.: NN-based approaches (such as VAEs)

Gradient-free: Metaheuristic models, based on stochastic population optimization

  • “Rule-based” approaches
  • E.g.: Evolutionary algorithms

Chemical representations

The chemical representation is tailored to each case, depending on the data, objective and resources available

  • Atomic level: Encode information for each atom and bond
    • E.g.: Atom-wise SMILES, graph, 3D coordinates…
  • Fragment level: Fixed functional groups and substructures
    • E.g.: Benzene treated as a single group
  • Reaction level: Target molecule as the product of reactants and reaction conditions
    • E.g.: Combinations from a library of reactions

Model zoo


                  Atom based                             Fragment based   Reaction based
Gradient free     EvoMol*, GB-GA                         CReM             AutoGrow4
Gradient based    ChemVAE*, EDM*, PaccMannRL, GraphAF    JT-VAE           DoG

Many (many) more… VLS3D list of resources

Gradient based models

ChemVAE

Originally introduced in Gómez-Bombarelli et al. (2018)

  • Combines a Variational Autoencoder and a property predictor
  • Meaningful and continuous latent space
  • Uses Bayesian optimization to efficiently explore the latent space
  • Led to expansion of VAEs in molecular design

Extension of the ideas from Generative Adversarial Networks (GANs) and autoencoders

Autoencoder

AE: Hourglass-structured NN that encodes and decodes the input information, consisting of an encoder \(f_\theta(x)\), a decoder \(g_\phi(z)\), and a latent space \(z\)

Attempts to learn the identity function, i.e. \[ \text{AE} = g_\phi \circ f_\theta \quad \text{s.t.} \quad \text{AE}(x) = g_\phi(f_\theta(x)) \approx x \]

Autoencoder

  • Encoder: Maps the input to the latent space \(f_\theta(x) = z\)
  • Decoder: Maps latent space to original space \(g_\phi(z) = \hat{x}\)
  • Latent space: Low-dimensional representation of \(x\) (\(z\))

Minimize the reconstruction error (\(\epsilon\)): \[ \arg \min_{\theta,\phi} \epsilon(x, \hat{x}) \]

\(\hat{x} \simeq x\) \(\Rightarrow\) Model encodes/decodes correctly
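A minimal sketch of such an autoencoder in PyTorch, assuming molecules already encoded as fixed-length vectors (dimensions and architecture are illustrative):

```python
# Minimal sketch: hourglass autoencoder trained on the reconstruction error.
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, in_dim=512, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))     # f_theta
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))         # g_phi

    def forward(self, x):
        z = self.encoder(x)           # latent representation
        return self.decoder(z)        # reconstruction x_hat

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 512)              # placeholder batch of encoded molecules
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x)   # reconstruction error eps(x, x_hat)
loss.backward(); opt.step()
```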

Autoencoder

AEs can be seen as generative models

However, their latent space is difficult to navigate

Variational Autoencoders

VAE: Adds stochasticity to the encoding \(\rightarrow\) Regularize latent space

  • Instead of encoding to a point, encode to a distribution \(q(z|x)\)
  • Sample from the distribution \(z \sim q(z|x)\) and decode

\[ Loss = \epsilon(x,\hat{x}) + regularizer \]

The regularization forces the latent encoding to resemble a prior: \[ p(z) = \mathcal{N}(0, I) \]

Variational Autoencoders

The encoded data will follow \[ z \sim q(z|x) = \mathcal{N}(\mu_x, \sigma_x) \]

where \(\mu_x\) and \(\sigma_x\) are given by \(f_\theta(x)\), which can be split as \[ \mu_x = f^1_\theta(x), \quad \sigma_x = f^2_\theta(x) \] with \(f^1\) and \(f^2\) the first and second halves of the units of the latent layer

Variational Autoencoders

KL divergence as regularizer (closed-form solution) \[ KL(q(z|x)\,\|\,p(z)) = \frac{1}{2}\sum_{i=1}^n \left( \sigma_{x,i}^2 + \mu_{x,i}^2 - \log(\sigma_{x,i}^2) - 1 \right) \]

Adding noise (the reparameterization trick), we sample from the latent space and decode it
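A minimal sketch in PyTorch of the encoder split into \(\mu_x\) and \(\log \sigma_x^2\), the reparameterized sampling, and the closed-form KL term above (layer sizes are illustrative):

```python
# Minimal sketch: VAE encoder, reparameterized sampling and KL regularizer.
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    def __init__(self, in_dim=512, latent_dim=32):
        super().__init__()
        self.body = nn.Linear(in_dim, 128)
        self.mu = nn.Linear(128, latent_dim)        # f^1_theta
        self.log_var = nn.Linear(128, latent_dim)   # f^2_theta (as log sigma^2)

    def forward(self, x):
        h = torch.relu(self.body(x))
        return self.mu(h), self.log_var(h)

def sample_z(mu, log_var):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
    return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL(q(z|x) || N(0, I)), matching the formula above
    return 0.5 * torch.sum(log_var.exp() + mu.pow(2) - log_var - 1, dim=-1)

mu, log_var = VAEEncoder()(torch.randn(8, 512))
z = sample_z(mu, log_var)
print(kl_to_standard_normal(mu, log_var).mean())
```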

ChemVAE

ChemVAE: VAE + property predictor

\[ \mathcal{L}_{\text{ChemVAE}} = \epsilon(x, \hat{x}) + KL(q(z|x)\,\|\,p(z)) + \mathcal{L}_P(y,\hat{y}) \] with \(\mathcal{L}_P(y,\hat{y})\) the property-prediction error, where \(\hat{y}\) is predicted from the latent code \(z\)

  • Train all elements together
  • Shape the latent space so that it encodes the property information
  • Bayesian optimization to move in latent space
    • Assume local and smooth behavior
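A minimal sketch of the joint objective above; `encoder`, `decoder` and `property_head` are hypothetical placeholder networks (e.g. nn.Modules in the spirit of the previous sketches):

```python
# Minimal sketch: joint ChemVAE loss = reconstruction + KL + property prediction.
import torch
import torch.nn as nn

def chemvae_loss(encoder, decoder, property_head, x, y, beta=1.0):
    mu, log_var = encoder(x)                                   # q(z|x) parameters
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterized sample
    x_hat = decoder(z)                                         # reconstruction
    recon = nn.functional.mse_loss(x_hat, x)                   # eps(x, x_hat)
    kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - log_var - 1, dim=-1).mean()
    prop = nn.functional.mse_loss(property_head(z), y)         # property-prediction error
    return recon + beta * kl + prop                            # joint loss
```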

ChemVAE

Fig: (a) ChemVAE architecture (b) Property optimization via BO

ChemVAE - Latent space

Local behavior + interpolation between compounds possible

ChemVAE - Latent space

Property prediction crucial for meaningful latent space

ChemVAE - Comments

  • Led to many other VAE-based methods
  • Generates compounds that are hard to synthesize
  • Latent space with very low validity (SMILES encoding)
    • Use SELFIES encoding

ChemVAE - Hands on!




generative_models/variational_autoencoder/VAE.ipynb


Only a brief introduction though… Check the original repo for extended functionality

Other models

VAE-based: Recent interest in using reinforcement learning

  • PaccMannRL: RL-based approach using 2 VAEs
    • Used for SARS-CoV-2 drug discovery (paper)

Diffusion models

EDM: Equivariant diffusion model for 3D molecule generation

  • Use a diffusion process instead of a VAE
  • \(E(3)\) symmetries: rotations, translations and reflections

The same principle behind Stable Diffusion

Diffusion models - diffusion process

A diffusion model learns a denoising process (the reverse of the diffusion process)

\(\rightarrow\) progressively add Gaussian noise to the signal \(x\), producing \(z_t\): \[ q(z_t|x) = \mathcal{N}(z_t|\alpha_t x, \sigma_t^2I) \] with \(\alpha_0 \approx 1\), \(\alpha_T \approx 0\) and \(\sigma_t\) the added noise level

Diffusion models - diffusion process

The diffusion process is Markovian with transition distribution \[ q(z_t|z_s) = \mathcal{N}(z_t|\alpha_{t|s}z_s, \sigma_{t|s}^2I)\,, \quad \forall t>s \] with \(\alpha_{t|s} = \alpha_t/\alpha_s\) and \(\sigma_{t|s}^2 = \sigma_t^2 - \alpha_{t|s}^2\sigma_s^2\)

The complete process can be given by: \[ \begin{gathered} q(z_0, z_1, \cdots, z_T|x) = q(z_0|x) \textstyle{\prod_{t=1}^T} q(z_t|z_{t-1}) \\ q(z_s|x, z_t) = \mathcal{N}(z_s|\mu_{t \rightarrow s}(x, z_t), \,\sigma_{t \rightarrow s}^2I) \end{gathered} \] with \(\mu_{t \rightarrow s}(x, z_t)\) and \(\sigma_{t \rightarrow s}^2\) in terms of \(\alpha\)’s , \(\sigma\)’s, \(x\) and \(z\)
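A minimal sketch of sampling \(z_t \sim q(z_t|x)\), assuming an illustrative cosine schedule with \(\alpha_0 \approx 1\), \(\alpha_T \approx 0\) and the variance-preserving choice \(\alpha_t^2 + \sigma_t^2 = 1\):

```python
# Minimal sketch: forward (noising) diffusion process q(z_t | x).
import torch

T = 1000
ts = torch.arange(T + 1) / T
alpha = torch.cos(0.5 * torch.pi * ts)   # illustrative schedule: alpha_0 ~ 1, alpha_T ~ 0
sigma = torch.sqrt(1.0 - alpha ** 2)     # variance preserving: alpha^2 + sigma^2 = 1

def q_sample(x, t):
    """Draw z_t ~ N(alpha_t * x, sigma_t^2 I) by adding Gaussian noise to x."""
    eps = torch.randn_like(x)
    return alpha[t] * x + sigma[t] * eps, eps

x = torch.randn(8, 3)                    # placeholder atom coordinates
z_t, eps = q_sample(x, t=500)
```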

EDM

  • We know the distribution of the diffusion process at each \(t\)
    • Noise applied to atom types and other properties (\(h\)) using their encodings
  • Generative process: \(\hat{x} = \phi(z_t, t)\) (denoising \(z_t\))
    • \(\phi\) is an \(E(3)\) equivariant graph NN (session 1)
  • We undo the path step-by-step minimizing \[ \textstyle{\sum_{t=1}^T} E_{\epsilon_t \sim \mathcal{N}_{xh}(0, I)} \left[ \textstyle{\frac{1}{2}} w(t) ||\epsilon_t - \hat{\epsilon}_t||^2 \right] \] with \(\hat{\epsilon}_t = \phi(z_t,t)\), \(\epsilon_t\) the \(t\)-step diff. and \(w(t)\) via \(\alpha_t\) and \(\sigma_t^2\)
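A minimal sketch of one training step of this denoising objective, with `phi` a placeholder for the \(E(3)\)-equivariant network and \(w(t) = 1\) for simplicity:

```python
# Minimal sketch: denoising training loss for a single random diffusion step.
import torch

def denoising_loss(phi, x, alpha, sigma, T):
    t = torch.randint(1, T + 1, (1,)).item()       # random diffusion step t
    eps = torch.randn_like(x)                       # true noise epsilon_t
    z_t = alpha[t] * x + sigma[t] * eps             # noised sample z_t
    eps_hat = phi(z_t, t)                           # network's noise prediction
    return 0.5 * ((eps - eps_hat) ** 2).sum()       # w(t) = 1 for simplicity
```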

EDM - Overview



Similar to the VAE approach, but now there is only decoding, and the latent space is pure noise

EDM - Computations

EDM - Conditional generation


EDM performs property optimization with a simple extension of \(\phi\) into \(\phi(z_t, [t, c])\), with \(c\) a property of interest

  • Molecules with increasing polarizability (\(\alpha\)), given above
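A minimal sketch of this kind of conditioning, where a hypothetical toy denoiser (not actually \(E(3)\)-equivariant) simply receives \([t, c]\) concatenated to its input:

```python
# Minimal sketch: conditioning the denoiser on a property c, i.e. phi(z_t, [t, c]).
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy (non-equivariant) denoiser; the property c is fed alongside the timestep."""
    def __init__(self, dim=3, cond_dim=2):                      # cond = [t, c]
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + cond_dim, 64), nn.SiLU(),
                                 nn.Linear(64, dim))

    def forward(self, z_t, t, c):
        cond = torch.tensor([t, c]).expand(z_t.shape[0], -1)    # broadcast [t, c]
        return self.net(torch.cat([z_t, cond], dim=-1))         # predicted noise

eps_hat = ConditionalDenoiser()(torch.randn(8, 3), t=0.5, c=1.2)
```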

EDM - Hands on!


generative_models/diffusion/DIFFUSION.ipynb

Gradient free models

Evolutionary algorithms

Key idea:

Population of individuals (states) in which the fittest (highest valued state) produce offspring (successor states) that populate the next generation in a process of recombination and mutation.

Evolutionary algorithms

Many different evolutionary algorithms exist; they mostly vary in how they handle a few common criteria:

  • Population size
  • Representation of each individual
    • Strings (such as ATGC for genes), sequences of real numbers (evolution strategies) or even computer programs (genetic programming)
  • Mixing number (\(\rho\)): number of parents that form offspring (commonly 2; stochastic beam search corresponds to \(\rho = 1\))

Evolutionary algorithms

Many different evolutionary algorithms exist; they mostly vary in how they handle a few common criteria:

  • Selection process: Select parents for the next generation.
    Different options:
    • Select from all individuals with probability proportional to their fitness score.
    • Randomly select \(n\) individuals (\(n > \rho\)), and then select the \(\rho\) most fit ones as parents.
    • (many more)

Evolutionary algorithms

Many different evolutionary algorithms exist; they mostly vary in how they handle a few common criteria:

  • Recombination procedure
    • E.g. \(\rho = 2\), select random crossover point to recombine two parents into two children

Evolutionary algorithms

Many different evolutionary algorithms exist; they mostly vary in how they handle a few common criteria:

  • Mutation rate: how often offspring undergo random mutations in their representation
  • Next generation makeup:
    • Just the new offspring
    • Include a few top-scoring parents from the previous generation (elitism)
    • Culling (individuals below a given threshold are discarded)
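Putting these criteria together, a minimal sketch of one generation with fitness-proportional selection, single-point crossover, mutation and elitism over digit-string individuals (all specific choices are illustrative):

```python
# Minimal sketch: one generation of a simple genetic algorithm.
import random

def next_generation(population, fitness, mut_rate=0.05, n_elite=2):
    scored = sorted(population, key=fitness, reverse=True)
    new_pop = scored[:n_elite]                                      # elitism
    weights = [fitness(ind) for ind in population]
    while len(new_pop) < len(population):
        p1, p2 = random.choices(population, weights=weights, k=2)   # selection
        cut = random.randrange(1, len(p1))                          # crossover point
        child = p1[:cut] + p2[cut:]                                 # recombination
        child = [g if random.random() > mut_rate else random.choice("0123456789")
                 for g in child]                                    # mutation
        new_pop.append("".join(child))
    return new_pop

pop = ["32748552", "24748552", "32752411", "24415124"]              # digit-string individuals
print(next_generation(pop, fitness=lambda s: sum(map(int, s))))     # toy fitness function
```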

Evolutionary algorithms


Example: (a) Initial population, ranked by fitness (b), yielding pairs for mating (c) that produce offspring (d), which are subject to mutations (e)

Evolutionary algorithms


Previous case:

Child gets the first three digits from the \(1^{st}\) parent (327) and the remaining five from the \(2^{nd}\) parent (48552)
(no mutation here)

Evolutionary algorithms


Schema: Structure in which some positions are left unspecified

  • Instances: Strings that match the schema
  • Example: 327***** (all instances beginning with 3, 2 and 7)
  • Useful to preserve promising building blocks during the evolutionary process
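A minimal sketch of checking whether individuals are instances of a schema such as 327*****, where '*' marks an unspecified position:

```python
# Minimal sketch: schema matching for string-encoded individuals.
def matches(schema, individual):
    """True if `individual` is an instance of `schema` ('*' = unspecified position)."""
    return len(schema) == len(individual) and all(
        s == "*" or s == c for s, c in zip(schema, individual))

print(matches("327*****", "32748552"))   # True: instance of the schema
print(matches("327*****", "24748552"))   # False
```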

EvoMol

EvoMol - Implementation

EvoMol - Hands on!



generative_models/evolutionary_algorithm/GENETIC.ipynb