GraphBRAIN

predicts the permeability of molecules through the blood-brain barrier

Demo Video: here

Inspiration💡

The blood-brain barrier is a protective layer that separates the brain from the rest of the body’s circulatory system. This barrier is highly selective and prevents solutes in the circulating blood from non-selectively crossing into the extracellular fluid of the central nervous system where neurons reside.

To be effective as therapeutic agents, centrally-acting drugs (central to the brain) must cross the blood-brain barrier. Conversely, to be devoid of unwanted central nervous system effects, peripherally acting drugs (peripherally to the brain) must show limited ability to cross the blood-brain barrier. In both cases, the blood-brain barrier permeability of drug candidates must be known. However, the experimental determination of brain-blood partitioning is difficult, time-consuming, and expensive and not suitable to screen large collections of chemicals. A broadly applicable method for predicting the blood-brain barrier permeation of candidates at an early stage of discovery would have a great impact on drug research and development.

We built GraphBRAIN to solve this problem.

What it does🔎

A SMILES string represents a molecule as an ASCII string. When you give us the SMILES string for a molecule, we predict its permeability through the blood-brain barrier and render an interactive 3D molecular structure. There are four sample buttons below the input box that demonstrate the functionality of our website and display some real-world information (functionality exclusive to the sample buttons).

How we built it🔨

We divide this project into two sub-projects:

GraphBRAIN Machine Learning Model

The data we need for this project is readily available as part of MoleculeNet, which is publicly available and contains 2,050 molecules; each molecule comes with a name, a label, and a SMILES string. A SMILES string lets one easily represent a molecule as an ASCII string. In memory, we store and manipulate these molecules as graphs.
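Loading the dataset can be sketched as below. This is a minimal stand-in, not the project's actual code: the two rows mimic the column layout of MoleculeNet's BBBP CSV (`name`, `p_np`, `smiles`), where `p_np` is the binary permeability label.

```python
import pandas as pd
from io import StringIO

# Tiny stand-in for the BBBP CSV from MoleculeNet (illustrative rows only;
# the real file has ~2,050 molecules with name, label, and SMILES columns).
sample_csv = StringIO(
    "name,p_np,smiles\n"
    "Propanolol,1,CC(C)NCC(O)COc1cccc2ccccc12\n"
    "Terbutylchlorambucil,1,CC(C)(C)OC(=O)CCCc1ccc(N(CCCl)CCCl)cc1\n"
)

df = pd.read_csv(sample_csv)
smiles_list = df["smiles"].tolist()   # SMILES strings to featurize
labels = df["p_np"].tolist()          # 1 = permeable, 0 = non-permeable
print(len(smiles_list), labels)
```

In practice the same two columns are read from the full BBBP file before featurization.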

Our data processing pipeline is as follows:

  • We start by obtaining the SMILES strings and their labels, i.e., their blood-brain barrier permeability, directly from the dataset.
  • These SMILES strings are then converted to molecules using RDKit, a popular chemistry package. This representation contains important information about the molecule, including its stereochemistry, geometry, the valency of individual atoms, and so on.
  • We then convert each molecule to a graph, which is how it is stored and how all computations are performed on it.
  • We do not want to lose any of the molecule's chemical information, so we encode the important properties into the graph. Each molecule is stored as a graph using three ragged tensors (a generalization that allows tensors to have variable lengths) representing the atoms, the bonds, and the bond properties of the molecule.
  • We then create a TensorFlow dataset using the ‘tf.data’ API, both to avoid storing all the graphs in memory at once and to use an accelerator efficiently (training was performed on TPUs and much of the testing on 4× NVIDIA A100 GPUs).
  • This requires changing how the graphs are represented in memory to allow batching: for each batch we create (1) a tuple of four tensors, representing a merge of the three per-molecule properties plus an indicator tensor, and (2) a tensor containing the labels.
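The batching step above can be sketched with plain NumPy. The function and variable names here are ours, not the project's actual code: each molecule is three variable-length arrays, and batching concatenates them while an indicator array maps every atom back to its molecule.

```python
import numpy as np

# Two toy molecules: (atom_features, bond_features, bond_pair_indices).
# Shapes vary per molecule, which is why ragged tensors are needed.
mol_a = (np.ones((3, 4)), np.ones((2, 2)), np.array([[0, 1], [1, 2]]))
mol_b = (np.ones((2, 4)), np.ones((1, 2)), np.array([[0, 1]]))

def merge_batch(mols):
    """Concatenate per-molecule arrays into one graph per batch, keeping
    an indicator that maps each atom back to its source molecule."""
    atoms = np.concatenate([m[0] for m in mols])
    bonds = np.concatenate([m[1] for m in mols])
    # Shift each molecule's pair indices by the atom count of the
    # molecules that came before it, so indices stay valid after merging.
    offsets = np.cumsum([0] + [m[0].shape[0] for m in mols[:-1]])
    pairs = np.concatenate([m[2] + off for m, off in zip(mols, offsets)])
    indicator = np.concatenate(
        [np.full(m[0].shape[0], i) for i, m in enumerate(mols)]
    )
    return atoms, bonds, pairs, indicator

atoms, bonds, pairs, indicator = merge_batch([mol_a, mol_b])
print(atoms.shape, bonds.shape, indicator.tolist())
```

In the real pipeline the same merge is done with `tf.ragged` tensors inside a `tf.data` input function, so batches stream to the accelerator without holding every graph in memory.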

GraphBRAIN Web App

The GraphBRAIN web app is built entirely with Streamlit. We use interactive features such as tabs and columns to make it visually pleasing. MolView.org was used to obtain a 3D rendering of the molecule directly from the SMILES string, and we embedded it into our web app thanks to a feature provided by Streamlit.
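The embedding can be sketched as below. The exact MolView embed URL shape is an assumption based on the public MolView site, and the helper name is ours; the Streamlit call shown in the comment is the standard `components.iframe` API.

```python
from urllib.parse import quote

def molview_embed_url(smiles: str) -> str:
    # MolView's embeddable viewer accepts a SMILES string as a query
    # parameter (URL shape assumed from the public MolView site).
    return f"https://embed.molview.org/v1/?mode=balls&smiles={quote(smiles)}"

url = molview_embed_url("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Inside the Streamlit app the viewer is then embedded with:
#   import streamlit.components.v1 as components
#   components.iframe(url, height=400)
print(url)
```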

Challenges we ran into⚠️

  • One of the major limitations in training our model was the limited amount of data at our disposal. The quality of the dataset we used was very good, but it contains only about 2,050 molecules in total.
  • Another problem, while implementing the UI of the website, was figuring out how to embed a 3D view of the molecule. We wanted an interactive element showing the molecule supplied by the user, but our first attempts with libraries such as RDKit and Indigo failed. We finally used MolView for this feature and worked out how to embed it correctly in our website.
  • Also, we had written most of our code on a macOS device, so the file paths and directories we used were relative to macOS. When we later ran the project on a Windows device, the website failed because the file paths could not be resolved. We had to go back and change our code to build paths in an operating-system-independent way.
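The path fix in the last bullet amounts to building paths with `pathlib` (or `os.path.join`) instead of hard-coding separators. The file names below are illustrative, not the project's actual layout.

```python
from pathlib import Path

# pathlib joins path components with the correct separator on every OS,
# instead of hard-coding "/" (macOS/Linux) or "\\" (Windows).
model_path = Path("saved_models") / "gnn" / "weights.h5"
print(model_path.parts)  # separator-independent components
```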

Accomplishments that we’re proud of🥇

  • Our project aimed to develop a machine learning model to predict the blood-brain barrier permeability of a given molecule using the BBBP dataset by MoleculeNet. Our results demonstrate that we have been successful in achieving this.
  • We found that a range of factors, such as molecular weight, number of rotatable bonds, number of hydrogen bond donors, ionizability, electronegativity, and molecular size, play a role in blood-brain barrier permeation. We took all these factors into account by drawing diagrams of the molecular structures, keeping their stereochemistry in mind, and using machine learning algorithms to develop a model that accomplishes the task with high accuracy and precision.
  • Our results provide important insights into the complex mechanisms underlying blood-brain barrier permeability and demonstrate the potential of machine learning in drug research and development. Using our model, researchers can rapidly screen large numbers of potential drug candidates and focus their efforts on those most likely to cross the blood-brain barrier.
  • We are quite happy that we were able to integrate the models with the UI, and to do so efficiently.
  • We were also able to teach each other during this hackathon: some of us had no experience with machine learning, while others had little experience building good UIs.

What we learned🧠

  • Most of our project team members were not experienced with machine learning techniques or with modern models like the one we used.
  • All our project members are first-year students at the University of Toronto, but our combined experience helped us complete this project in the allotted time. Our belief in teamwork also increased during the course of this hackathon.

What’s next for GraphBRAIN💭

  • In this project, we have successfully developed a machine learning model to predict the blood-brain barrier permeability of small molecules. However, there are still some areas that can be further explored to improve the model’s accuracy and generalizability.
  • Firstly, although the current model achieved a relatively high accuracy on the test set, there is still room for improvement. One possible approach is to incorporate more diverse chemical descriptors, such as topological indices, pharmacophore features, and quantum chemical properties, into the feature set to capture a wider range of chemical properties that contribute to BBB permeability.
  • Additionally, the current study only used a single dataset for model training and evaluation. It would be valuable to test the model’s performance on other independent datasets with different molecular structures and properties, to evaluate the generalizability of the model.
  • Lastly, the model must go through several rounds of testing before being used in research. Improving its handling of exceptional and edge cases will prove very useful.
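The additional chemical descriptors mentioned above can be computed directly with RDKit. This is a sketch of the idea, not the model's actual feature set; the descriptors chosen here are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

# Compute a few candidate descriptors for aspirin (illustrative choice;
# a richer feature set would add topological and quantum properties).
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
features = {
    "mol_weight": Descriptors.MolWt(mol),             # molecular weight
    "h_donors": Descriptors.NumHDonors(mol),          # hydrogen bond donors
    "rot_bonds": Descriptors.NumRotatableBonds(mol),  # rotatable bonds
}
print(features)
```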

Contributors:

  • Shivesh Prakash
  • Rishit Dagli
  • Pranjal Agrawal

Check it out here: GitHub Repo