Why bother?
%load_ext autoreload
%autoreload 2
One featurizer to rule them all?
Unlike in many other machine learning domains, molecular featurization (i.e. the process of transforming a molecule into a vector) lacks a good default. It remains unclear how to capture the richness of molecular data in a unified representation, and what works best depends heavily on the nature and constraints of the task you are trying to model. It is therefore good practice to try different featurization schemes: from structural fingerprints to physico-chemical descriptors and pre-trained embeddings.
Don't take our word for it
To demonstrate the impact a featurizer can have, we set up two simple benchmarks.
- To demonstrate the impact on modeling, we will use two datasets from MoleculeNet.
- To demonstrate the impact on search, we will use the RDKit Benchmarking Platform.
We will compare the performance of three different featurizers:
- ECFP6 [1]: Binary, circular fingerprints where each bit indicates the presence of particular substructures of a radius up to 3 bonds away from an atom.
- Mordred [2]: More than 1800 continuous 2D and 3D descriptors.
- ChemBERTa [3]: Learned representations from a pre-trained SMILES transformer model.
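To make the idea of a binary fingerprint more concrete, here is a toy sketch in plain Python of how substructure identifiers can be hashed into a fixed-length bit vector and how two such vectors can be compared. It is not the ECFP algorithm: the substructure identifiers are hand-written placeholder strings rather than atom environments enumerated by RDKit, and `fold_to_bits`, `tanimoto` and `N_BITS` are hypothetical names introduced only for this illustration.

```python
import hashlib

N_BITS = 64  # illustrative only; fingerprints in practice are commonly 1024 or 2048 bits


def fold_to_bits(substructures, n_bits=N_BITS):
    """Hash each substructure identifier to a bit position and set that bit."""
    bits = [0] * n_bits
    for sub in substructures:
        digest = hashlib.sha256(sub.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big") % n_bits
        bits[position] = 1
    return bits


def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either vector."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


# Placeholder substructure identifiers for two hypothetical molecules
mol_a = ["C-C", "C-O", "C=O", "c1ccccc1"]
mol_b = ["C-C", "C-N", "C=O", "c1ccccc1"]

fp_a = fold_to_bits(mol_a)
fp_b = fold_to_bits(mol_b)
print(tanimoto(fp_a, fp_a))  # identical fingerprints
print(tanimoto(fp_a, fp_b))  # partially overlapping fingerprints
```

Because different substructures can hash to the same position, a set bit only indicates that at least one of the colliding substructures is present. This is one reason why the choice of featurizer can matter for both modeling and search.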
Modeling
We will compare the performance on two datasets using auto-sklearn [4, 5] AutoML models.
import os
import numpy as np
import pandas as pd
import datamol as dm
import autosklearn.classification
import autosklearn.regression
from sklearn.metrics import mean_absolute_error, roc_auc_score
from sklearn.model_selection import GroupShuffleSplit
from rdkit.Chem import SaltRemover
from molfeat.trans.fp import FPVecTransformer
from molfeat.trans.pretrained.hf_transformers import PretrainedHFTransformer
def load_dataset(uri: str, readout_col: str):
    """Loads the MoleculeNet dataset"""
    df = pd.read_csv(uri)
    smiles = df["smiles"].values
    y = df[readout_col].values
    return smiles, y


def preprocess_smiles(smi):
    """Preprocesses the SMILES string"""
    with dm.without_rdkit_log():
        mol = dm.to_mol(smi, ordered=True, sanitize=False)
        mol = dm.sanitize_mol(mol)
        if mol is None:
            return
        mol = dm.standardize_mol(mol, disconnect_metals=True)
        remover = SaltRemover.SaltRemover()
        mol = remover.StripMol(mol, dontRemoveEverything=True)
    return dm.to_smiles(mol)


def scaffold_split(smiles):
    """In line with common practice, we will use the scaffold split to evaluate our models"""
    scaffolds = [dm.to_smiles(dm.to_scaffold_murcko(dm.to_mol(smi))) for smi in smiles]
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    return next(splitter.split(smiles, groups=scaffolds))
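The point of a scaffold split is that molecules sharing a scaffold never end up on both sides of the split. The sketch below illustrates that property in plain Python with made-up scaffold labels. `group_split` is a hypothetical helper written for this illustration and is not how `GroupShuffleSplit` is implemented, although it likewise interprets `test_size` as a fraction of the groups rather than of the samples.

```python
import random


def group_split(groups, test_size=0.2, seed=42):
    """Hold out whole groups so that no group appears in both train and test."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Number of groups (not samples) assigned to the test set
    n_test = max(1, round(test_size * len(unique)))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Made-up scaffold labels, one per molecule
scaffolds = ["benzene", "benzene", "pyridine", "indole", "indole", "furan", "benzene", "purine"]
train_idx, test_idx = group_split(scaffolds)

train_scaffolds = {scaffolds[i] for i in train_idx}
test_scaffolds = {scaffolds[i] for i in test_idx}
print(train_scaffolds & test_scaffolds)  # empty set: no scaffold is shared
```

Since entire groups are held out, the number of test molecules can deviate from 20% of the dataset when scaffold groups differ in size.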
# Setup the featurizers
trans_ecfp = FPVecTransformer(kind="ecfp:6", n_jobs=-1)
trans_mordred = FPVecTransformer(kind="mordred", replace_nan=True, n_jobs=-1)
trans_chemberta = PretrainedHFTransformer(kind='ChemBERTa-77M-MLM', notation='smiles')
Lipophilicity
Lipophilicity is a regression task with 4200 molecules.
# Prepare the Lipophilicity dataset
smiles, y_true = load_dataset("https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/Lipophilicity.csv", "exp")
smiles = [preprocess_smiles(smi) for smi in smiles]
# Drop molecules that failed preprocessing (None or empty) and keep the labels aligned
keep = np.array([bool(smi) for smi in smiles])
smiles = np.array([smi for smi in smiles if smi])
y_true = y_true[keep]
X = {
    "ECFP": trans_ecfp(smiles),
    "Mordred": trans_mordred(smiles),
    "ChemBERTa": trans_chemberta(smiles),
}
# To make the output less verbose:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Train a model
train_ind, test_ind = scaffold_split(smiles)
scores = {}
for name, feats in X.items():
    # Train
    automl = autosklearn.regression.AutoSklearnRegressor(
        memory_limit=24576,
        time_left_for_this_task=360,
        n_jobs=1,
    )
    automl.fit(feats[train_ind], y_true[train_ind])
    # Predict
    y_hat = automl.predict(feats[test_ind])
    # Evaluate
    mae = mean_absolute_error(y_true[test_ind], y_hat)
    scores[name] = mae
scores
{'ECFP': 0.6889895591995786,
'Mordred': 0.5481806419968572,
'ChemBERTa': 0.7432117051810577}
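To read these numbers more easily, the short sketch below ranks the featurizers by their mean absolute error and reports each gap relative to the best one. The values are copied from the output above; they come from a single scaffold split and a fixed AutoML time budget, so the ranking should be taken as indicative rather than conclusive.

```python
# MAE values copied from the output above (lower is better)
scores = {
    "ECFP": 0.6889895591995786,
    "Mordred": 0.5481806419968572,
    "ChemBERTa": 0.7432117051810577,
}

# Featurizer with the lowest error
best = min(scores, key=scores.get)

# Print featurizers from best to worst, with the relative gap to the best
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    gap = (mae - scores[best]) / scores[best]
    print(f"{name:<10} MAE={mae:.3f}  (+{gap:.1%} vs. {best})")
```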
ClinTox
# Prepare the ClinTox dataset
smiles, y_true = load_dataset("https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/clintox.csv.gz", "CT_TOX")
smiles = [preprocess_smiles(smi) for smi in smiles]
# Drop molecules that failed preprocessing (None or empty) and keep the labels aligned
keep = np.array([bool(smi) for smi in smiles])
smiles = np.array([smi for smi in smiles if smi])
y_true = y_true[keep]
X = {
    "ECFP": trans_ecfp(smiles),
    "Mordred": trans_mordred(smiles),
    "ChemBERTa": trans_chemberta(smiles),
}
--- Logging error ---
Message: AtomValenceException('Explicit valence for atom # 0 N, 5, is greater than permitted')
Arguments: ('[NH4][Pt]([NH4])(Cl)Cl',)
--- Logging error ---
Message: KekulizeException("Can't kekulize mol. Unkekulized atoms: 21")
Arguments: ('O=c1c(CCS(=O)c2ccccc2)c(=O)n(c2ccccc2)n1c1ccccc1',)
--- Logging error ---
Message: AtomValenceException('Explicit valence for atom # 81 N, 4, is greater than permitted')
Arguments: ('CC1=C2N3[C@@H]4[C@H](CC(N)=O)[C@@]2(C)CCC(=O)NC[C@@H](C)OP(=O)([O-])O[C@H]2[C@@H](O)[C@H](O[C@@H]2CO)N2C=N(c5cc(C)c(C)cc52)[Co+]325(O)N3=C1[C@@H](CCC(N)=O)C(C)(C)C3=CC1=N2C(=C(C)C2=N5[C@]4(C)[C@@](C)(CC(N)=O)[C@@H]2CCC(N)=O)[C@@](C)(CC(N)=O)[C@@H]1CCC(N)=O',)
--- Logging error ---
Message: AtomValenceException('Explicit valence for atom # 82 N, 4, is greater than permitted')
Arguments: ('CC1=C2N3[C@@H]4[C@H](CC(N)=O)[C@@]2(C)CCC(=O)NC[C@@H](C)OP(=O)(O)O[C@H]2[C@@H](O)[C@H](O[C@@H]2CO)N2C=N(c5cc(C)c(C)cc52)[Co]325(C#N)N3=C1[C@@H](CCC(N)=O)C(C)(C)C3=CC1=N2C(=C(C)C2=N5[C@]4(C)[C@@](C)(CC(N)=O)[C@@H]2CCC(N)=O)[C@@](C)(CC(N)=O)[C@@H]1CCC(N)=O',)
--- Logging error ---
Message: KekulizeException("Can't kekulize mol. Unkekulized atoms: 17")
Arguments: ('CCCCc1c(=O)n(c2ccccc2)n(c2ccc(O)cc2)c1=O',)
--- Logging error ---
Traceback (most recent call last):
File "/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/site-packages/datamol/_sanifix4.py", line 118, in sanifix
Chem.SanitizeMol(cp)
rdkit.Chem.rdchem.KekulizeException: Can't kekulize mol. Unkekulized atoms: 16
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/logging/__init__.py", line 1083, in emit
msg = self.format(record)
File "/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/logging/__init__.py", line 927, in format
return fmt.format(record)
File "/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/logging/__init__.py", line 663, in format
record.message = record.getMessage()
File "/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/logging/__init__.py", line 367, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Message: KekulizeException("Can't kekulize mol. Unkekulized atoms: 16")
Arguments: ('CCCCc1c(=O)n(c2ccccc2)n(c2ccccc2)c1=O',)
[10:36:46] Unusual charge on atom 0 number of radical electrons set to zero
[10:36:48] Unusual charge on atom 0 number of radical electrons set to zero
/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/site-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/home/cas/local/conda/envs/molfeat-benchmark/lib/python3.9/site-packages/joblib/externals/loky/process_executor.py:700: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
warnings.warn(
# Disable tokenizer parallelism to make the output less verbose
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Train a model
train_ind, test_ind = scaffold_split(smiles)

scores = {}
for name, feats in X.items():
    # Train
    automl = autosklearn.classification.AutoSklearnClassifier(
        memory_limit=24576,
        time_left_for_this_task=360,
        n_jobs=1,
    )
    automl.fit(feats[train_ind], y_true[train_ind])

    # Predict the probability of the positive class (column 1),
    # which is the score AUROC expects for a binary task
    y_hat = automl.predict_proba(feats[test_ind])[:, 1]

    # Evaluate
    auroc = roc_auc_score(y_true[test_ind], y_hat)
    scores[name] = auroc

scores
[WARNING] [2023-03-21 10:49:24,650:Client-EnsembleBuilder] No models better than random - using Dummy losses! Models besides current dummy model: 0 Dummy models: 1
[WARNING] [2023-03-21 10:49:40,878:Client-EnsembleBuilder] No models better than random - using Dummy losses! Models besides current dummy model: 0 Dummy models: 1
{'ECFP': 0.47138888888888886,
'Mordred': 0.4252777777777778,
'ChemBERTa': 0.3705555555555555}
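For a binary task, AUROC expects a score that increases with the probability of the positive class. The sketch below uses made-up probabilities and a hypothetical pure-Python helper, `auroc`, which is an illustration and not the scikit-learn implementation. It shows that the positive-class column of a `predict_proba`-style output and its per-row maximum can lead to very different AUROC values, because the maximum discards which class the model favoured.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy output in the layout of predict_proba: [P(class 0), P(class 1)]
proba = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.4, 0.6],
]
y_true = [0, 0, 1, 1]

positive_class = [row[1] for row in proba]  # probability of class 1
row_maximum = [max(row) for row in proba]   # confidence in the predicted class

auc_positive = auroc(y_true, positive_class)
auc_maximum = auroc(y_true, row_maximum)
print(auc_positive, auc_maximum)
```

In this toy case every prediction is correct, yet only the positive-class column yields a high AUROC; the per-row maximum ranks the confident negatives above the positives.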
Conclusion
For Lipophilicity, the Mordred featurizer proves most powerful, outperforming the next-best featurizer by about 20%. For ClinTox, however, the tables turn: ECFP (AUROC 0.47) outperforms Mordred (AUROC 0.43) by about 10%. Note that all three recorded ClinTox scores fall below 0.5, the AUROC of a random classifier, so this ranking is tentative.
This shows the importance of trying different featurizers. Luckily, with Molfeat, this has just become a lot easier to do!
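The "about 10%" figure for ClinTox can be reproduced from the scores printed above. The sketch below assumes the comparison is a relative difference, (best - second) / second; that formula is an assumption about how the percentage was derived, not something stated in the notebook.

```python
# ClinTox AUROC scores as recorded in the output above
scores = {
    "ECFP": 0.47138888888888886,
    "Mordred": 0.4252777777777778,
    "ChemBERTa": 0.3705555555555555,
}

# Rank featurizers from best to worst score
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
(best_name, best), (second_name, second) = ranked[:2]

# Relative gain of the best featurizer over the runner-up (assumed formula)
relative_gain = (best - second) / second
print(f"{best_name} vs {second_name}: {relative_gain:.1%}")
```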
Search
We will evaluate the performance on the search task using the RDKit Benchmarking Platform.
# TODO
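The search benchmark itself is not implemented in this section. As a rough illustration of what a fingerprint-based similarity search involves, the sketch below ranks a small library against a query by Tanimoto similarity. It is not the RDKit Benchmarking Platform protocol: the helper names `tanimoto` and `rank_by_similarity` are hypothetical, and the fingerprints are made-up sets of on-bit indices rather than real ECFP output.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    sims = {name: tanimoto(query, fp) for name, fp in library.items()}
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)


# Toy library: each molecule is represented by the indices of its set bits
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}
query = {1, 4, 7}

ranking = rank_by_similarity(query, library)
for name, sim in ranking:
    print(name, round(sim, 2))
```

A different featurizer changes the fingerprints, and therefore the ranking, which is the effect a search benchmark would measure.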
Citations
- Rogers, D., & Hahn, M. (2010). Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5), 742-754.
- Moriwaki, H., Tian, Y. S., Kawashita, N., & Takagi, T. (2018). Mordred: a molecular descriptor calculator. Journal of Cheminformatics, 10(1), 1-14.
- Chithrananda, S., Grand, G., & Ramsundar, B. (2020). ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885.
- Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and robust automated machine learning. Advances in Neural Information Processing Systems, 28.
- Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M., & Hutter, F. (2020). Auto-Sklearn 2.0: The next generation. arXiv preprint arXiv:2007.04074.