Commit 3a6133d

Updating execution scripts (#14)
* feat: training script supports finetuning and asymmetric sampling
* wip: updated evaluation script
* doc: Update README
* ci: bump ubuntu
1 parent b497a67 commit 3a6133d

6 files changed: +287 -575 lines changed

.github/workflows/build.yml

+1-1
@@ -10,7 +10,7 @@ jobs:
       fail-fast: false
       matrix:
         include:
-          - os: ubuntu-18.04
+          - os: ubuntu-20.04
             pip_cache_path: ~/.cache/pip
             experimental: false
           - os: macos-latest

README.md

+51-22
@@ -3,15 +3,40 @@
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Gradio demo](https://img.shields.io/website-up-down-green-red/https/hf.space/gradioiframe/GT4SD/molecular_properties/+.svg?label=demo%20status)](https://huggingface.co/spaces/GT4SD/molecular_properties)
 
-# Chemical Representation Learning for Toxicity Prediction
-
+## Chemical Representation Learning for Toxicity Prediction
 
 PyTorch implementation related to the paper *Chemical Representation Learning for Toxicity Prediction* ([Born et al, 2023, *Digital Discovery*](https://pubs.rsc.org/en/content/articlehtml/2023/dd/d2dd00099g)).
-## Training your own model
 
-The library itself has few dependencies (see [setup.py](setup.py)) with loose requirements.
+# Inference
+We released pretrained models for the Tox21, the ClinTox and the SIDER dataset.
+
+## Demo with UI
+🤗 A gradio demo with a simple UI is available on [HuggingFace spaces](https://huggingface.co/spaces/GT4SD/molecular_properties)
+![Summary](assets/demo.png)
+
+## Python API
+The pretrained models are available via the [GT4SD](https://github.com/GT4SD), the Generative Toolkit for Scientific Discovery. See the paper [here](https://arxiv.org/abs/2207.03928).
+We recommend to use [GT4SD](https://github.com/GT4SD/gt4sd-core) for inference. Once you install that library, use as follows:
+```py
+from gt4sd.properties import PropertyPredictorRegistry
+tox21 = PropertyPredictorRegistry.get_property_predictor('tox21', {'algorithm_version': 'v0'})
+tox21('CCO')
+```
+
+The other models are the SIDER model and the ClinTox model from the [MoleculeNet](https://moleculenet.org/datasets-1) benchmark:
+```py
+from gt4sd.properties import PropertyPredictorRegistry
+sider = PropertyPredictorRegistry.get_property_predictor('sider', {'algorithm_version': 'v0'})
+clintox = PropertyPredictorRegistry.get_property_predictor('clintox', {'algorithm_version': 'v0'})
+print(f"SIDE effect predictions: {sider('CCO')}")
+print(f"Clinical toxicitiy predictions: {clintox('CCO')}")
+```
+
+
+# Training your own model
 
 ### Setup
+The library itself has few dependencies (see [setup.py](setup.py)) with loose requirements.
 ```sh
 conda env create -f conda.yml
 conda activate toxsmi
@@ -26,31 +51,35 @@ Download sample data from the Tox21 database and store it in a folder called `da
 [here](https://ibm.box.com/s/kahxnlg2k2s0x3z0r5fa6y67tmfhs6or).
 
 ```console
-(toxsmi) $ python3 scripts/train_tox.py data/tox21_train.csv \
-data/tox21_score.csv data/tox21.smi data/smiles_language_tox21.pkl \
-models params/mca.json test --embedding_path data/smiles_vae_embeddings.pkl
+(toxsmi) $ python3 scripts/train_tox.py \
+--train data/tox21_train.csv \
+--test data/tox21_score.csv \
+--smi data/tox21.smi \
+--params params/mca.json \
+--model path_to_model_folder \
+--name debug
 ```
 
+**Features**:
+- Set ```--finetune``` to the path to a `.pt` file to start from a pretrained model
+- Set ```--embedding_path``` to the path of pretrained embeddings
+
 Type `python scripts/train_tox.py -h` for further help.
 
-## Inference (using our pretrained models)
-Several of our trained models are available via the [GT4SD](https://github.com/GT4SD), the Generative Toolkit for Scientific Discovery. See the paper [here](https://arxiv.org/abs/2207.03928).
-We recommend to use [GT4SD](https://github.com/GT4SD/gt4sd-core) for inference. Once you install that library, use as follows:
-```py
-from gt4sd.properties import PropertyPredictorRegistry
-tox21 = PropertyPredictorRegistry.get_property_predictor('tox21', {'algorithm_version': 'v0'})
-tox21('CCO')
-```
+### Evaluate a model
+In the `scripts` directory is an evaluation script [eval_tox.py](./scripts/eval_tox.py).
+Assume you have a trained model, use as follows:
 
-The other models are the SIDER model and the ClinTox model from the [MoleculeNet](https://moleculenet.org/datasets-1) benchmark:
-```py
-from gt4sd.properties import PropertyPredictorRegistry
-sider = PropertyPredictorRegistry.get_property_predictor('sider', {'algorithm_version': 'v0'})
-clintox = PropertyPredictorRegistry.get_property_predictor('clintox', {'algorithm_version': 'v0'})
-print(f"SIDE effect predictions: {sider('CCO')}")
-print(f"Clinical toxicitiy predictions: {clintox('CCO')}")
+```console
+(toxsmi) $ python3 scripts/eval_tox.py \
+-model path_to_model_folder \
+-smi data/tox21.smi \
+-labels data/tox21_test.csv \
+-checkpoint RMSE"
 ```
 
+where `-checkpoint` specifies which `.pt` file to pick for the evaluation (based on substring matching)
+
 ## Attention visualization
 The model uses a self-attention mechanism that can highlight chemical motifs used for the predictions.
 In [notebooks/toxicity_attention.ipynb](notebooks/toxicity_attention.ipynb) we share a tutorial on how to create such plots:
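
Note on the `-checkpoint` option introduced in this README change: the text says the `.pt` file is picked "based on substring matching". The following is a minimal sketch of how such a selection could behave, assuming the checkpoints are `.pt` files somewhere below the model folder. `pick_checkpoint` is a hypothetical helper written for illustration; it is not taken from `scripts/eval_tox.py`, whose actual logic may differ.

```python
from pathlib import Path


def pick_checkpoint(model_dir: str, pattern: str) -> Path:
    """Return the single .pt file below model_dir whose name contains pattern.

    Hypothetical helper for illustration only; not the eval_tox.py code.
    """
    # Collect every .pt file below the model folder, sorted for determinism.
    candidates = sorted(Path(model_dir).rglob("*.pt"))
    # Case-insensitive substring match on the file name only (an assumption).
    matches = [p for p in candidates if pattern.lower() in p.name.lower()]
    if not matches:
        raise FileNotFoundError(f"No .pt file matching {pattern!r} in {model_dir}")
    if len(matches) > 1:
        names = ", ".join(p.name for p in matches)
        raise ValueError(f"Pattern {pattern!r} is ambiguous: {names}")
    return matches[0]
```

Raising on zero or multiple matches is a design choice of this sketch: a pattern that is too short fails loudly instead of silently evaluating the wrong file.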

assets/demo.png

392 KB