Synthesis
The synthesis stage evaluates whether generated molecules can be practically synthesized. It computes registry-enabled accessibility scores and optionally runs full retrosynthetic route analysis using AiZynthFinder.
Scoring Methods
SA Score (Synthetic Accessibility Score)
- Range: 1–10 (lower is better)
- Method: RDKit Contrib SA Score calculator (RDConfig.RDContribDir/SA_Score)
- How it works: Combines fragment contributions (from a frequency analysis of molecules in PubChem) with a complexity penalty. Molecules built from common fragments score low; molecules requiring unusual substructures score high.
- Default threshold: 1–4.5
SYBA Score (SYnthetic Bayesian Accessibility)
- Range: unbounded (higher is better; typically -100 to +300)
- Method: SYBA package
- How it works: A Bayesian classifier trained on easily synthesizable molecules (from the ZINC database) vs. hard-to-synthesize molecules. Positive scores indicate likely synthesizability.
- Default threshold: 0–inf (no upper bound)
RA Score (Retrosynthetic Accessibility Score)
- Range: 0–1 (higher is better)
- Method: XGBoost model trained on ECFP6 count fingerprints (from MolScore RAScore)
- How it works: An XGBoost classifier trained on the output of a retrosynthesis tool. Predicts the probability that a retrosynthesis planner can find a valid route to the molecule. Faster than running actual retrosynthesis but less precise.
- Default threshold: 0.5–1
SYNC Score (3D Synthesizability Classifier)
- Range: 0–1 (higher is better)
- Method: 3D EGNN classifier from SYNC, using an RDKit ETKDG conformer for each input SMILES
- How it works: Predicts easy-vs-hard synthesizability from atom types, bonds, and 3D coordinates. The checkpoint is downloaded to modules/sync/classifier_emb.ckpt by uv run hedgehog setup sync, or automatically when sync_auto_install: true.
- Default threshold: 0.5–1 through score_filters.sync_score
SCScore (Synthetic Complexity Score)
- Range: 1–5 (lower is less complex)
- Method: Standalone numpy SCScore model trained on Reaxys reactions
- How it works: Scores synthetic complexity from Morgan fingerprints using the published SCScore neural model. The default configuration computes the score but does not filter on it unless score_filters.sc_score thresholds are set.
Nonpher Complexity Flag
- Range: 0 or 1 (1 means too complex)
- Method: Optional Nonpher/Molpher complexity filter
- How it works: Uses Nonpher’s molecular complexity thresholds to mark molecules that exceed hard-to-synthesize complexity limits. If HEDGEHOG_NONPHER_PYTHON is set, HEDGEHOG runs hedgehog.workers.nonpher_worker inside that external interpreter.
- Missing dependency behavior: if Nonpher is unavailable, nonpher_complexity_score is reported as NaN and synthesis continues.
- Auto-install behavior: with --auto-install / HEDGEHOG_AUTO_INSTALL=1, HEDGEHOG first attempts an isolated uv-only bootstrap in $HEDGEHOG_OPTIONAL_ENV_ROOT/nonpher (or .venv-nonpher-worker) using pinned numpy<2, rdkit-pypi, nonpher (git), and molpher-lib (git).
- Known blocker behavior: if the uv-only bootstrap cannot build/link molpher-lib (for example, cannot find -lmolpher) or hits other native dependency blockers, HEDGEHOG logs the exact blocker and returns NaN for Nonpher scores.
- Manual override: you can still point HEDGEHOG_NONPHER_PYTHON to any validated isolated interpreter (for example, a prebuilt shared hybrid env) and HEDGEHOG will use it via the external worker.
- Validation helper: run uv run hedgehog setup nonpher-check (or pass --python to probe an isolated environment).
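The interpreter lookup described in the Nonpher bullets can be sketched in a few lines of Python. This is only an illustration, not HEDGEHOG's actual code: the function names are hypothetical, the precedence (explicit override first, then the optional-env root, then the local fallback directory) is an assumption, and so is the idea that .venv-nonpher-worker is resolved relative to the working directory.

```python
import os
from pathlib import Path


def optional_env_dir(tool, environ=None, cwd=None):
    """Return the directory where an isolated worker env for `tool` would live."""
    environ = os.environ if environ is None else environ
    root = environ.get("HEDGEHOG_OPTIONAL_ENV_ROOT")
    if root:
        # Expand "~" so values like ~/work/hedgehog_optional_envs resolve.
        return Path(root).expanduser() / tool
    # Assumption: the fallback directory sits in the current working directory.
    base = Path.cwd() if cwd is None else Path(cwd)
    return base / f".venv-{tool}-worker"


def nonpher_python(environ=None, cwd=None):
    """Pick the interpreter for the Nonpher worker (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    override = environ.get("HEDGEHOG_NONPHER_PYTHON")
    if override:
        # Assumption: an explicit interpreter wins over any bootstrapped env.
        return Path(override).expanduser()
    # Linux-style venv layout, matching the bin/python paths used on this page.
    return optional_env_dir("nonpher", environ, cwd) / "bin" / "python"
```

The same directory rule would apply to the fsscore and gasa environments, which this page describes with the same $HEDGEHOG_OPTIONAL_ENV_ROOT/<tool> pattern.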
Example isolated Linux path (keeps main uv env unchanged):
export HEDGEHOG_OPTIONAL_ENV_ROOT=~/work/hedgehog_optional_envs
mkdir -p "$HEDGEHOG_OPTIONAL_ENV_ROOT"
# uv-only attempt happens automatically with --auto-install
uv run hedgehog setup nonpher-check --python "$HEDGEHOG_OPTIONAL_ENV_ROOT/nonpher/bin/python"
# fallback when uv-only fails with native linker blockers
uv run hedgehog setup nonpher-check --python /mnt/ligandpro/shared_storage/data/nikolenko/hedgehog_optional_envs/nonpher-hybrid-py38-v2/bin/python
FSScore (Focused Synthesizability Score)
- Range: model-dependent raw score
- Method: Optional external FSScore model environment
- How it works: HEDGEHOG writes a temporary SMILES CSV and runs hedgehog.workers.fsscore_worker, which delegates scoring to an isolated Python interpreter (HEDGEHOG_FSSCORE_PYTHON) via python -m fsscore.score.
- Model path resolution: set HEDGEHOG_FSSCORE_MODEL_PATH directly, or set HEDGEHOG_FSSCORE_REPO_PATH and HEDGEHOG resolves models/pretrain_graph_GGLGGL_ep242_best_valloss.ckpt.
- Auto-install behavior: with --auto-install / HEDGEHOG_AUTO_INSTALL=1, if Python/model settings are missing and no explicit fsscore_command is provided, HEDGEHOG bootstraps an isolated uv runtime in $HEDGEHOG_OPTIONAL_ENV_ROOT/fsscore (or .venv-fsscore-worker) via ensure_fsscore_runtime and wires runtime paths automatically.
- Missing configuration behavior: if FSScore Python/model is not configured, fs_score is emitted as NaN with a clear warning and synthesis continues.
- Setup helper: run uv run hedgehog setup fsscore --yes to clone the upstream FSScore checkout into modules/fsscore.
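As a rough illustration of the model path resolution described above, the sketch below reads the two environment variables and returns a checkpoint path. The function name is hypothetical. Two things are assumptions rather than documented behavior: that the explicit model path takes precedence over the repository path, and that the checkpoint lives under models/ inside the repository checkout.

```python
import os
from pathlib import Path

# Checkpoint file name as given in the model path resolution bullet.
FSSCORE_CHECKPOINT = "pretrain_graph_GGLGGL_ep242_best_valloss.ckpt"


def resolve_fsscore_model(environ=None):
    """Return the FSScore checkpoint path, or None if nothing is configured."""
    environ = os.environ if environ is None else environ
    explicit = environ.get("HEDGEHOG_FSSCORE_MODEL_PATH")
    if explicit:
        # Assumption: a direct model path wins over the repo-based lookup.
        return Path(explicit)
    repo = environ.get("HEDGEHOG_FSSCORE_REPO_PATH")
    if repo:
        # Assumption: the checkpoint sits in <repo>/models/.
        return Path(repo) / "models" / FSSCORE_CHECKPOINT
    # Nothing configured: the caller would report fs_score as NaN.
    return None
```

A None result corresponds to the missing-configuration case, where fs_score is emitted as NaN and synthesis continues.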
GASA
- Range: adapter-dependent score or probability
- Method: Optional local command, executable, or local HTTP API adapter
- How it works: GASA supports three local adapters: gasa.command / HEDGEHOG_GASA_COMMAND, gasa.executable / HEDGEHOG_GASA_EXECUTABLE, and gasa.api_url / HEDGEHOG_GASA_API_URL (loopback URLs only).
- Auto-install behavior: with --auto-install / HEDGEHOG_AUTO_INSTALL=1, if no backend is configured, HEDGEHOG bootstraps an isolated uv runtime in $HEDGEHOG_OPTIONAL_ENV_ROOT/gasa (or .venv-gasa-worker) and injects a local hedgehog.workers.gasa_worker command automatically.
- Missing backend behavior: if auto-setup is unavailable and no backend is configured, HEDGEHOG logs a clear warning and returns NaN for gasa_score.
- Portability requirement: set HEDGEHOG_OPTIONAL_ENV_ROOT to a writable host-local path (for example, ~/work/hedgehog_optional_envs), while keeping run outputs in shared storage.
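The "loopback URLs only" restriction on gasa.api_url can be checked with the standard library before a URL goes into the config. This is a minimal sketch of one way to test for a loopback address; how HEDGEHOG validates the URL internally is not documented here, and the function name and the /score path in the usage lines are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url):
    """Return True if `url` is an http(s) URL pointing at the local machine."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # Malformed URL (for example, a broken IPv6 literal).
    if parts.scheme not in ("http", "https") or host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback address ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # Any other hostname is treated as non-local.


print(is_loopback_url("http://127.0.0.1:8000/score"))
print(is_loopback_url("http://10.0.0.5:8000/score"))
```

Rejecting every hostname other than localhost is a deliberately conservative choice: resolving names through DNS would make the check depend on network state.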
AiZynthFinder Retrosynthesis
When run_retrosynthesis: true, the pipeline runs AiZynthFinder to search for actual retrosynthetic routes:
- Input SMILES are written to input_smiles.smi
- AiZynthFinder performs tree search using its neural expansion policy
- Routes are analyzed for feasibility (are all starting materials commercially available?)
- Results are saved to retrosynthesis_results.json
If filter_solved_only: true, only molecules for which AiZynthFinder found at least one valid route are kept. This is the strictest synthesis filter but also the most computationally expensive.
Setting run_retrosynthesis: false skips AiZynthFinder entirely and only computes the enabled scores above, which is significantly faster.
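The feasibility question in the retrosynthesis steps (are all starting materials commercially available?) amounts to checking the leaves of a route tree. The sketch below assumes each route is a nested dict in the style of AiZynthFinder's route-tree export, with "type", "children", and "in_stock" keys. The exact schema of retrosynthesis_results.json is not described on this page, so treat the layout, the function names, and the example route as assumptions.

```python
def leaf_molecules(node):
    """Yield the leaf 'mol' nodes (starting materials) of a route tree."""
    children = node.get("children") or []
    if node.get("type") == "mol" and not children:
        yield node
    for child in children:
        yield from leaf_molecules(child)


def route_is_solved(route):
    """A route counts as solved when every starting material is in stock."""
    return all(leaf.get("in_stock", False) for leaf in leaf_molecules(route))


# Illustrative one-step route: aspirin from salicylic acid and acetic anhydride.
route = {
    "type": "mol",
    "smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "in_stock": False,
    "children": [
        {
            "type": "reaction",
            "children": [
                {"type": "mol", "smiles": "O=C(O)c1ccccc1O", "in_stock": True},
                {"type": "mol", "smiles": "CC(=O)OC(C)=O", "in_stock": True},
            ],
        }
    ],
}

print(route_is_solved(route))  # True
```

Under this reading, filter_solved_only: true would keep a molecule when at least one of its routes satisfies route_is_solved.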
Configuration
Full configuration in config_synthesis.yml:
run: true
n_jobs: -1 # Workers for scoring + AiZynthFinder (--nproc)
enabled_scores:
- sa
- syba
- rascore
- sync
- scscore
- nonpher
- fsscore
- gasa
run_retrosynthesis: true # Run AiZynthFinder route search
filter_solved_only: true # Keep only molecules with found routes
# Legacy score thresholds
sa_score_min: 1
sa_score_max: 4.5
syba_score_min: 0
syba_score_max: inf
ra_score_min: 0.5
ra_score_max: 1
sync_auto_install: true
sync_device: cpu
sync_conformer_seed: 61453
# Optional FSScore isolated worker configuration.
# fsscore_python: /abs/path/to/fsscore-env/bin/python
# fsscore_model_path: /abs/path/to/pretrain_graph_GGLGGL_ep242_best_valloss.ckpt
# fsscore_repo_path: /abs/path/to/fsscore
# Optional score filters for new or experimental scorers.
score_filters:
  sync_score:
    min: 0.5
    max: 1
  sc_score:
    min:
    max:
  nonpher_complexity_score:
    min:
    max:
  fs_score:
    min:
    max:
  gasa_score:
    min:
    max:
gasa:
  command:
  executable:
  api_url:
  timeout_seconds: 30
Filtering Logic
A molecule passes the synthesis stage if all of the following hold:
- SA Score is within [sa_score_min, sa_score_max]
- SYBA Score is within [syba_score_min, syba_score_max]
- RA Score is within [ra_score_min, ra_score_max]
- Any enabled score_filters thresholds are satisfied, including score_filters.sync_score when sync is enabled
- If filter_solved_only: true and run_retrosynthesis: true: AiZynthFinder found at least one valid route
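The conditions above can be expressed as a single predicate. The sketch below is an illustration under stated assumptions, not the pipeline's implementation: the score keys sa_score, syba_score, and ra_score are guessed from the config key names, bounds are treated as inclusive, a score missing from the input is treated as not enabled, and a NaN score is assumed to fail any filter that has a threshold set.

```python
import math

LEGACY_SCORES = ("sa_score", "syba_score", "ra_score")  # assumed score keys


def in_range(value, lo=None, hi=None):
    """Inclusive range check; a bound of None means 'no limit'."""
    if lo is None and hi is None:
        return True  # No thresholds set: the score is reported, not filtered.
    if math.isnan(value):
        return False  # Assumption: NaN fails any configured threshold.
    return (lo is None or value >= lo) and (hi is None or value <= hi)


def passes_synthesis(scores, config, solved=None):
    """Return True if a molecule satisfies every configured synthesis filter."""
    for name in LEGACY_SCORES:
        lo, hi = config.get(f"{name}_min"), config.get(f"{name}_max")
        if name in scores and not in_range(scores[name], lo, hi):
            return False
    for name, bounds in (config.get("score_filters") or {}).items():
        bounds = bounds or {}
        if name in scores and not in_range(scores[name], bounds.get("min"), bounds.get("max")):
            return False
    if config.get("run_retrosynthesis") and config.get("filter_solved_only"):
        return bool(solved)  # Requires at least one valid route.
    return True


config = {
    "sa_score_min": 1, "sa_score_max": 4.5,
    "syba_score_min": 0, "syba_score_max": float("inf"),
    "ra_score_min": 0.5, "ra_score_max": 1,
    "score_filters": {
        "sync_score": {"min": 0.5, "max": 1},
        "sc_score": {"min": None, "max": None},
    },
    "run_retrosynthesis": True,
    "filter_solved_only": True,
}
scores = {"sa_score": 2.8, "syba_score": 35.0, "ra_score": 0.91,
          "sync_score": 0.7, "sc_score": float("nan")}

print(passes_synthesis(scores, config, solved=True))   # True
print(passes_synthesis(scores, config, solved=False))  # False
```

The score values in the example are made up for illustration. Note that the NaN sc_score does not reject the molecule here, because no sc_score thresholds are set.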
Output Files
| File | Description |
|---|---|
| synthesis_scores.csv | Enabled synthesis scores for all input molecules |
| synthesis_extended.csv | Scores combined with retrosynthesis results (when retrosynthesis is enabled) |
| filtered_molecules.csv | Molecules passing all synthesis filters |
| input_smiles.smi | SMILES input file generated for AiZynthFinder |
| retrosynthesis_results.json | Raw AiZynthFinder output with route trees |
Usage
# Run synthesis as part of the full pipeline
uv run hedgehog
# Run synthesis stage only
uv run hedgehog --stage synthesis
# Short alias
uv run hedge --stage synthesis
To run a fast scores-only pass (no retrosynthesis), set run_retrosynthesis: false in the config. This is useful for quick screening when AiZynthFinder is not installed or when you want rapid turnaround.