
Configuration Overview

Hedgehog uses a hierarchical YAML configuration system. A single main config file (config.yml) controls global settings and references separate per-stage config files. Each pipeline stage reads its own configuration independently, making it easy to tune individual stages without touching the rest.

Config File Hierarchy

config.yml                        (main: global settings + paths to stage configs)
├── config_mol_prep.yml           (Datamol molecule standardization)
├── config_descriptors.yml        (molecular descriptor calculation)
├── config_structFilters.yml      (structural / medchem filters)
├── config_synthesis.yml          (retrosynthesis scoring)
├── config_docking.yml            (molecular docking: SMINA / GNINA / Matcha)
├── config_docking_filters.yml    (post-docking pose filters)
└── config_moleval.yml            (generative metrics & reporting)

How Configs Are Loaded

The CLI resolves the default config path relative to the installed package:

from pathlib import Path
DEFAULT_CONFIG_PATH = str(Path(__file__).resolve().parent / "configs" / "config.yml")

You can override the master config path at runtime with --config/-c:

uv run hedgehog --config path/to/config.yml

You can also override the output directory (folder_to_save) with --out/-o:

uv run hedgehog --out results/my_run

Filesystem paths in the master config are resolved in this order:

  1. absolute paths are used as-is
  2. relative to the master config file location
  3. relative to the source/package root for bundled src/hedgehog/... paths
  4. relative to the current working directory as a final fallback

For example:

# Relative to the master config file when that file lives beside configs/
config_descriptors: config_descriptors.yml

# Bundled source-checkout path
config_descriptors: src/hedgehog/configs/config_descriptors.yml

# Absolute path, used as-is
config_descriptors: /opt/hedgehog/configs/config_descriptors.yml
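The resolution order above can be sketched in a few lines of Python. This is a minimal illustration, not Hedgehog's actual implementation: the function name resolve_config_path and the package_root parameter are hypothetical, and the rule "take the first candidate that exists on disk" is an assumption about how the fallbacks are chosen. It also simplifies step 3 by trying the package root for every relative path, not only for bundled src/hedgehog/... paths.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(
    value: str,
    master_config: Path,
    package_root: Path,
    cwd: Optional[Path] = None,
) -> Path:
    """Resolve a path from the master config (hypothetical helper)."""
    candidate = Path(value)

    # 1. Absolute paths are used as-is.
    if candidate.is_absolute():
        return candidate

    # 2-4. Try each base directory in the documented order.
    bases = [
        master_config.resolve().parent,  # 2. master config file location
        package_root,                    # 3. source/package root
        cwd or Path.cwd(),               # 4. current working directory
    ]
    for base in bases:
        resolved = base / candidate
        # Assumption: the first candidate that exists on disk wins.
        if resolved.exists():
            return resolved

    # Nothing exists: return the working-directory interpretation (final fallback).
    return bases[-1] / candidate
```

Under these assumptions, a file that sits beside the master config always shadows a file with the same relative path under the package root or the working directory.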

Overriding Parameters

There are three common ways to customise the pipeline:

  1. Create your own config files and pass them via --config. Copy the default configs, edit them, then run:

    uv run hedgehog --config path/to/my_config.yml
  2. Override common paths via CLI flags. Use:

    • --mols/-m to override generated_mols_path
    • --out/-o to override folder_to_save
  3. For quick experiments in a source checkout, edit copied config files. Do not edit configs inside an installed package in site-packages; copy them into your project directory and pass the copied master config with --config.
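As a rough illustration of how the CLI flags map onto master config keys, the sketch below parses --config/-c, --mols/-m, and --out/-o with argparse and applies the two path overrides to an already-loaded config dictionary. The flag names and the keys generated_mols_path and folder_to_save come from this page; the functions build_parser and apply_cli_overrides and the program name are hypothetical, and the real CLI may be built on a different argument-parsing library.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three flags described above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="hedgehog-sketch")
    parser.add_argument("--config", "-c", default=None, help="master config path")
    parser.add_argument("--mols", "-m", default=None, help="overrides generated_mols_path")
    parser.add_argument("--out", "-o", default=None, help="overrides folder_to_save")
    return parser


def apply_cli_overrides(config: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    """Return a copy of the config with CLI values taking precedence."""
    merged = dict(config)  # shallow copy so the loaded config is left untouched
    if args.mols is not None:
        merged["generated_mols_path"] = args.mols
    if args.out is not None:
        merged["folder_to_save"] = args.out
    return merged


# Example with an explicit argument list instead of sys.argv.
example_args = build_parser().parse_args(["--out", "results/my_run"])
example_config = {
    "generated_mols_path": "input/my_molecules.csv",
    "folder_to_save": "results/run",
}
print(apply_cli_overrides(example_config, example_args)["folder_to_save"])  # results/my_run
```

Flags that are not passed leave the corresponding config values unchanged, so the file remains the source of truth for everything except the overridden paths.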

For example, to run only descriptors and skip all other stages, start from a master config that points at your own copies of the stage configs:

# my_config.yml
generated_mols_path: input/my_molecules.csv
target_mols_path: input/my_targets.csv
folder_to_save: results/my_run
n_jobs: 8
sample_size: 500
config_descriptors: configs/config_descriptors.yml
config_mol_prep: configs/config_mol_prep.yml
config_structFilters: configs/config_structFilters.yml
config_synthesis: configs/config_synthesis.yml
config_docking: configs/config_docking.yml
config_docking_filters: configs/config_docking_filters.yml
config_moleval: configs/config_moleval.yml

To use this custom configuration, pass it explicitly with --config:

uv run hedgehog --config my_config.yml

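To skip a stage, set run: false in its stage config, for example in config_synthesis.yml. The same run flag is available in every stage config, so running only descriptors means setting run: false in each of the other stage configs.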
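The effect of the per-stage run flag can be sketched with plain dictionaries standing in for parsed stage configs. The helpers enabled_stages and run_only are hypothetical and not part of Hedgehog. Treating a missing run key as enabled is an assumption made only for this sketch.

```python
from typing import Any, Dict, Iterable, List


def enabled_stages(stage_configs: Dict[str, Dict[str, Any]]) -> List[str]:
    """List stages whose config has run enabled.

    Assumption: a stage without an explicit run key counts as enabled.
    """
    return [name for name, cfg in stage_configs.items() if cfg.get("run", True)]


def run_only(
    stage_configs: Dict[str, Dict[str, Any]], keep: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """Return copies of the stage configs with run set to true only for `keep`."""
    keep_set = set(keep)
    return {
        name: {**cfg, "run": name in keep_set}
        for name, cfg in stage_configs.items()
    }


# Parsed stage configs, keyed by the master config entry that references them.
stages = {
    "config_mol_prep": {"run": True},
    "config_descriptors": {"run": True},
    "config_synthesis": {"run": True},
}
print(enabled_stages(run_only(stages, ["config_descriptors"])))  # ['config_descriptors']
```

In practice you would make the same change by hand: edit run: false into each copied stage config except config_descriptors.yml.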

Config Files at a Glance

File                          Purpose
config.yml                    Global paths, parallelism, sample size, and references to all stage configs
config_mol_prep.yml           Datamol-based molecule standardization and strict filtering
config_descriptors.yml        Descriptor borders, filtering flags, plotting options
config_structFilters.yml      Structural alerts, medchem filters (Bredt, NIBR, Lilly, etc.)
config_synthesis.yml          Retrosynthesis toggle, enabled synthesis scorers, score thresholds (including optional sync, nonpher, fsscore, and gasa)
config_docking.yml            Docking tool selection (GNINA/SMINA/Matcha), receptor, autobox settings
config_docking_filters.yml    Post-docking filters: search box, pose quality, interactions, shape, conformer deviation
config_moleval.yml            Generative evaluation metrics: diversity, scaffolds, filter pass rates

Main Config Structure

Below is the full default config.yml:

generated_mols_path: src/hedgehog/configs/examples/moses_1000.csv
target_mols_path: src/hedgehog/configs/examples/target_mols.csv
folder_to_save: results/run
n_jobs: -1
sample_size: 10000
batch_size: 512
save_sampled_mols: true
pains_file_path: src/hedgehog/vendor/moleval/metrics/wehi_pains.csv
mcf_file_path: src/hedgehog/vendor/moleval/metrics/mcf.csv
ligand_preparation_tool: /opt/proprietary_tools/ligand_prep/bin/ligand_prep
protein_preparation_tool: /opt/proprietary_tools/protein_prep/bin/protein_prep
config_mol_prep: src/hedgehog/configs/config_mol_prep.yml
config_descriptors: src/hedgehog/configs/config_descriptors.yml
config_structFilters: src/hedgehog/configs/config_structFilters.yml
config_synthesis: src/hedgehog/configs/config_synthesis.yml
config_docking: src/hedgehog/configs/config_docking.yml
config_docking_filters: src/hedgehog/configs/config_docking_filters.yml
config_moleval: src/hedgehog/configs/config_moleval.yml

n_jobs: -1 means use all available cores. For laptops, shared servers, CI, or notebooks, set an explicit smaller value such as 4 or 8.
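The n_jobs convention can be illustrated with a small helper that turns the configured value into a concrete worker count. The name resolve_n_jobs is hypothetical, and rejecting zero or other negative values is an assumption of this sketch; this page only documents the meaning of -1.

```python
import os


def resolve_n_jobs(n_jobs: int) -> int:
    """Translate a configured n_jobs value into a worker count (illustrative)."""
    if n_jobs == -1:
        # -1 means "use all available cores"; cpu_count() can return None.
        return os.cpu_count() or 1
    if n_jobs < 1:
        # Assumption: values other than -1 must be positive.
        raise ValueError(f"n_jobs must be -1 or a positive integer, got {n_jobs}")
    return n_jobs


print(resolve_n_jobs(4))  # 4
```

On a shared machine, an explicit value such as 4 keeps the worker count fixed regardless of how many cores the host reports.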

For a complete description of every parameter, see the Parameter Reference.
