
Docking Filters

The docking filters stage applies a cascade of quality checks to docked poses, removing those with steric clashes, implausible geometries, or missing key interactions. It then deduplicates poses to produce a list of unique molecules for downstream processing.

Filter Cascade

Filters are applied in order from cheapest to most expensive. When the aggregation mode is all (default), the search-box filter can short-circuit evaluation: poses that fail it skip all subsequent filters.
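The short-circuit behavior can be sketched as follows. This is a minimal illustration, not HEDGEHOG's implementation: `run_cascade`, the filter callables, and the `pass_*` and `skipped` keys are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Pose = dict  # hypothetical pose record
Filter = Callable[[Pose], bool]


def run_cascade(
    pose: Pose,
    search_box: Filter,
    heavy_filters: Dict[str, Filter],
    mode: str = "all",
    short_circuit: bool = True,
) -> Dict[str, object]:
    """Run the cheap search-box check first, then the more expensive filters."""
    results: Dict[str, object] = {"pass_search_box": search_box(pose)}
    # In "all" mode a search-box failure already decides the outcome,
    # so the remaining filters can be skipped.
    if mode == "all" and short_circuit and not results["pass_search_box"]:
        results["skipped"] = list(heavy_filters)
        return results
    for name, check in heavy_filters.items():
        results[f"pass_{name}"] = check(pose)
    results["skipped"] = []
    return results


calls: List[str] = []


def expensive(pose: Pose) -> bool:
    """Stand-in for a costly filter; records that it was invoked."""
    calls.append("expensive")
    return True


outside = run_cascade({"id": 1}, lambda p: False, {"pose_quality": expensive})
inside = run_cascade({"id": 2}, lambda p: True, {"pose_quality": expensive})
```

In this sketch the expensive filter runs only for the pose that lies inside the box. In any mode nothing is skipped, because a later filter could still let the pose pass.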

Filter 0: Search-Box Containment

Verifies that the docked pose lies within the configured docking search box. This catches poses that drifted outside the binding site during optimization.

| Parameter | Default | Description |
| --- | --- | --- |
| enabled | true | Enable/disable this filter |
| max_outside_fraction | 0.0 | Maximum fraction of atoms allowed outside the box (0.0 = all atoms must be inside) |
| short_circuit | true | Skip heavy filters for failed poses (only in all mode) |

The search box is resolved from the docking configuration: either explicit center/size coordinates, or computed from autobox_ligand + autobox_add (the same reference ligand used for docking).
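As a rough sketch of the containment check, the example below derives a box from a reference ligand and measures the fraction of pose atoms outside it. It assumes that autobox_add pads the ligand's bounding box on all six sides, and that an atom counts as outside when any coordinate exceeds the box half-width. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def autobox(ligand: Sequence[Point], autobox_add: float) -> Tuple[Point, Point]:
    """Return (center, size) of the ligand bounding box, padded on every side."""
    mins = [min(p[i] for p in ligand) for i in range(3)]
    maxs = [max(p[i] for p in ligand) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * autobox_add for lo, hi in zip(mins, maxs))
    return center, size


def outside_fraction(pose: Sequence[Point], center: Point, size: Point) -> float:
    """Fraction of pose atoms with at least one coordinate outside the box."""
    outside = sum(
        any(abs(p[i] - center[i]) > size[i] / 2 for i in range(3)) for p in pose
    )
    return outside / len(pose)


reference = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
center, size = autobox(reference, autobox_add=4.0)

pose = [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (9.0, 1.0, 1.0), (1.0, -2.0, 1.0)]
fraction = outside_fraction(pose, center, size)
passes = fraction <= 0.0  # max_outside_fraction = 0.0
```

Here one of four atoms lies outside the 10 Å box, so the pose fails with the default max_outside_fraction of 0.0 and would pass only if the threshold were raised to 0.25 or more.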

Filter 1: Pose Quality

The shipped default backend is posebusters_fast, a fast pose-quality check for protein-ligand clashes, volume overlap, and protein distance. The legacy optional backend is posecheck, which exposes strain-energy parameters.

| Parameter | Default | Description |
| --- | --- | --- |
| enabled | true | Enable/disable this filter |
| backend | posebusters_fast | Backend: posebusters_fast or legacy posecheck |
| clash_cutoff | 0.75 | Relative VDW distance cutoff for fast clash detection |
| volume_clash_cutoff | 0.075 | ShapeTverskyIndex overlap threshold for volume clash detection |
| max_distance | 5.0 | Maximum minimum ligand-protein distance in Angstroms |
| max_clashes | 2 | Legacy PoseCheck: maximum allowed steric clashes |
| max_strain_energy | 50.0 | Legacy PoseCheck: maximum ligand strain energy in kcal/mol |
| strain_forcefield | UFF | Legacy PoseCheck: force field for strain calculation |
| clash_tolerance | 0.5 | Legacy PoseCheck: VDW overlap tolerance |

A clash means that the ligand overlaps with protein atoms in a physically impossible way. High strain energy, which is evaluated only by the legacy posecheck backend, means the ligand is in an energetically unfavorable conformation.
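To make the relative VDW cutoff concrete, here is a minimal sketch of a distance-based clash check. It assumes that a ligand-protein atom pair clashes when its distance divided by the sum of the two VDW radii falls below clash_cutoff, and it uses Bondi radii for a handful of elements. The actual backend may use different radii, comparisons, and atom selections, and `pose_quality` and its result keys are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z)

# Bondi van der Waals radii in Angstroms for a few common elements.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def pose_quality(
    ligand: Sequence[Atom],
    protein: Sequence[Atom],
    clash_cutoff: float = 0.75,
    max_distance: float = 5.0,
) -> Dict[str, bool]:
    """Check every ligand-protein atom pair for clashes and proximity."""
    closest = math.inf      # smallest absolute distance
    closest_rel = math.inf  # smallest distance relative to the VDW radii sum
    for el_l, *xyz_l in ligand:
        for el_p, *xyz_p in protein:
            d = math.dist(xyz_l, xyz_p)
            closest = min(closest, d)
            closest_rel = min(closest_rel, d / (VDW_RADII[el_l] + VDW_RADII[el_p]))
    return {
        "pass_clash": closest_rel >= clash_cutoff,
        "pass_distance": closest <= max_distance,
    }


ligand = [("C", 0.0, 0.0, 0.0)]
clashing = pose_quality(ligand, [("O", 2.0, 0.0, 0.0)])
good = pose_quality(ligand, [("O", 3.5, 0.0, 0.0)])
detached = pose_quality(ligand, [("O", 8.0, 0.0, 0.0)])
```

With a carbon and an oxygen the radii sum is 3.22 Å, so the pair at 2.0 Å has a relative distance of about 0.62 and is flagged as a clash, while the pose at 8.0 Å is clash-free but too far from the protein.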

Filter 2: Interaction Analysis (ProLIF)

Uses ProLIF to compute a protein-ligand interaction fingerprint and check for required/forbidden contacts.

| Parameter | Default | Description |
| --- | --- | --- |
| enabled | true | Enable/disable this filter |
| min_hbonds | 0 | Minimum hydrogen bonds required |
| required_residues | ['ASP12'] | Residue identifiers that must have at least one interaction |
| forbidden_residues | [] | Residues that must NOT have any interaction |
| interaction_types | [HBDonor, HBAcceptor, Hydrophobic, VdWContact] | Interaction types to detect |
| reporting.enabled | true | Generate interaction reporting artifacts |
| similarity_threshold | 0.0 | Tanimoto similarity to reference interaction fingerprint (0 = disabled) |

This filter is especially useful when you know the binding mode should involve specific residues (e.g., a catalytic aspartate) or should avoid certain contacts (e.g., a cysteine that causes covalent binding).

Residue identifiers are matched against ProLIF interaction column labels. The bundled demo configuration uses ASP12. Depending on your prepared receptor and ProLIF naming, identifiers may look different, so inspect the generated interaction report before finalizing required_residues.
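The required/forbidden logic can be illustrated with a small sketch that works on (residue label, interaction type) pairs, such as those you might read from an interaction report. It does not call ProLIF. The function names, the exact-match rule, the choice of HBDonor and HBAcceptor as hydrogen-bond types, and the labels CYS99, LEU45, GLY10, and ASP12.A are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Contact = Tuple[str, str]  # (residue label, interaction type)

HBOND_TYPES = {"HBDonor", "HBAcceptor"}


def check_interactions(
    contacts: Set[Contact],
    required_residues: Iterable[str],
    forbidden_residues: Iterable[str],
    min_hbonds: int = 0,
) -> Dict[str, object]:
    """Apply required/forbidden residue rules and a hydrogen-bond minimum."""
    residues = {res for res, _ in contacts}
    missing = [r for r in required_residues if r not in residues]
    hit = [r for r in forbidden_residues if r in residues]
    n_hbonds = sum(1 for _, kind in contacts if kind in HBOND_TYPES)
    return {
        "missing_required": missing,
        "forbidden_hit": hit,
        "n_hbonds": n_hbonds,
        "passed": not missing and not hit and n_hbonds >= min_hbonds,
    }


def tanimoto(a: Set[Contact], b: Set[Contact]) -> float:
    """Tanimoto similarity between two sets of contacts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


contacts = {("ASP12", "HBAcceptor"), ("LEU45", "Hydrophobic")}
result = check_interactions(contacts, ["ASP12"], ["CYS99"], min_hbonds=1)

# Same contacts, but labelled with a chain suffix: the exact match now fails.
relabelled = {("ASP12.A", "HBAcceptor"), ("LEU45.A", "Hydrophobic")}
mismatch = check_interactions(relabelled, ["ASP12"], ["CYS99"], min_hbonds=1)

reference = {("ASP12", "HBAcceptor"), ("GLY10", "VdWContact")}
similarity = tanimoto(contacts, reference)
```

The second call shows why checking the label format matters: the same contact reported under a different label leaves the required residue unmatched, and the pose is rejected.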

Filter 3: Shepherd-Score (3D Shape Similarity)

Compares the 3D molecular shape of each pose to a reference ligand using Gaussian overlap Tanimoto.

| Parameter | Default | Description |
| --- | --- | --- |
| enabled | false | Disabled by default (requires reference ligand) |
| backend | auto | auto = worker -> in-process -> soft-skip, worker = worker only, inprocess = in-process only |
| auto_install_worker | true | If worker command is missing, try auto-installing .venv-shepherd-worker |
| worker_python | null | Optional interpreter for auto-install (python3.12, python3.11, python3.10) |
| reference_ligand | null | Path to reference ligand SDF |
| min_shape_score | 0.5 | Minimum shape Tanimoto score |
| alpha | 0.81 | Gaussian width parameter |

This filter is disabled by default because it requires a known reference ligand for comparison. Enable it when you have a co-crystallized ligand or known active compound and want to ensure poses adopt a similar shape. For reproducible setup, install an isolated worker environment:

uv run hedgehog setup shepherd-worker --yes

If no Shepherd backend is available at runtime, HEDGEHOG soft-skips this filter (logs a warning and marks pass_shepherd_score=true).
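For intuition about the score itself, the sketch below computes a Gaussian overlap Tanimoto between two sets of atom coordinates. It assumes one spherical Gaussian exp(-alpha r²) per atom, first-order pairwise overlaps only, and a pose that already shares the reference ligand's coordinate frame. Shepherd-Score's own implementation may differ, for example by optimizing the alignment.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gaussian_overlap(a: Sequence[Point], b: Sequence[Point], alpha: float = 0.81) -> float:
    """Sum of pairwise overlap integrals of equal-width atom Gaussians."""
    # Integral of exp(-alpha|r-p|^2) * exp(-alpha|r-q|^2) over all space.
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for p in a:
        for q in b:
            d2 = sum((u - v) ** 2 for u, v in zip(p, q))
            total += prefactor * math.exp(-0.5 * alpha * d2)
    return total


def shape_tanimoto(a: Sequence[Point], b: Sequence[Point], alpha: float = 0.81) -> float:
    """Tanimoto score: overlap divided by self-overlaps minus the overlap."""
    o_ab = gaussian_overlap(a, b, alpha)
    o_aa = gaussian_overlap(a, a, alpha)
    o_bb = gaussian_overlap(b, b, alpha)
    return o_ab / (o_aa + o_bb - o_ab)


reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x, y, z + 1.0) for x, y, z in reference]
far_away = [(x, y, z + 20.0) for x, y, z in reference]

identical_score = shape_tanimoto(reference, reference)
shifted_score = shape_tanimoto(reference, shifted)
far_score = shape_tanimoto(reference, far_away)
passes = shifted_score >= 0.5  # min_shape_score
```

The score is 1 for identical shapes and falls toward 0 as the pose moves away from the reference. A larger alpha makes each Gaussian narrower, so the score drops faster with displacement.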

Filter 4: Conformer Deviation

Checks if the docked pose is geometrically plausible by generating multiple low-energy conformers and measuring the RMSD between the docked pose and the closest conformer.

| Parameter | Default | Description |
| --- | --- | --- |
| enabled | true | Enable/disable this filter |
| use_nvmolkit | true | Try nvMolKit acceleration when available (falls back to RDKit if unavailable) |
| num_conformers | 50 | Number of conformers to generate |
| conformer_method | ETKDGv3 | Conformer generation method (ETKDG, ETKDGv2, ETKDGv3) |
| max_rmsd_to_conformer | 3.0 | Maximum RMSD in Angstroms to closest conformer |
| random_seed | 42 | Seed for reproducible conformer generation |
| include_hydrogens | false | Include hydrogens in RMSD matching |
| max_matches | 10000 | Cap symmetry matching complexity |
| early_stop_on_pass | true | Stop comparison once any conformer passes |
| optimize_conformers | false | UFF optimization of generated conformers |
optimize_conformersfalseUFF optimization of generated conformers

A high minimum RMSD indicates that the docking engine placed the ligand in a conformation that is energetically unlikely for the molecule to adopt in solution. This catches docking artifacts where the scoring function found a favorable protein-ligand interaction at the cost of internal strain. For isolated setup of optional nvMolKit dependencies:

uv run hedgehog setup nvmolkit-worker
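The comparison step of this filter can be sketched in plain Python. The example assumes the conformers have already been generated, aligned to the docked pose, and listed in the same atom order, so it leaves out conformer generation and symmetry-aware matching. The function names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """RMSD between two coordinate sets with identical atom ordering."""
    if len(a) != len(b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def closest_conformer_rmsd(
    pose: Coords,
    conformers: Iterable[Coords],
    max_rmsd: float = 3.0,
    early_stop_on_pass: bool = True,
) -> Tuple[float, bool]:
    """Return (lowest RMSD seen, passed).

    With early stopping the returned RMSD is the first passing value,
    which is not necessarily the global minimum.
    """
    best = math.inf
    for conformer in conformers:
        best = min(best, rmsd(pose, conformer))
        if early_stop_on_pass and best <= max_rmsd:
            break
    return best, best <= max_rmsd


pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformers = [
    [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0)],  # RMSD 5.0 to the pose
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # RMSD 1.0 to the pose
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the pose
]

early = closest_conformer_rmsd(pose, conformers)
exhaustive = closest_conformer_rmsd(pose, conformers, early_stop_on_pass=False)
```

Both calls give the same pass/fail result. Early stopping saves the remaining comparisons, but the RMSD it records may be higher than the true minimum.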

Aggregation

The aggregation mode controls how per-filter results are combined:

aggregation:
  mode: "all"          # "all" = pass every filter, "any" = pass at least one
  save_metrics: true   # Save detailed per-pose metrics CSV
  save_failed: false   # Save molecules that failed filtering

  • all (default): a pose must pass every enabled filter. This is the conservative approach for drug discovery campaigns.
  • any: a pose passes if it passes at least one filter. Useful for exploratory analysis.
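The two modes reduce to a logical AND or OR over the per-filter pass flags of the enabled filters. In this minimal sketch, `aggregate` and the flag names are illustrative assumptions rather than the pipeline's actual identifiers.

```python
from typing import Dict


def aggregate(flags: Dict[str, bool], mode: str = "all") -> bool:
    """Combine per-filter pass flags for the enabled filters."""
    if mode == "all":
        return all(flags.values())
    if mode == "any":
        return any(flags.values())
    raise ValueError(f"unknown aggregation mode: {mode!r}")


flags = {
    "pass_search_box": True,
    "pass_pose_quality": False,
    "pass_interactions": True,
}

strict = aggregate(flags, mode="all")
lenient = aggregate(flags, mode="any")
```

The same pose is rejected in all mode because of the single failed check, and accepted in any mode.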

Deduplication

Docking can produce multiple poses per molecule when num_modes is greater than 1 (the default config uses num_modes: 1). After filtering, the pipeline deduplicates to unique molecules:

  1. All passing poses are saved to filtered_poses.csv (full pose-level detail)
  2. Poses are sorted by minimizedAffinity (best affinity first)
  3. For each unique mol_idx, only the best-scoring pose is kept
  4. Deduplicated molecules are saved to filtered_molecules.csv
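The sort-then-keep-first logic in the steps above can be sketched with plain dictionaries. The column names mol_idx and minimizedAffinity come from the description above. The `deduplicate` helper and the pose_id field are hypothetical.

```python
from typing import Dict, List


def deduplicate(poses: List[dict]) -> List[dict]:
    """Keep the pose with the lowest minimizedAffinity for each mol_idx."""
    best: Dict[object, dict] = {}
    for pose in sorted(poses, key=lambda p: p["minimizedAffinity"]):
        # Poses arrive best-first, so the first one seen per molecule wins.
        best.setdefault(pose["mol_idx"], pose)
    return list(best.values())


passing_poses = [
    {"mol_idx": 0, "pose_id": "0_a", "minimizedAffinity": -7.1},
    {"mol_idx": 0, "pose_id": "0_b", "minimizedAffinity": -8.3},
    {"mol_idx": 1, "pose_id": "1_a", "minimizedAffinity": -6.0},
]

unique_molecules = deduplicate(passing_poses)
```

Lower affinity values are treated as better, so molecule 0 is represented by the pose with -8.3 rather than the one with -7.1.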

SMILES for the output are taken from the original ligands.csv (2D SMILES) rather than regenerated from 3D coordinates, which preserves the original stereochemistry encoding.

Configuration

Full configuration in config_docking_filters.yml:

run: true
run_after_docking: true
input_sdf: null      # null = auto-detect from docking output
receptor_pdb: null   # null = use receptor from docking config
search_box:
  enabled: true
  max_outside_fraction: 0.0
  short_circuit: true
pose_quality:
  enabled: true
  backend: "posebusters_fast"
  clash_cutoff: 0.75
  volume_clash_cutoff: 0.075
  max_distance: 5.0
  max_clashes: 2
  max_strain_energy: 50.0
  strain_forcefield: "UFF"
  clash_tolerance: 0.5
interactions:
  enabled: true
  min_hbonds: 0
  required_residues: ['ASP12']
  forbidden_residues: []
  interaction_types:
    - HBDonor
    - HBAcceptor
    - Hydrophobic
    - VdWContact
  reporting:
    enabled: true
shepherd_score:
  enabled: false
  backend: "auto"
  auto_install_worker: true
  worker_python: null
  reference_ligand: null
  min_shape_score: 0.5
  alpha: 0.81
conformer_deviation:
  enabled: true
  use_nvmolkit: true
  num_conformers: 50
  conformer_method: "ETKDGv3"
  max_rmsd_to_conformer: 3.0
  random_seed: 42
  include_hydrogens: false
  max_matches: 10000
  early_stop_on_pass: true
  optimize_conformers: false
aggregation:
  mode: "all"
  save_metrics: true
  save_failed: false
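Some settings only make sense in combination, so a quick sanity check before a run can be useful. The sketch below operates on a configuration that has already been loaded into a Python dictionary. The `check_config` helper is hypothetical and not part of HEDGEHOG, and its rules are inferred from the parameter descriptions on this page.

```python
from typing import List


def check_config(cfg: dict) -> List[str]:
    """Return human-readable warnings for inconsistent settings."""
    problems: List[str] = []

    mode = cfg.get("aggregation", {}).get("mode", "all")
    if mode not in ("all", "any"):
        problems.append(f"aggregation.mode must be 'all' or 'any', got {mode!r}")

    box = cfg.get("search_box", {})
    if box.get("short_circuit", True) and mode != "all":
        problems.append("search_box.short_circuit only takes effect in 'all' mode")
    fraction = box.get("max_outside_fraction", 0.0)
    if not 0.0 <= fraction <= 1.0:
        problems.append("search_box.max_outside_fraction must be between 0 and 1")

    shepherd = cfg.get("shepherd_score", {})
    if shepherd.get("enabled", False) and not shepherd.get("reference_ligand"):
        problems.append("shepherd_score is enabled but reference_ligand is null")

    return problems


defaults = {
    "search_box": {"enabled": True, "max_outside_fraction": 0.0, "short_circuit": True},
    "shepherd_score": {"enabled": False, "reference_ligand": None},
    "aggregation": {"mode": "all"},
}
exploratory = {
    "search_box": {"enabled": True, "max_outside_fraction": 0.0, "short_circuit": True},
    "shepherd_score": {"enabled": True, "reference_ligand": None},
    "aggregation": {"mode": "any"},
}

default_warnings = check_config(defaults)
exploratory_warnings = check_config(exploratory)
```

The default settings produce no warnings. The exploratory variant is flagged twice: once for enabling the shape filter without a reference ligand, and once because short-circuiting is inactive in any mode.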

Output Files

| File | Description |
| --- | --- |
| metrics.csv | Per-pose filter metrics and pass/fail flags for every filter |
| filtered_molecules.csv | Unique molecules (best pose per molecule) passing all filters |
| filtered_poses.csv | All passing poses with full metrics (before deduplication) |
| filtered_poses.sdf | 3D structures of all passing poses in SDF format |

The pipeline-level output file output/final_molecules.csv keeps all upstream columns and adds aggregated docking scores per molecule:

  • gnina_affinity
  • gnina_cnnscore
  • gnina_cnnaffinity
  • gnina_cnn_vs
  • smina_affinity
  • matcha_affinity

For each tool, values are taken from the best pose per molecule (minimum affinity).

Usage

# Run docking filters as part of the full pipeline
uv run hedgehog

# Run docking filters stage only (requires docking output to exist)
uv run hedgehog --stage docking_filters

# Short alias
uv run hedge --stage docking_filters