Troubleshooting

Common problems encountered when running HEDGEHOG, with explanations and solutions.

missing required 'smiles' column

Symptom: input loading fails with an error that the file is missing the required smiles column.

Cause: the input was parsed as CSV/TSV but does not contain a smiles header, or the file is not in the expected molecule-table format.

Fix: use CSV/TSV with a smiles column:

smiles,model_name
CCO,demo
CCN,demo

For headerless .smi files, keep one SMILES per line and use an optional second token for model_name.
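Before starting a run, the header can be checked with a few lines of Python. This is a minimal sketch, not HEDGEHOG's own loader: the function name `has_smiles_column` is hypothetical, and the exact, case-sensitive match on `smiles` is an assumption.

```python
import csv
import io


def has_smiles_column(text: str) -> bool:
    """Return True if the first row of a CSV/TSV text contains a 'smiles' header."""
    first_line = text.splitlines()[0] if text.strip() else ""
    # Assumption: a tab in the header row means TSV, otherwise CSV.
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.reader(io.StringIO(first_line), delimiter=delimiter)
    header = next(reader, [])
    return "smiles" in [name.strip() for name in header]
```

A headerless file such as a .smi file returns False here, which is expected: it has no header row to inspect.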

GNINA/SMINA Binary Not Found

Symptom: docking fails before producing gnina_out.sdf or smina_out.sdf.

Cause: docking is enabled but the selected binary cannot be resolved from PATH or from the explicit bin value in config_docking.yml.

Fix:

  1. Install the binary and make it available on PATH, or set an explicit path:
gnina_config:
  bin: /path/to/gnina
  2. For descriptor/filter-only runs, disable docking or run only the stages that do not need a docking binary:
uv run hedgehog --stage descriptors --stage struct_filters --force-new
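To see which binary would be picked up, the two lookups described above (explicit path first, then PATH) can be reproduced with the standard library. This is a sketch under the assumption that an explicit path takes precedence; `resolve_binary` is a hypothetical helper, not a HEDGEHOG function.

```python
import os
import shutil
from typing import Optional


def resolve_binary(name: str, explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable executable path, or None if nothing resolves."""
    if explicit_path:
        # An explicit path must point at an existing, executable file.
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        return None
    # Otherwise fall back to a PATH lookup.
    return shutil.which(name)
```

Calling `resolve_binary("gnina")` and `resolve_binary("smina")` from the same shell that launches the pipeline shows whether either tool is visible on PATH.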

No Docking Output Found

Symptom: logs report that docking finished but no results were detected.

Check first:

  1. selected docking tool in config_docking.yml
  2. receptor and autobox/reference ligand paths
  3. generated scripts under stages/05_docking/_workdir/
  4. binary path and executable permissions
  5. auto_run setting

The expected final artifacts are tool-specific SDF files such as stages/05_docking/gnina/gnina_out.sdf or stages/05_docking/smina/smina_out.sdf.
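A quick way to confirm whether either artifact was written is to look for non-empty files at those locations. The sketch below assumes the paths are relative to the run directory; `find_docking_outputs` is a hypothetical helper name.

```python
from pathlib import Path

# Expected artifact locations quoted in the text, assumed relative to the run directory.
EXPECTED_OUTPUTS = {
    "gnina": "stages/05_docking/gnina/gnina_out.sdf",
    "smina": "stages/05_docking/smina/smina_out.sdf",
}


def find_docking_outputs(run_dir: str) -> dict:
    """Map each tool to its output path if the file exists and is non-empty."""
    found = {}
    for tool, rel_path in EXPECTED_OUTPUTS.items():
        candidate = Path(run_dir) / rel_path
        if candidate.is_file() and candidate.stat().st_size > 0:
            found[tool] = candidate
    return found
```

An empty result means neither tool produced output, which points back to the checklist above.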

Docking Script Path Issues

Docking scripts generated by HEDGEHOG execute from a _workdir/ subdirectory inside the docking stage output, not from the stage directory itself. This causes path resolution issues if relative paths are used.

Three specific path bugs to watch for:

  1. Config reference — the docking config file lives in the pipeline’s root working directory. Use an absolute path or config_file.relative_to(ligands_dir) instead of just the file name.
  2. External prep output path argument — the output SDF path must be absolute, otherwise it resolves relative to _workdir/, creating nested _workdir/_workdir/ paths.
  3. External prep input path argument — same issue as above; the input CSV path must be absolute.

Symptoms: docking jobs fail with FileNotFoundError, or output files appear in unexpected nested directories.

Fix: ensure all paths passed to docking scripts are absolute:

# Wrong: relative path breaks when cwd is _workdir/
script_args = [str(output_sdf)]

# Correct: always resolve to absolute
script_args = [str(output_sdf.resolve())]
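Note that `Path.resolve()` anchors a relative path to the current working directory of the calling process. If that directory is not the intended base, one option is to anchor the path explicitly before resolving. The helper below is a hypothetical sketch of that idea, not code from HEDGEHOG.

```python
from pathlib import Path


def as_script_arg(path: Path, base_dir: Path) -> str:
    """Anchor a possibly relative path to base_dir and return an absolute string."""
    if not path.is_absolute():
        # Relative paths are interpreted against base_dir, not the current cwd.
        path = base_dir / path
    return str(path.resolve())
```

Passing the stage output directory as `base_dir` keeps the result stable no matter where the generated script later runs.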

Conda / Mamba Detection (GNINA)

If you configure GNINA to run inside an existing conda environment, HEDGEHOG may need to locate conda.sh to activate that environment before launching GNINA. When conda_sh is not provided in your docking configuration, HEDGEHOG auto-detects common conda installs by searching for these directory names under the user’s home:

  • miniforge
  • miniconda3
  • mambaforge
  • anaconda3

Symptom: docking (GNINA) stage fails with an error like conda not found or CondaError, or GNINA cannot find required shared libraries after activation.

Fix: ensure one of the supported conda distributions is installed and its bin/ directory is on your PATH, or set conda_sh explicitly to your .../etc/profile.d/conda.sh path.

# Verify conda is accessible
conda --version

# If using miniforge, conda.sh is typically here:
ls ~/miniforge/etc/profile.d/conda.sh
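The auto-detection described above can be approximated as follows, which helps predict whether an explicit conda_sh is needed. This is a sketch, not HEDGEHOG's detection code: `find_conda_sh` is a hypothetical name, and the `etc/profile.d/conda.sh` suffix is the standard conda layout.

```python
from pathlib import Path
from typing import Optional

# Directory names listed above, searched under the user's home directory.
CONDA_DIR_NAMES = ["miniforge", "miniconda3", "mambaforge", "anaconda3"]


def find_conda_sh(home: Path) -> Optional[Path]:
    """Return the first conda.sh found under a known install directory, else None."""
    for name in CONDA_DIR_NAMES:
        candidate = home / name / "etc" / "profile.d" / "conda.sh"
        if candidate.is_file():
            return candidate
    return None
```

If `find_conda_sh(Path.home())` returns None, for example because the install lives in a directory such as `~/miniforge3` that is not in the list, setting conda_sh explicitly is the safer route.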

Autobox Coordinate Frame Mismatch

When using autobox docking (where the search box is derived from a reference ligand SDF), the reference ligand must be in the same coordinate frame as the receptor PDB file.

Symptom: docking scores are unreasonably poor, or all poses are placed far from the binding site.

Common cause: the reference ligand was extracted from a holo (ligand-bound) crystal structure, but the receptor PDB comes from an apo (ligand-free) structure with a different coordinate frame. Translations or rotations between crystal structures can shift the autobox partly or entirely off the binding pocket.

Fix: superimpose the apo and holo structures before extracting the reference ligand, or use a reference ligand from the same PDB file as the receptor:

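from Bio.PDB import PDBParser, Superimposer

parser = PDBParser(QUIET=True)
ref_struct = parser.get_structure("holo", "receptor_holo.pdb")
mobile_struct = parser.get_structure("apo", "receptor_apo.pdb")

def ca_atoms(structure):
    # Index CA atoms of the first model by (chain ID, residue ID)
    return {(res.get_parent().id, res.id): res["CA"] for res in structure[0].get_residues() if "CA" in res}

# Pair CA atoms present in both structures (assumes matching chain IDs and residue numbering)
ref_ca, mobile_ca = ca_atoms(ref_struct), ca_atoms(mobile_struct)
shared = sorted(ref_ca.keys() & mobile_ca.keys())

# Fit apo onto holo using the paired CA atoms, then move every apo atom into the holo frame
sup = Superimposer()
sup.set_atoms([ref_ca[k] for k in shared], [mobile_ca[k] for k in shared])
sup.apply(list(mobile_struct.get_atoms()))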
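A cheap sanity check for a frame mismatch is to measure how far the reference ligand sits from the receptor. The sketch below works on plain (x, y, z) tuples in Å that you have already extracted from the SDF and PDB files; the function names are hypothetical and no cutoff is prescribed by HEDGEHOG.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def min_distance_to_receptor(ligand_coords, receptor_coords):
    """Distance from the ligand centroid to the nearest receptor atom, in Å."""
    center = centroid(ligand_coords)
    return min(math.dist(center, atom) for atom in receptor_coords)
```

A ligand bound in a pocket usually has receptor atoms within a few Å of its centroid, so a result in the tens of Å suggests the two files do not share a coordinate frame.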

External Tool Licensing

The optional ligand preparation and protein preparation steps may use proprietary third-party tools. These often require a valid vendor license.

Symptom: pipeline fails at input preprocessing with license checkout failures or vendor-tool initialization errors.

Fix:

  1. Set the environment variable(s) required by your tool vendor to point to the installation directory:
export TOOL_HOME=/opt/proprietary_tools/suite2024-1
  2. Verify the license server is accessible:
$TOOL_HOME/licadmin STAT
  3. If you do not have a valid license, remove or clear the ligand_preparation_tool and protein_preparation_tool fields in config.yml. HEDGEHOG will skip external preprocessing and rely on the Mol Prep stage (Datamol standardization), which does not require a license:
# config.yml: disable external preparation tools
ligand_preparation_tool:
protein_preparation_tool:

Memory Issues with Large SDF Files

Docking stages load and process SDF files that can grow very large when docking thousands of molecules. This may cause out-of-memory errors, especially on systems with limited RAM.

Symptoms: the process is killed by the OS (Killed or OOM), or you see MemoryError in the log.

Mitigations:

  1. Reduce input size: use the sample_size parameter in config.yml to cap the number of molecules entering the pipeline:
sample_size: 500  # Process at most 500 molecules
  2. Split input files: divide your input CSV into smaller batches and run the pipeline on each batch separately.
  3. Reduce parallelism: lower n_jobs to reduce peak memory from concurrent docking processes:
n_jobs: 8  # Default is often set to all cores
  4. Monitor memory: watch system memory during the docking stage:
# In a separate terminal
watch -n 5 free -h
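The "split input files" mitigation can be done with the standard library. The sketch below streams the input CSV and writes fixed-size batches, repeating the header in each file; the function names and the `batch_000.csv` naming scheme are assumptions for illustration, and the input is assumed to have a header row.

```python
import csv
from pathlib import Path


def split_csv(input_csv: str, out_dir: str, batch_size: int) -> list:
    """Write consecutive batches of rows to out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(input_csv, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # Keep the header so every batch has a smiles column.
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                written.append(_write_batch(out, len(written), header, batch))
                batch = []
        if batch:  # Flush the final, possibly shorter, batch.
            written.append(_write_batch(out, len(written), header, batch))
    return written


def _write_batch(out: Path, index: int, header: list, rows: list) -> Path:
    """Write one batch file with the original header followed by its rows."""
    path = out / f"batch_{index:03d}.csv"
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

Each resulting file can then be passed to a separate pipeline run, so only one batch worth of docking output is held in memory at a time.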

AiZynthFinder Installation

The retrosynthesis stage requires AiZynthFinder. Install it with the built-in setup command.

Symptom: synthesis stage fails with ModuleNotFoundError: aizynthfinder or the retrosynthesis subprocess exits immediately.

Fix:

  1. Run setup from the project root:
uv run hedgehog setup aizynthfinder
This command installs the optional retrosynthesis extra into the project environment and downloads the required public data (model files and templates) into modules/aizynthfinder/.
  2. Verify the installation:
# Check that public data was downloaded
ls modules/aizynthfinder/public/
# Check that logging config exists
ls modules/aizynthfinder/aizynthfinder/data/
  3. If setup fails during dependency sync, verify your local uv installation, Python version compatibility (AiZynthFinder currently supports Python 3.10-3.12), and outbound package-index access.
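Two of these checks can also be run from Python inside the project environment. This is a small sketch with hypothetical function names; the version range is the one quoted above.

```python
import importlib.util
import sys


def python_version_supported(version=None) -> bool:
    """True if (major, minor) falls in the 3.10-3.12 range quoted above."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 10 <= minor <= 12


def aizynthfinder_importable() -> bool:
    """True if the aizynthfinder package can be located in this environment."""
    return importlib.util.find_spec("aizynthfinder") is not None
```

Running both through `uv run python` checks the same environment the pipeline uses, rather than the system interpreter.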

Shepherd-Score on Python 3.13

Symptom: installing Shepherd dependencies fails on Python 3.13 with wheel or open3d errors.

open3d (required by shepherd-score) does not publish wheels for some Python ABIs (notably cp313), so installing Shepherd dependencies in the main environment can fail.

The HEDGEHOG base install is decoupled from the Shepherd and legacy PoseCheck dependencies. For Shepherd, use an isolated worker environment:

uv run hedgehog setup shepherd-worker --yes

If the Shepherd backend is unavailable at runtime, the docking filters skip Shepherd and log a warning with setup instructions.

TUI Port Conflicts

The TUI (Text User Interface) uses a JSON-RPC protocol over stdio to communicate between the Node.js frontend and the Python backend. The backend is launched as a child process — it does not bind to a network port by default.

However, the Node.js development server (used during TUI development) binds to a local port. If that port is already in use, the TUI will fail to start.

Symptom: Error: listen EADDRINUSE when launching the TUI in development mode.

Fix:

  1. Find and kill the process using the port:
# Find what is using port 3000
lsof -i :3000
# Kill the process
kill <PID>
  2. For production use, launch the TUI via the CLI command, which uses the built bundle and stdio communication without binding any port:
uv run hedgehog tui
  3. If the TUI has not been built yet, the CLI will automatically run npm install && npm run build inside the tui/ directory before launching.
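Where lsof is not available, a port can be probed from Python. This is a generic sketch with a hypothetical function name; it only detects listeners on the given IPv4 host, and port 3000 is the example value used above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(3000)` before starting the development server shows whether EADDRINUSE is likely.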

TUI Cancellation Is Not Immediate

Symptom: pressing c in the TUI requests cancellation, but an external docking or synthesis command continues for a while.

Cause: cancellation is cooperative. The pipeline stops at safe checkpoints, and long-running external subprocesses may not terminate immediately.

Fix: wait for the current stage checkpoint and inspect the run log. If a separate external process must be stopped manually, identify it from the stage work directory or system process list before killing it.
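The general pattern behind cooperative cancellation looks roughly like the sketch below. It illustrates the concept only and is not HEDGEHOG's implementation: `run_stages` and the use of `threading.Event` are assumptions.

```python
import threading


def run_stages(stages, cancel_event: threading.Event) -> list:
    """Run (name, callable) pairs in order, checking for cancellation between stages."""
    completed = []
    for name, stage in stages:
        if cancel_event.is_set():
            break  # Safe checkpoint: stop before starting the next stage.
        stage()  # A stage that is already running is not interrupted.
        completed.append(name)
    return completed
```

Because the flag is only read between stages, a request made while a long external command is running takes effect once that command returns.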

Environment-Specific Docking Config

The default docking configuration file (config_docking.yml) ships with repository-relative example receptor and reference-ligand paths. Production runs should replace them with target-specific files.

Symptom: docking stage fails immediately with FileNotFoundError for receptor PDB or reference ligand files.

Fix: copy the bundled configs into your project directory, pass the copied master config with --config, and update receptor/reference ligand paths:

receptor_pdb: /path/to/your/receptor.pdb
gnina_config:
  autobox_ligand: /path/to/your/reference_ligand.sdf

Common Log Messages

| Log message | Meaning | Action |
| --- | --- | --- |
| No data available for structural filters | The previous stage produced no output | Check that the descriptors stage completed and produced molecules |
| Synthesis finished but no output file detected | AiZynthFinder did not write results | Check the AiZynthFinder installation and that model/public data files exist |
| Docking finished but no results detected in output directories | Neither SMINA nor GNINA produced output | Check binary paths and receptor PDB |
| No molecules left after synthesis | All molecules were filtered out | Relax synthesis thresholds in config_synthesis.yml |
| Pipeline completed with failures | At least one enabled stage did not complete | Review the per-stage status in the log output |
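For long logs, the messages above can be matched automatically. The sketch below does a plain substring search over the log text; the function name is hypothetical, and it assumes the messages appear verbatim in the log.

```python
# Known messages and suggested actions, copied from the table above.
KNOWN_MESSAGES = {
    "No data available for structural filters":
        "Check that the descriptors stage completed and produced molecules",
    "Synthesis finished but no output file detected":
        "Check the AiZynthFinder installation and that model/public data files exist",
    "Docking finished but no results detected in output directories":
        "Check binary paths and receptor PDB",
    "No molecules left after synthesis":
        "Relax synthesis thresholds in config_synthesis.yml",
    "Pipeline completed with failures":
        "Review the per-stage status in the log output",
}


def suggest_actions(log_text: str) -> list:
    """Return (message, action) pairs for every known message found in the log."""
    return [(msg, action) for msg, action in KNOWN_MESSAGES.items() if msg in log_text]
```

Passing the contents of a run log to `suggest_actions` returns the matching rows in table order.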