HTML Report
After each pipeline run, HEDGEHOG generates a self-contained interactive HTML report at:
results/run_N/report.html
The report uses Plotly.js for interactive charts and includes a model filter dropdown that lets you view metrics for individual generative models or compare all models side-by-side.
Each run also writes a companion Jupyter notebook, stage_filter_audit.ipynb, for deeper stage-by-stage inspection. The notebook uses mols2grid to review molecules that passed or dropped at each stage and to compare them against descriptor, synthesis, or docking-filter thresholds.
Report Sections
Pipeline Flow (Sankey Diagram)
The top of the report displays a Sankey diagram showing how molecules flow through the pipeline. Each node represents a pipeline stage, and the width of each link is proportional to the number of molecules that survive that transition.
- Purple links represent molecules that pass to the next stage.
- Gray “Lost” nodes branch off at each transition, showing how many molecules were filtered out.
- Hover over any link or node to see exact counts and percentages relative to the initial set.
A classic funnel chart is also included, showing the absolute molecule count at each stage with percentage-of-initial annotations.
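As a rough illustration of the data behind the Sankey and funnel views, the sketch below derives pass and "Lost" links from per-stage counts. The stage names, the counts, and the `sankey_links` helper are hypothetical and are not taken from HEDGEHOG's code; only the idea of link width following survivor counts and percentages being relative to the initial set comes from the description above.

```python
# Hypothetical stage counts; real values come from a pipeline run.
stage_counts = [
    ("Input", 1000),
    ("Descriptors", 800),
    ("Structural Filters", 500),
    ("Synthesis", 300),
    ("Docking", 120),
]


def sankey_links(counts):
    """Return (source, target, value, pct_of_initial) tuples.

    Each transition yields one link for molecules that pass and one
    link to a "Lost" node for molecules that were filtered out.
    """
    initial = counts[0][1]
    links = []
    for (src, n_src), (dst, n_dst) in zip(counts, counts[1:]):
        lost = n_src - n_dst
        links.append((src, dst, n_dst, 100.0 * n_dst / initial))
        links.append((src, f"Lost after {src}", lost, 100.0 * lost / initial))
    return links


links = sankey_links(stage_counts)
for source, target, value, pct in links:
    print(f"{source} -> {target}: {value} ({pct:.1f}% of initial)")
```

The same tuples could feed a Plotly Sankey trace, with `value` setting the link width.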
Executive Summary
Four summary cards display key pipeline statistics:
| Card | Description |
|---|---|
| Initial Molecules | Total molecules entering the pipeline |
| Final Molecules | Molecules surviving all stages |
| Retention Rate | Percentage of molecules retained end-to-end |
| Stages Completed | Number of pipeline stages that ran successfully |
A stage status table shows each stage as COMPLETED, FAILED, or DISABLED.
Model Comparison
When the input contains molecules from multiple generative models, the report includes:
- Grouped bar chart comparing initial vs. final molecule counts per model with retention rates.
- Stacked bar chart showing where molecules were lost (by stage) for each model.
- A model dropdown at the top of the report to filter all sections by a single model or compare all.
Generator Reality Assessment
The reporting layer adds a Generator Reality Assessment scorecard in RUN_INFO.md and an optional chart in the HTML report when scoring data is available.
It summarizes each model with:
- Generator Reality Score (0.0–100.0), used for ranking how well the model’s initial generated set survives the pipeline gates.
- Final Candidate Pool Quality (0.0–100.0), a secondary survivor-pool score for the molecules that already passed the pipeline.
- Grade (Excellent, Strong, Moderate, Weak) for quick triage.
- Confidence (High, Medium, Low) for reliability assessment.
- Main Bottleneck list to show the weakest components.
The default component weights are:
| Component | Default Weight |
|---|---|
| yield | 0.30 |
| physchem | 0.15 |
| structural | 0.25 |
| synthesis | 0.10 |
| docking_pose | 0.15 |
| diversity | 0.05 |
Each component is built from stage outputs already produced by the pipeline (descriptors, structural filters, docking, synthesis, and MolEval diversity metrics). In short:
- yield: final retention against target_final_retention
- physchem: all-pass rate across early descriptor gates
- structural: structural stage survival plus the weakest structural gate
- synthesis: synthesis solvability, SA/RA/SYBA/search-time profile
- docking_pose: docking scores and pose quality pass rates
- diversity: MolEval diversity split across internal, scaffold, and sphere-exclusion metrics
How the score is assembled
The score is collected per model_name. If a CSV does not have model_name, HEDGEHOG treats it as an aggregate __all__ model.
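The per-model grouping rule can be sketched as follows. This is a minimal illustration, not HEDGEHOG's actual loader: the `count_by_model` helper and the sample CSV contents are hypothetical, and only the `model_name` column and the `__all__` fallback come from the text above.

```python
import csv
import io
from collections import Counter


def count_by_model(csv_text):
    """Count rows per model_name, or under "__all__" if the column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    has_model = "model_name" in (reader.fieldnames or [])
    counts = Counter()
    for row in reader:
        counts[row["model_name"] if has_model else "__all__"] += 1
    return dict(counts)


# Hypothetical inputs: one CSV with a model_name column, one without.
with_models = "smiles,model_name\nCCO,model_a\nCCN,model_a\nc1ccccc1,model_b\n"
without_models = "smiles\nCCO\nCCN\n"

print(count_by_model(with_models))
print(count_by_model(without_models))
```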
The report generator reads these sources:
| Component | Source files | Evidence collected |
|---|---|---|
| yield | input/sampled_molecules.csv, output/final_molecules.csv, root final_molecules.csv, final descriptor or docking-filter fallback CSVs | Initial count, final count, final retention rate |
| physchem | stages/01_descriptors_initial/filtered/pass_flags.csv, stages/01_descriptors_initial/metrics/descriptors_all.csv, with final descriptor files only as fallback | Descriptor all-pass rate, mean flag pass rate for evidence, worst flag, QED, MolWt, LogP, TPSA, Fsp3 summaries |
| structural | stages/03_structural_filters_post/filtered_molecules.csv, stages/03_structural_filters_post/failed_molecules.csv | Stage pass rate, weakest structural flag, mean flag pass rate for evidence, filtered/failed counts |
| synthesis | stages/04_synthesis/synthesis_extended.csv, synthesis_scores.csv, with filtered_molecules.csv only as fallback | Solve rate, median SA, median RA, median SYBA, median search time across molecules evaluated by synthesis |
| docking_pose | stages/06_docking_filters/metrics.csv, filtered_poses.csv, with final molecule files only as fallback | Median affinity, median CNNscore, median CNNaffinity, pose pass rates across docking-filter input poses |
| diversity | Already computed moleval.by_stage.Input in report_data.json, with DockingFilters only as fallback | IntDiv1, IntDiv2, ScaffDiv, ScaffUniqueness, SEDiv for the model input set |
Component scores are normalized to 0..100:
- yield is final_retention_rate / target_final_retention, clipped to 0..1.
- physchem is the fraction of early descriptor rows that pass every descriptor flag simultaneously.
- structural is 0.80 * structural_stage_pass_rate + 0.20 * worst_filter_pass_rate by default.
- synthesis combines solve rate, normalized SA, normalized RA, sigmoid-normalized SYBA, and normalized search time across synthesis-evaluated molecules, not only solved survivors.
- docking_pose combines normalized affinity, CNNscore, CNNaffinity, and pose pass rate across docking-filter input poses, not only final accepted molecules.
- diversity combines IntDiv1, IntDiv2, ScaffDiv, ScaffUniqueness, and SEDiv from the model input set when available.
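The two components with explicit formulas above can be sketched as below. The function names and the example inputs (including the `target_final_retention` value of 0.10) are hypothetical, and scaling the 0..1 results by 100 is an assumption based on the statement that component scores are normalized to 0..100.

```python
def clip01(x):
    """Clip a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def yield_score(final_retention_rate, target_final_retention):
    """final_retention_rate / target_final_retention, clipped, on a 0..100 scale."""
    return 100.0 * clip01(final_retention_rate / target_final_retention)


def structural_score(stage_pass_rate, worst_filter_pass_rate):
    """Default 0.80 / 0.20 blend of stage survival and the weakest filter."""
    return 100.0 * (0.80 * stage_pass_rate + 0.20 * worst_filter_pass_rate)


# Hypothetical inputs for illustration only.
print(yield_score(final_retention_rate=0.05, target_final_retention=0.10))
print(yield_score(final_retention_rate=0.30, target_final_retention=0.10))
print(structural_score(stage_pass_rate=0.50, worst_filter_pass_rate=0.25))
```

Note that a model exceeding the retention target gains nothing further on yield, because the ratio is clipped at 1.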
The final model score is:
overall = sum(component_weight * component_score) / sum(available_component_weights)
Missing components are not scored as zero. They are marked available: false, excluded from the denominator, and recorded as warnings where appropriate. This keeps report generation robust for partial runs, but Confidence drops when too few components or final molecules are available.
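A minimal sketch of this weighted average, using the default weights from the table above. The `overall_score` function, the convention of marking an unavailable component with `None`, and the example component scores are assumptions made for illustration.

```python
DEFAULT_WEIGHTS = {
    "yield": 0.30,
    "physchem": 0.15,
    "structural": 0.25,
    "synthesis": 0.10,
    "docking_pose": 0.15,
    "diversity": 0.05,
}


def overall_score(component_scores, weights=DEFAULT_WEIGHTS):
    """Weighted mean over available components (0..100 scale).

    A score of None marks a component as unavailable; it is skipped in
    both the numerator and the denominator rather than counted as zero.
    """
    available = {k: v for k, v in component_scores.items() if v is not None}
    denominator = sum(weights[k] for k in available)
    if denominator == 0:
        return None  # nothing to score
    return sum(weights[k] * v for k, v in available.items()) / denominator


# Hypothetical component scores; docking did not run for this model.
example_scores = {
    "yield": 50.0,
    "physchem": 80.0,
    "structural": 45.0,
    "synthesis": 70.0,
    "docking_pose": None,
    "diversity": 90.0,
}
print(round(overall_score(example_scores), 2))
```

Treating the missing docking component as zero would instead divide the same numerator by 1.00, which would understate the model.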
The generator score also supports hard caps for critical funnel failures. By default, a structural stage pass rate below 0.20 caps the score at 60, a descriptor all-pass rate below 0.50 caps it at 70, and a final retention rate below 0.05 caps it at 70. These caps prevent a model from looking strong when an AND-gate stage rejects most of its generated molecules.
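The default caps can be sketched as follows. The `apply_hard_caps` function is hypothetical, and taking the lowest cap when several conditions trigger is an assumption; the thresholds and cap values are the defaults stated above.

```python
def apply_hard_caps(score, structural_pass_rate, descriptor_all_pass_rate,
                    final_retention_rate):
    """Return the score after applying the default funnel-failure caps."""
    caps = []
    if structural_pass_rate < 0.20:
        caps.append(60.0)
    if descriptor_all_pass_rate < 0.50:
        caps.append(70.0)
    if final_retention_rate < 0.05:
        caps.append(70.0)
    # Assumption: when several caps trigger, the lowest one wins.
    return min([score] + caps)


# A model with a strong weighted score but a failing structural stage.
print(apply_hard_caps(85.0, structural_pass_rate=0.10,
                      descriptor_all_pass_rate=0.90,
                      final_retention_rate=0.20))
```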
Final Candidate Pool Quality keeps the older survivor-pool view: yield uses final-count saturation, physchem uses the mean descriptor flag pass rate, structural uses the mean structural flag pass rate, and synthesis/docking/diversity use the same component formulas. Use it to inspect the final pool, not to judge the generator itself.
Important caveat: this is an explainable ranking scorecard, not a calibrated probability of biological success. Use it to compare models inside the same experiment and then validate best candidates with the underlying stage-level reports.
If a component is missing (for example, docking or synthesis is not available for that run), its value is dropped from the denominator and the remaining weights are renormalized automatically. This changes the effective contribution of surviving components and may reduce interpretability across runs with different enabled stages.
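To see how renormalization shifts the effective contribution of each component, the sketch below recomputes the weights for a run without docking. The `effective_weights` helper is hypothetical; the weights are the documented defaults.

```python
DEFAULT_WEIGHTS = {
    "yield": 0.30,
    "physchem": 0.15,
    "structural": 0.25,
    "synthesis": 0.10,
    "docking_pose": 0.15,
    "diversity": 0.05,
}


def effective_weights(weights, available):
    """Rescale the weights of available components so they sum to 1."""
    total = sum(weights[name] for name in available)
    return {name: weights[name] / total for name in available}


# Hypothetical run in which docking was disabled.
no_docking = [name for name in DEFAULT_WEIGHTS if name != "docking_pose"]
rescaled = effective_weights(DEFAULT_WEIGHTS, no_docking)
for name, weight in rescaled.items():
    print(f"{name}: {DEFAULT_WEIGHTS[name]:.2f} -> {weight:.3f}")
```

In this example yield rises from 0.30 to roughly 0.35 of the total, which is why scores from runs with different enabled stages are not directly comparable.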
If evidence is sparse (the final molecule count is low and/or several components are missing), Confidence drops from High to Medium or Low.
To configure or disable this section, use:
config_weighted_score: src/hedgehog/configs/config_weighted_score.yml
Set run: false in that file to remove the generator reality assessment from the report.
You can tune docking and synthesis behavior in this config to shift rankings toward models with stronger predicted affinity or toward models with better synthetic tractability:
- Docking thresholds are configured under docking (bad_affinity/good_affinity, bad_cnnscore/good_cnnscore, bad_cnnaffinity/good_cnnaffinity).
- Synthesis thresholds are configured under synthesis (sa_*, ra_*, syba_*, target_search_time_sec).
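One plausible reading of a bad/good threshold pair is a linear map from the bad value (score 0) to the good value (score 1), clipped at both ends. The sketch below illustrates that idea only: linear interpolation is an assumption, since the documentation says "normalized" without giving the formula, and the threshold values shown are hypothetical rather than HEDGEHOG defaults.

```python
def normalize_between(value, bad, good):
    """Map value linearly so that bad -> 0.0 and good -> 1.0, clipped to 0..1.

    Works in either direction, so "lower is better" metrics such as
    docking affinity simply use good < bad.
    """
    if bad == good:
        raise ValueError("bad and good thresholds must differ")
    return max(0.0, min(1.0, (value - bad) / (good - bad)))


# Hypothetical thresholds: affinity in kcal/mol (lower is better)
# and a CNN score in 0..1 (higher is better).
print(normalize_between(-7.5, bad=-5.0, good=-10.0))
print(normalize_between(0.9, bad=0.2, good=0.8))
```

Under this reading, tightening good_affinity makes it harder for a model to saturate the affinity term, which in turn spreads the docking_pose scores of strong models further apart.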
Structural Filters
This section covers the structural filtering stages (pre-descriptor and post-descriptor filters):
- Stacked bar chart of molecules passed vs. banned per filter, grouped by model.
- Heatmap of banned ratios (fraction of molecules failing each filter) by model.
- Top failure reasons bar chart showing the most common structural alerts.
- A summary table of pass/fail counts per filter.
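The banned ratio shown in the heatmap is the fraction of a model's molecules that fail a given filter. A minimal sketch, assuming per-molecule pass/fail records; the record layout, the model names, and the filter names are hypothetical.

```python
from collections import defaultdict


def banned_ratios(records):
    """Return {(model, filter): fraction of molecules failing that filter}."""
    totals = defaultdict(int)
    banned = defaultdict(int)
    for model, filter_name, passed in records:
        key = (model, filter_name)
        totals[key] += 1
        if not passed:
            banned[key] += 1
    return {key: banned[key] / totals[key] for key in totals}


# Hypothetical (model, filter, passed) records.
records = [
    ("model_a", "filter_x", True),
    ("model_a", "filter_x", False),
    ("model_a", "filter_y", True),
    ("model_a", "filter_y", True),
    ("model_b", "filter_x", False),
    ("model_b", "filter_x", False),
]
ratios = banned_ratios(records)
for (model, filter_name), ratio in sorted(ratios.items()):
    print(f"{model} / {filter_name}: {ratio:.2f}")
```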
Synthesis Analysis
The synthesis section presents scores from SA Score, SYBA Score, and RA Score computations:
- Histograms for each score distribution (SA, SYBA, RA).
- SA vs. SYBA scatter plot colored by model, showing where the two synthetic-accessibility estimates agree or diverge.
- Retrosynthesis route score histogram and step count histogram from AiZynthFinder results.
- Pie chart showing solved vs. unsolved retrosynthesis targets.
Docking
The docking section reports binding affinity results from GNINA, SMINA, and/or Matcha:
- Affinity distribution histogram for each docking tool (kcal/mol scale, lower is better).
- Box plots of affinity scores grouped by model.
- Top molecules table listing the best-scoring compounds with their affinities and CNN scores (for GNINA) and Matcha affinity when Matcha outputs are present.
Docking Filters
If docking filters are enabled, this section shows:
- Per-filter pass/fail stats for each enabled filter (search box, pose quality, interactions, conformer deviation, shepherd score).
- Histograms of numeric metrics (clashes, strain energy, conformer RMSD, shape score, etc.) with threshold lines.
- Per-model breakdown of total poses, passed poses, and pass rates.
- Interaction analytics (ProLIF) when interaction reporting is enabled: top-contact residues, interaction type distribution, and residue × interaction-type heatmap.
- The aggregation mode (all or any) and overall pass rate.
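The difference between the two aggregation modes can be sketched as below. The `pose_passes` function and the dictionary keys are hypothetical; only the mode names all and any and the filter categories come from the list above.

```python
def pose_passes(filter_results, mode="all"):
    """Combine per-filter pass flags for one pose.

    mode="all": the pose must pass every enabled filter.
    mode="any": passing at least one enabled filter is enough.
    """
    if mode == "all":
        return all(filter_results.values())
    if mode == "any":
        return any(filter_results.values())
    raise ValueError(f"unknown aggregation mode: {mode!r}")


# Hypothetical results for a single pose.
pose = {
    "search_box": True,
    "pose_quality": True,
    "interactions": False,
    "conformer_deviation": True,
}
print(pose_passes(pose, mode="all"))
print(pose_passes(pose, mode="any"))
```

With all, the overall pass rate can never exceed the pass rate of the strictest single filter, so the per-filter stats are the first place to look when it is low.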
Descriptors
Two descriptor sections appear in the report — one for initial descriptors (computed early in the pipeline) and one for final descriptors (recomputed on surviving molecules):
- Violin plots of key molecular descriptors (MolWt, LogP, TPSA, QED) grouped by model.
- Box plots comparing H-bond donors and acceptors across models.
- Summary table with mean values for all computed descriptors, broken down by model.
Drug-likeness threshold lines (e.g., Lipinski’s Rule of Five boundaries) are included as reference markers in the interactive histograms.
Interpreting Comparison Histograms
When comparing generated molecules against reference sets:
- Overlapping distributions indicate that generated molecules match the reference property profile.
- Shifted distributions highlight systematic differences (e.g., generated molecules are heavier or more lipophilic than references).
- Use the model dropdown to isolate individual models and see which generator best matches the target property space.
Generative Metrics (MolEval)
This section reports intrinsic distribution quality metrics computed by the vendored MolEval library at five pipeline checkpoints. It includes:
- Line chart tracking all active metrics across stages.
- Heatmap with exact metric values (rows = metrics, columns = stages).
- Data table in RUN_INFO.md appended after the run.
See the MolEval Metrics page for detailed metric definitions.
Output Files
After report generation, the run directory contains:
| File | Description |
|---|---|
| report.html | Self-contained interactive HTML report |
| report_data.json | Raw JSON data used to build the report |
| stage_filter_audit.ipynb | Jupyter notebook for molecule-level stage audit with mols2grid |
| RUN_INFO.md | Markdown summary with MolEval metrics table |
| stages/06_docking_filters/interaction_*.csv | Interaction reporting artifacts (events, residue summary, type summary, matrix) produced by docking filters |
| stages/06_docking_filters/interaction_report_meta.json | Metadata summary for interaction reporting totals used by report visualizations |
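A quick way to confirm that a run produced its top-level report artifacts is to check for the files listed above. The `missing_outputs` helper is a hypothetical convenience, not part of HEDGEHOG, and the demo uses a temporary directory in place of a real results/run_N folder.

```python
import tempfile
from pathlib import Path

# Top-level files named in the table above.
EXPECTED_FILES = [
    "report.html",
    "report_data.json",
    "stage_filter_audit.ipynb",
    "RUN_INFO.md",
]


def missing_outputs(run_dir):
    """Return the expected report files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in EXPECTED_FILES if not (run_dir / name).is_file()]


# Demo: a temporary directory that contains only report.html.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.html").write_text("<html></html>")
    demo_missing = missing_outputs(tmp)

print(demo_missing)
```

The interaction_*.csv files are left out of the check because they only appear when docking filters and interaction reporting are enabled.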
Configuration
Report generation is triggered automatically by hedgehog after all pipeline stages complete.
The MolEval sections require a valid config_moleval path in the pipeline configuration:
config_moleval: src/hedgehog/configs/config_moleval.yml
See MolEval Configuration for all available options.