Pipeline Architecture
Product and cybersecurity architecture understanding package generated from Markdown-derived requirements.
RFQX Pipeline Architecture
What each script does, what is deterministic vs heuristic vs AI-assisted, what needs human review, and what happens when new customer documents are added.
Architecture & Script Responsibilities
How the RFQX repository works, what each script does, what is deterministic vs heuristic vs AI-assisted, and what happens when input documents change.
1. What RFQX Does
RFQX turns a customer RFQ document set into a supplier-side working environment: customer requirement review, supplier proposals, open points, estimation impact, an initial cybersecurity concept, an initial system/security design, and derived supplier system requirements — all with traceability back to Markdown-derived sources.
2. Input Folders
| Folder | Purpose |
|---|---|
| customer-input/pdf/ | Source customer PDFs (source-only; never analysed downstream). |
| customer-input/markdown-manual/ | Manually corrected Markdown. |
| customer-input/customer-feedback/ | Customer feedback CSVs (Phase 4 loop). |
| converted/markdown*/ | Markdown produced from the PDFs. |
3. Markdown-First Rule
All analysis runs on Markdown, never on the PDFs directly. PDFs are converted to Markdown first; every requirement keeps its source Markdown file, section path and page reference. This keeps extraction auditable and stable.
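As an illustration of the traceability this rule implies, here is a minimal sketch of a requirement record that carries its Markdown source with it. The class names, field names, and the example path are assumptions made for illustration; they are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequirementSource:
    """Where a requirement was found in the converted Markdown (hypothetical shape)."""
    markdown_file: str    # hypothetical path under converted/markdown*/
    section_path: tuple   # heading trail, outermost first
    page_reference: int   # page in the original PDF


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str
    source: RequirementSource


def trace_label(req: Requirement) -> str:
    """Render a human-readable trace back to the Markdown source."""
    trail = " > ".join(req.source.section_path)
    return f"{req.req_id}: {req.source.markdown_file} | {trail} | p.{req.source.page_reference}"


example = Requirement(
    req_id="REQ-001",
    text="The ECU shall authenticate firmware updates.",
    source=RequirementSource("converted/markdown/spec.md", ("4 Security", "4.2 Updates"), 17),
)
print(trace_label(example))
```

Keeping the source on the record itself means any downstream artifact can show where a statement came from without re-reading the PDF.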
4. OCR Policy
OCR is disabled (ocr_enabled: false in config.yaml). Pages with no extractable text are flagged for manual review rather than OCR'd. The built-in converter never performs OCR.
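The flag-instead-of-OCR behaviour can be sketched as follows. This assumes the converter yields one text string per page; the function name and that input shape are hypothetical, not the converter's actual interface.

```python
def pages_needing_manual_review(page_texts, ocr_enabled=False):
    """Return 1-based page numbers that have no extractable text.

    Assumption: `page_texts` is a list of strings, one per PDF page.
    OCR is never attempted; empty pages are only reported.
    """
    if ocr_enabled:
        raise ValueError("OCR must stay disabled (ocr_enabled: false)")
    return [n for n, text in enumerate(page_texts, start=1) if not text.strip()]


flagged = pages_needing_manual_review(["Intro text", "   ", "", "Table 3"])
print(flagged)  # [2, 3]
```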
5. Main Scripts and Their Responsibility
See the Script Responsibility Table below for the full list of scripts with their inputs and outputs.
6. Generated Outputs
Requirements register, supplier proposals, open points, estimation impact, supplier system requirements + coverage, initial cybersecurity concept, system/security design, workflow state, the HTML site, and the evidence package.
7. What Happens When New Documents Are Added
See section 13 (re-run behaviour). In short: convert → quality-check → extract → regenerate register, proposals, open points, SSRs, concept → regenerate site → validate. Customer decisions and feedback are never invented.
Per-PDF Document Intelligence Re-run Behavior
When a new PDF is added and the pipeline is re-run:
- a new document intelligence record is created
- a new per-PDF page is created
- requirements from that PDF are grouped under that page
- SSRs derived from that PDF are linked
- open points caused by that PDF are linked
- design/concept/estimation impacts are recalculated
- overall dashboards are also updated
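The grouping described above could look roughly like this. It is a sketch under assumptions: the field names (`source_pdf`, `req_id`) and the shape of the SSR link map are hypothetical, and the real record format may differ.

```python
from collections import defaultdict


def build_document_records(requirements, ssr_links):
    """Group requirement IDs and linked SSR IDs by source PDF."""
    records = defaultdict(lambda: {"requirements": [], "ssrs": set()})
    for req in requirements:
        record = records[req["source_pdf"]]
        record["requirements"].append(req["req_id"])
        record["ssrs"].update(ssr_links.get(req["req_id"], []))
    # Sort everything so a re-run on the same input gives the same output.
    return {
        pdf: {"requirements": sorted(r["requirements"]), "ssrs": sorted(r["ssrs"])}
        for pdf, r in sorted(records.items())
    }


requirements = [
    {"req_id": "REQ-002", "source_pdf": "spec_a.pdf"},
    {"req_id": "REQ-001", "source_pdf": "spec_a.pdf"},
    {"req_id": "REQ-010", "source_pdf": "spec_b.pdf"},
]
ssr_links = {"REQ-001": ["SSR-01"], "REQ-002": ["SSR-01", "SSR-02"]}
records = build_document_records(requirements, ssr_links)
print(records)
```

Adding a requirement from a new PDF simply produces a new key, which matches the "new record, new page" behaviour listed above.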
Per-PDF Content / Section Filtering
- Section filters are generated from converted Markdown, extracted requirement metadata, section references, and page references.
- Parent section filters include child requirements because each requirement row stores the selected section plus all parent section IDs.
- New PDFs automatically get section filters when numbered headings or table-of-contents-like entries are detected.
- If headings are not reliable, the document page falls back to page-based filtering.
- PDFs are not analyzed directly downstream; they remain source evidence only.
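The parent-section behaviour can be sketched as below, assuming dot-separated numbered headings such as "3.2.1". The function names and the `section_ids` field are hypothetical.

```python
def section_ancestry(section_id: str) -> list:
    """Return the section ID plus all parent IDs, outermost first."""
    parts = section_id.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def filter_by_section(rows, selected: str):
    """Keep rows whose stored ancestry contains the selected section."""
    return [r for r in rows if selected in r["section_ids"]]


rows = [
    {"req_id": "REQ-001", "section_ids": section_ancestry("3.2.1")},
    {"req_id": "REQ-002", "section_ids": section_ancestry("3.4")},
    {"req_id": "REQ-003", "section_ids": section_ancestry("4.1")},
]
print(section_ancestry("3.2.1"))                             # ['3', '3.2', '3.2.1']
print([r["req_id"] for r in filter_by_section(rows, "3")])   # ['REQ-001', 'REQ-002']
```

Because each row stores its full ancestry, selecting a parent section is a plain membership test and needs no tree traversal at filter time.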
8. What Is Deterministic / Rule-Based
- PDF→Markdown conversion and page/section tagging.
- Requirement extraction structure and IDs.
- Requirement review register field assembly.
- Supplier system requirement clustering and coverage maths.
- Disposition assignment, traceability matrices, counts, coverage %.
- Site generation and validation.
These produce identical output for identical input.
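As one example of a deterministic step, coverage maths can be written so the result does not depend on input order. The coverage definition used here (active requirements with at least one linked SSR) and all field names are assumptions for illustration, not the pipeline's documented formula.

```python
def coverage_report(requirements):
    """Compute SSR coverage over active requirements (assumed definition)."""
    active = sorted((r for r in requirements if r["active"]), key=lambda r: r["req_id"])
    unmapped = [r["req_id"] for r in active if not r["ssr_ids"]]
    covered = len(active) - len(unmapped)
    percent = round(100.0 * covered / len(active), 1) if active else 0.0
    return {
        "active": len(active),
        "covered": covered,
        "coverage_percent": percent,
        "unmapped": unmapped,
    }


reqs = [
    {"req_id": "REQ-003", "active": True, "ssr_ids": []},
    {"req_id": "REQ-001", "active": True, "ssr_ids": ["SSR-01"]},
    {"req_id": "REQ-002", "active": False, "ssr_ids": []},
    {"req_id": "REQ-004", "active": True, "ssr_ids": ["SSR-01", "SSR-02"]},
]
report = coverage_report(reqs)
print(report)
```

Sorting by ID before assembling the report is what makes shuffled input produce byte-identical output.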
9. What Is Heuristic
- Supplier position defaults (Accept / Accept-with-Assumption / Partial / …) from keyword + mapping rules.
- Security-relevance inference and capability/feature/interface mapping.
- SSR category classification and clustering granularity.
- Open-point topic matching and estimation impact levels.
Heuristics are conservative and explainable, but they approximate engineering judgment and should be reviewed.
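A keyword rule of this kind might look like the sketch below. The keywords, the rule order, and the returned fields are invented for illustration; only the position labels come from the list above.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
POSITION_RULES = [
    (("tbd", "to be defined", "to be agreed"), "Accept-with-Assumption"),
    (("all variants", "unlimited", "any future"), "Partial"),
]
DEFAULT_POSITION = "Accept"


def default_supplier_position(requirement_text: str) -> dict:
    """Return a default position plus the keyword that triggered it."""
    lowered = requirement_text.lower()
    for keywords, position in POSITION_RULES:
        for keyword in keywords:
            if keyword in lowered:
                return {"position": position, "matched": keyword, "needs_review": True}
    # No rule fired: still a heuristic default, so it stays reviewable.
    return {"position": DEFAULT_POSITION, "matched": None, "needs_review": True}


result = default_supplier_position("Key length is TBD by the customer.")
print(result)
```

Returning the matched keyword alongside the position is what keeps the heuristic explainable: a reviewer can see why a default was chosen and overrule it.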
10. What Requires AI/LLM Judgment
Currently no script calls an AI/LLM API (see analysis/ai_assistance_gap_report.md). The outputs that would most benefit from AI assistance — nuanced requirement interpretation, bespoke supplier proposals, sharp clarification questions, and concept conclusions — are today produced by rules/heuristics or human review. An optional, controlled AI stage is designed in docs/ai_analysis_stage_design.md.
11. What Requires Human Review
- Low-confidence / human-review-flagged requirements.
- Every supplier position before it is sent to the customer.
- Responsibility (CIA/RASIC) decisions and open-point answers.
- SSR statements before customer alignment.
- The cybersecurity concept review with the Cybersecurity Manager and leads.
12. Known Limitations
- Heuristic defaults are not a substitute for engineering review.
- No AI stage is implemented; conclusions are only as good as the rules/heuristics.
- Duplicate customer requirement IDs in the source are flagged Duplicate/Merged.
- Screenshots require Playwright (not installed); export records the blocker.
- Git remote is not configured; commits are local only.
13. How to Re-run the Pipeline
python scripts/run_full_analysis.py # convert + extract (when new PDFs)
python scripts/generate_requirement_review_register.py # register, proposals, open points, concept, SSRs
python scripts/generate_supplier_system_requirements.py # SSR derivation + coverage + matrix
python scripts/generate_document_intelligence.py # per-PDF records, traceability and impact data
python scripts/ingest_customer_feedback.py # apply any customer feedback
python scripts/generate_expert_synthesis.py # features/interfaces/capabilities/diagrams
python scripts/generate_html_site.py # build the site
python scripts/check_html_site.py # validate
python scripts/export_site_evidence.py # evidence package
python scripts/check_git_status.py # git status report
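The same sequence can be driven from a small wrapper that stops at the first failing step. This is a sketch, not a script from the repository: `run_pipeline` and `dry_run` are hypothetical names, and the example below only prints what it would execute.

```python
import subprocess
import sys
from types import SimpleNamespace

# Script order copied from the command list above.
STEPS = [
    "scripts/run_full_analysis.py",
    "scripts/generate_requirement_review_register.py",
    "scripts/generate_supplier_system_requirements.py",
    "scripts/generate_document_intelligence.py",
    "scripts/ingest_customer_feedback.py",
    "scripts/generate_expert_synthesis.py",
    "scripts/generate_html_site.py",
    "scripts/check_html_site.py",
    "scripts/export_site_evidence.py",
    "scripts/check_git_status.py",
]


def run_pipeline(steps=STEPS, run=subprocess.run):
    """Run each script in order; return the first failing script, or None."""
    for script in steps:
        result = run([sys.executable, script])
        if result.returncode != 0:
            return script
    return None


def dry_run(cmd):
    """Stand-in runner that only prints the script it would execute."""
    print("would run:", " ".join(cmd[1:]))
    return SimpleNamespace(returncode=0)


print(run_pipeline(run=dry_run))  # None means every step succeeded
```

Calling `run_pipeline()` with the default runner would execute the real scripts from the repository root.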
14. How to Validate the Result
python scripts/check_html_site.py must print "HTML site validation passed." It fails on missing pages/artifacts, OCR enabled, downstream PDF analysis, weak labels, insufficient SSR coverage, or unmapped active requirements without a disposition. Warnings (heuristic proposals, no AI stage, blocked SSRs, feedback not ingested) are expected and listed.
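The split between blocking failures and expected warnings can be sketched as follows. The state keys and the function are hypothetical, and only a few of the checks named above are shown; the real validator inspects the generated site and data.

```python
def validate(state: dict):
    """Split findings into blocking failures and expected warnings."""
    failures, warnings = [], []
    if state.get("ocr_enabled"):
        failures.append("OCR enabled")
    if state.get("missing_pages"):
        failures.append("missing pages: " + ", ".join(sorted(state["missing_pages"])))
    if state.get("unmapped_without_disposition", 0) > 0:
        failures.append("unmapped active requirements without a disposition")
    if not state.get("ai_stage_implemented"):
        warnings.append("no AI stage")
    if not state.get("feedback_ingested"):
        warnings.append("feedback not ingested")
    return failures, warnings


failures, warnings = validate({"ocr_enabled": False, "missing_pages": [], "feedback_ingested": False})
print("HTML site validation passed." if not failures else "FAILED: " + "; ".join(failures))
for warning in warnings:
    print("warning:", warning)
```

Only failures change the outcome; warnings are listed so reviewers see the known gaps without the build being blocked.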
15. AI Assistance Gap Report
The current pipeline does not use an AI/LLM API. See analysis/ai_assistance_gap_report.md for the scan result, the current rule/heuristic outputs, and the outputs that would benefit from controlled AI assistance.
16. Optional AI Stage Design
The optional AI stage is design-only. See docs/ai_analysis_stage_design.md, prompts/, and schemas/. It reads Markdown-derived requirements only, never PDFs directly, returns schema-validated JSON, separates evidence from inference, marks confidence, and stays opt-in.
Script Responsibility Table
| Script | Input | Output | Deterministic / Heuristic / AI-Assisted | Human Review Needed |
|---|---|---|---|---|
| convert_pdf_to_markdown.py | customer PDFs | Markdown | Deterministic (no OCR) | Yes (partial pages) |
| check_converter_availability.py | environment | converter report | Deterministic | No |
| check_markdown_quality.py | Markdown | quality report | Deterministic + heuristic thresholds | Yes (low quality) |
| ingest_markdown.py | Markdown | chunks | Deterministic | No |
| extract_requirements.py | Markdown | extracted_requirements.json | Heuristic (pattern-based) | Yes |
| generate_requirement_review_register.py | extracted reqs + mappings | review register, proposals, open points, estimation, concept, design, workflow state | Heuristic defaults + deterministic assembly | Yes (positions/proposals) |
| generate_supplier_system_requirements.py | review register + models + open points | SSRs, customer→supplier matrix, coverage | Heuristic classification + deterministic clustering/maths | Yes (SSR statements) |
| generate_document_intelligence.py | manifest + review register + SSRs + open points + estimation | per-PDF document intelligence records, traceability matrix, diagrams, and page data | Deterministic + heuristic classification/scoring | Yes (document conclusions) |
| ingest_customer_feedback.py | feedback CSVs | status deltas, updated register | Deterministic | Yes (decisions are customer-owned) |
| generate_expert_synthesis.py | Markdown-derived data | features/interfaces/capabilities/diagrams | Heuristic | Yes |
| generate_html_site.py | all generated data | HTML site | Deterministic | No |
| check_html_site.py | site + data | validation result | Deterministic | No |
| export_site_evidence.py | site | screenshots + zip | Deterministic (Playwright optional) | No |
| check_git_status.py | git repo | git status report | Deterministic | No |
None of the above currently call an AI/LLM API.
AI Assistance Gap
AI / LLM Assistance Gap Report
Current pipeline uses AI/LLM API: No
A full scan of scripts/ finds no call to any AI/LLM API (no Anthropic/OpenAI SDK, no requests/HTTP call to a model endpoint, no local model invocation). All analysis is produced by deterministic code and conservative heuristics.
Which outputs are rule-based / heuristic today
| Output | Method |
|---|---|
| Requirement extraction | Heuristic (pattern/keyword) |
| Supplier position defaults | Heuristic rules |
| Engineering expectation / supplier proposal text | Templated heuristics |
| Security relevance + capability/feature/interface mapping | Heuristic keyword rules |
| Open points | Curated topic matching |
| Estimation impact levels | Heuristic rules |
| Supplier system requirement derivation | Heuristic classification + deterministic clustering |
| Cybersecurity concept conclusions | Templated from aggregated data |
Which outputs should be AI-assisted for better quality
- Requirement interpretation (intent, ambiguity, implicit obligations).
- Supplier proposal generation (bespoke, requirement-specific wording).
- Customer clarification question generation (sharper, context-aware).
- Supplier system requirement derivation (better clustering + statements).
- Cybersecurity concept conclusion generation (assumptions, risks, gaps).
What structured prompts / schema are needed
Defined in docs/ai_analysis_stage_design.md, with prompts in prompts/ and JSON schemas in schemas/. The AI stage must return structured JSON validated against those schemas — never free text.
Recommendation
AI/LLM-assisted analysis should be added as an optional, controlled stage for: requirement interpretation, supplier proposal generation, customer clarification generation, system requirement derivation, and cybersecurity concept conclusions. It must read Markdown-derived requirements only, never analyse PDFs directly, return schema-validated JSON, separate explicit evidence from inference, and mark confidence. Until implemented, conclusions are only as good as the current rules/heuristics plus human review.
Optional AI Stage Design
Optional AI Analysis Stage — Design
Design only. No API calls are implemented. This defines an optional, controlled AI/LLM stage that augments the rule/heuristic pipeline without replacing its deterministic, auditable core.
Position in the Pipeline
    Markdown-derived requirements
            │
            ▼
    (optional) AI Analysis Stage ──► structured JSON (schema-validated)
            │                               │
            ▼                               ▼
    merge into review register / SSRs / concept ──► site + validation
The AI stage runs after requirement extraction and before site generation. Its JSON output is merged as a suggestion layer; deterministic assembly, traceability and validation are unchanged. If the AI stage is disabled, the rule/heuristic defaults are used (current behaviour).
Hard Rules
- Read Markdown-derived requirements only; never analyze PDFs directly.
- Return structured JSON validated against the schemas in schemas/; never free text.
- Separate explicit evidence from inference (every field tags evidence vs inferred).
- Mark confidence (high/medium/low) on every judgment.
- Preserve source traceability (source_markdown, page_reference) on every item.
- Output is a suggestion: a human reviews before it becomes a committed position.
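A minimal sketch of how these rules could be enforced on a single suggestion, using only the standard library. The field names `req_id`, `proposal`, and `basis` are assumptions; the authoritative structure is whatever the files in schemas/ define, and a real implementation would validate against those schemas rather than this hand-written check.

```python
import json

ALLOWED_BASIS = ("evidence", "inferred")
ALLOWED_CONFIDENCE = ("high", "medium", "low")
# Hypothetical field set; the real one lives in schemas/.
REQUIRED_FIELDS = ("req_id", "proposal", "basis", "confidence", "source_markdown", "page_reference")


def check_suggestion(raw: str) -> list:
    """Return a list of rule violations; an empty list means acceptable."""
    try:
        item = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON (free text is rejected)"]
    if not isinstance(item, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in item]
    if item.get("basis") not in ALLOWED_BASIS:
        problems.append("basis must be 'evidence' or 'inferred'")
    if item.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be high, medium or low")
    return problems


good = json.dumps({
    "req_id": "REQ-001",
    "proposal": "Accept with secure boot assumption",
    "basis": "inferred",
    "confidence": "medium",
    "source_markdown": "converted/markdown/spec.md",
    "page_reference": 17,
})
print(check_suggestion(good))                # []
print(check_suggestion("Looks fine to me"))  # free text is rejected
```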
Agents and Outputs
| Agent | Prompt | Schema | Produces |
|---|---|---|---|
| Requirement Review | prompts/requirement_review_agent.md | schemas/requirement_review.schema.json | supplier proposal + position + open points per requirement |
| System Requirement Derivation | prompts/system_requirement_derivation_agent.md | schemas/system_requirement_derivation.schema.json | SSR candidates with many-to-many traceability |
| Cybersecurity Concept | prompts/cybersecurity_concept_agent.md | schemas/cybersecurity_concept.schema.json | concept conclusions, assumptions, risks, gaps |
Controls
- Batching: process requirements in capped batches; deterministic IDs.
- Determinism: temperature 0; cache by input hash so re-runs are stable.
- Validation: reject any non-conforming JSON; fall back to heuristic defaults.
- Cost/scope guard: AI stage is opt-in via config flag (default off).
- No silent overwrite: AI output lands in a *_ai_suggested field; merge is explicit.
- Supplier system requirement control: generate supplier system requirement candidates only from Markdown-derived customer requirements, with explicit evidence, inference, confidence, and open points.
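Two of these controls, caching by input hash and the no-silent-overwrite rule, can be sketched together. Everything here is illustrative: `call_model` is a caller-supplied stand-in (no API is implemented), and `supplier_proposal_ai_suggested` is just one hypothetical instance of the *_ai_suggested naming pattern.

```python
import hashlib
import json

_cache = {}


def input_hash(batch) -> str:
    """Stable hash of a requirement batch (key order and whitespace normalised)."""
    canonical = json.dumps(batch, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def suggest(batch, call_model):
    """Return cached suggestions for an identical batch; otherwise call the model once."""
    key = input_hash(batch)
    if key not in _cache:
        _cache[key] = call_model(batch)
    return _cache[key]


def attach_suggestion(row: dict, proposal: str) -> dict:
    """Store AI output next to, never over, the heuristic value."""
    merged = dict(row)
    merged["supplier_proposal_ai_suggested"] = proposal
    return merged


calls = []


def fake_model(batch):
    """Stand-in for a model call; records how often it is invoked."""
    calls.append(len(batch))
    return [f"proposal for {r['req_id']}" for r in batch]


first = suggest([{"req_id": "REQ-001", "text": "Authenticate updates."}], fake_model)
second = suggest([{"text": "Authenticate updates.", "req_id": "REQ-001"}], fake_model)
row = attach_suggestion({"req_id": "REQ-001", "supplier_proposal": "Accept"}, first[0])
print(len(calls), row)
```

The second call reuses the cached result even though the keys arrive in a different order, and the heuristic `supplier_proposal` value is left untouched until a human approves the merge.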
Inputs / Outputs Summary
- Input: requirements/requirement_review_register.json (or extracted requirements), relevant Markdown excerpts.
- Output: JSON files under an ai_suggestions/ folder, schema-validated, then merged on human approval.