Known Issue Archive
Harrier keeps durable issue knowledge in Git so recurring EMR/Spark failures can move from ad hoc investigation to reviewed classifier coverage.
The archive is intentionally split into two layers:
- Human-readable issue records in `knowledgebase/issues/KB-####-slug.md`.
- A machine-readable index in `knowledgebase/issues/index.json`.
This keeps fixes reviewable in pull requests while giving future MCP tools a stable retrieval surface.
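The `KB-####-slug.md` naming rule can be made concrete with a small helper. This is a minimal sketch, not the logic inside `scripts/archive_known_issue.py`. The functions `slugify` and `next_record_name` are hypothetical, and the allocation strategy (highest existing number plus one) is an assumption.

```python
import re
from typing import Iterable

# Hypothetical helper: the KB-####-slug.md pattern comes from this page, but
# the ID allocation strategy (highest existing number + 1) is an assumption.
KB_FILE = re.compile(r"^KB-(\d{4})-[a-z0-9-]+\.md$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_record_name(existing: Iterable[str], title: str) -> str:
    """Return the next KB-####-slug.md file name given existing file names."""
    # Non-matching names such as index.json are ignored.
    ids = [int(m.group(1)) for name in existing if (m := KB_FILE.match(name))]
    return f"KB-{max(ids, default=0) + 1:04d}-{slugify(title)}.md"
```

For example, pass the `.name` of each `pathlib.Path("knowledgebase/issues").glob("KB-*.md")` result together with the issue title. In an empty archive, "Executor OOM after skewed join" becomes `KB-0001-executor-oom-after-skewed-join.md`.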
When To Archive
Create a known issue record when:
- Harrier returns `UNKNOWN` but the report contains useful evidence.
- A production or demo issue recurs with the same log, metric, DB, IAM, or deployment signal.
- A new classifier rule is added and needs a durable explanation, validation history, and rollback guidance.
Do not archive secrets, full log files, customer data, tokens, passwords, or unredacted SQL values. Store short signatures and links to redacted reports instead.
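One way to turn a raw log line into a short, shareable signature is to redact it and then truncate it. The sketch below illustrates that idea. The patterns, the 120-character limit, and the `to_signature` name are all assumptions, not Harrier's actual redaction rules. Normalizing YARN container IDs is included because per-run identifiers would otherwise prevent the same failure from producing the same signature.

```python
import re

# Illustrative redaction rules (assumptions, not Harrier's actual list).
# Each pair is (pattern, replacement), applied in order.
REDACTIONS = [
    # key=value or key: value credentials
    (re.compile(r"(?i)\b(password|passwd|token|secret|api[_-]?key)\s*[=:]\s*\S+"), r"\1=<redacted>"),
    # AWS access key IDs
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "<aws-access-key>"),
    # Quoted SQL literals
    (re.compile(r"'[^']*'"), "'<value>'"),
    # YARN container IDs vary per run; normalize them so signatures recur
    (re.compile(r"container_(?:e\d+_)?\d+(?:_\d+)+"), "<container-id>"),
]
MAX_SIGNATURE_LEN = 120  # assumed limit for a "short" signature


def to_signature(log_line: str) -> str:
    """Redact sensitive values and trim a log line to a short signature."""
    sig = log_line
    for pattern, replacement in REDACTIONS:
        sig = pattern.sub(replacement, sig)
    # Collapse whitespace so formatting differences do not split signatures.
    sig = re.sub(r"\s+", " ", sig).strip()
    return sig[:MAX_SIGNATURE_LEN]
```

Redaction runs before truncation, so a secret that straddles the cut-off point is still removed.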
Workflow
- Run Harrier and save the validation or investigation report.
- Create a draft KB record:
scripts/archive_known_issue.py create \
--title "Executor OOM after skewed join" \
--source-report ../harrier-emr-demo-lab/.harrier-demo/validation/executor_oom-20260529T004326Z.json \
--signal "Container killed by YARN for exceeding memory limits" \
--tag spark \
--tag memory
- Fill in the markdown sections that require human judgment.
- If the issue has a stable signal, promote it into the deterministic rule system:
  - Add or reuse a `FindingCategory`.
  - Add a `PatternRule`/`ClassificationRule`.
  - Add a recommendation factory.
  - Add MCP unit tests.
  - Add a demo scenario and expected findings in `harrier-emr-demo-lab`.
  - Run the full validation suite.
- Mark the KB status as `promoted` once the classifier and demo coverage are merged.
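The promotion step maps a stable signal to a deterministic category. The sketch below illustrates that idea, using the example signal from the draft command. Only the names `FindingCategory` and `PatternRule` come from this page. Their fields, the enum members, the `classify` function, and the `KB-0001` ID are hypothetical placeholders, and Harrier's real classes will differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class FindingCategory(Enum):
    # Illustrative members only; the real enum lives in Harrier's codebase.
    EXECUTOR_OOM = "executor_oom"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class PatternRule:
    """Hypothetical rule shape: a log regex mapped to a category and KB record."""

    category: FindingCategory
    pattern: re.Pattern
    kb_id: str


RULES = [
    PatternRule(
        category=FindingCategory.EXECUTOR_OOM,
        pattern=re.compile(r"Container killed by YARN for exceeding memory limits"),
        kb_id="KB-0001",  # placeholder ID
    ),
]


def classify(log_text: str) -> tuple[FindingCategory, str | None]:
    """Return the first matching rule's category and KB ID, else UNKNOWN."""
    for rule in RULES:
        if rule.pattern.search(log_text):
            return rule.category, rule.kb_id
    return FindingCategory.UNKNOWN, None
```

A unit test for a new rule would typically cover both sides. One case checks that the archived signal maps to the new category. The other checks that unrelated text still falls through to `UNKNOWN`.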
Check The Archive
scripts/archive_known_issue.py check
The check verifies that index entries are well-formed, IDs are unique, and indexed markdown files exist.
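The three checks can be sketched in a few lines. This is not the script's implementation. It assumes that `index.json` is a JSON list of objects with `id`, `path` and `status` fields, where `path` is relative to `knowledgebase/issues/`. It also assumes that "well-formed" means those fields are present and the ID has the form `KB-####`. Both the schema and the `check_index`/`check_archive` names are hypothetical.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Assumed index shape (hypothetical): a JSON list of objects such as
# {"id": "KB-0001", "path": "KB-0001-executor-oom.md", "status": "draft"}.
REQUIRED_FIELDS = {"id", "path", "status"}
ID_FORMAT = re.compile(r"^KB-\d{4}$")


def check_index(entries: list, issues_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means the index passes."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        missing = REQUIRED_FIELDS.difference(entry)
        if missing:
            problems.append(f"entry {i}: missing fields {sorted(missing)}")
            continue
        kb_id, rel_path = entry["id"], entry["path"]
        if not isinstance(kb_id, str) or not ID_FORMAT.match(kb_id):
            problems.append(f"entry {i}: malformed id {kb_id!r}")
            continue
        if kb_id in seen_ids:
            problems.append(f"entry {i}: duplicate id {kb_id}")
        seen_ids.add(kb_id)
        # Indexed markdown files must exist on disk.
        if not isinstance(rel_path, str) or not (issues_dir / rel_path).is_file():
            problems.append(f"entry {i}: missing file {rel_path!r}")
    return problems


def check_archive(issues_dir: Path = Path("knowledgebase/issues")) -> list[str]:
    """Load index.json from the archive directory and check it."""
    entries = json.loads((issues_dir / "index.json").read_text(encoding="utf-8"))
    if not isinstance(entries, list):
        return ["index.json: expected a JSON list"]
    return check_index(entries, issues_dir)
```

The sketch reports every problem instead of stopping at the first one. A single review pass can then fix all broken entries together.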
Future MCP Retrieval
The index is designed so a future MCP tool can retrieve records by:
- finding category
- log signature
- metric signal
- deploy mode
- source scenario
- recommendation type
- KB ID
Until then, the archive is still useful as a reviewable, searchable runbook history.
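The same index can also be filtered locally using the retrieval keys listed above. The sketch below is a minimal example. The field names (`category`, `deploy_mode`, `log_signatures`, `metric_signals` and so on) and the `find_records` function are assumptions, not the actual `index.json` schema. Signature fields match when any stored signature appears inside the query text, so a raw log line can be passed in directly.

```python
from __future__ import annotations

from typing import Any

# Hypothetical field names for the retrieval keys listed above; the real
# index.json schema may name or nest them differently.
SIGNATURE_FIELDS = {"log_signatures", "metric_signals"}


def _matches(entry: dict[str, Any], field: str, wanted: str) -> bool:
    value = entry.get(field)
    if field in SIGNATURE_FIELDS:
        # Signature fields hold short strings; match when any of them
        # occurs inside the query text.
        return any(sig in wanted for sig in (value or []))
    # Scalar fields (id, category, deploy_mode, ...) must match exactly.
    return value == wanted


def find_records(entries: list[dict[str, Any]], **criteria: str) -> list[dict[str, Any]]:
    """Return index entries that satisfy every keyword criterion."""
    return [e for e in entries if all(_matches(e, f, w) for f, w in criteria.items())]
```

For example, `find_records(entries, category="executor_oom", deploy_mode="cluster")` narrows the index to cluster-mode executor OOM records.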