Mantis is an automated diagnostic tool that triages UDMI test failures. It evaluates stability across multiple test runs and generates Root Cause Analysis (RCA) reports by correlating logs across the Sequencer, UDMIS, and Pubber.
Mantis requires access to Google’s Gemini models for diagnostics.
Option A: Developer API Key (Public Endpoint)
export GEMINI_API_KEY="your_gemini_api_key"
Option B: Enterprise GCP Vertex AI (ADC Endpoint)
Source the enable_vertex script to auto-detect your active gcloud project or set an explicit project ID:
source tools/mantis/bin/enable_vertex [gcp_project_id]
Run bin/triage from the UDMI root directory. Provide test bundle archives (.zip, .tgz) or extracted directories as input via -i.
tools/mantis/bin/triage -i path/to/bundle1.zip path/to/bundle2.zip
Alternatively, point to extracted directories:
tools/mantis/bin/triage -i out/mantis/run_1/ out/mantis/run_2/
bin/triage)| Argument | Description | Example |
|---|---|---|
-i, --test-runs |
(Required) Test bundle paths or extracted directories (space-separated). | -i path/run1.zip path/run2.zip |
-p, --project-path |
Path to the UDMI project root. Use when testing against a different branch or PR directory. | -p ~/Projects/udmi/pr-branch |
-d, --device |
Filter triage by specific device ID. | -d AHU-1 |
-t, --test |
Filter triage by specific test case name. | -t system_min_loglevel |
--all |
Force triage of all identified failed tests. | --all |
--swe |
Use the SWE software debugging playbook instead of the default OEM integrator playbook. | --swe |
-h, --help |
Show usage instructions. | -h |
Use collect_test_runs to actively execute new test loops locally or fetch historical CI runs:
# Run the sequencer locally 3 times in a row
tools/mantis/bin/collect_test_runs --mode local --target //mqtt/localhost --runs 3
# Fetch the last 5 completed CI runs from GitHub
tools/mantis/bin/collect_test_runs --mode ci_search --runs 5
bin/collect_test_runs)| Argument | Description | Example |
|---|---|---|
-t, --target |
Target project specification. Use ‘skip’ to omit from CI. | --target //mqtt/localhost |
-n, --runs |
Number of loops or historical runs to retrieve (default: 3). | --runs 5 |
-m, --mode |
Execution mode: local (sandbox), ci (dispatch new), ci_search (retrieve past). |
--mode ci_search |
-b, --branch |
GitHub branch to target or search (default: active branch). | --branch master |
-r, --remote |
Git remote name to use for GitHub API interactions (default: origin). | --remote upstream |
--suite |
Test suite to run locally (sequencer, itemized, both). |
--suite sequencer |
--tests |
Comma-separated list of selective tests to run locally. | --tests valid_serial_no |
--verbose |
Monitor logs foreground. | --verbose |
-h, --help |
Show usage instructions. | -h |
If you need specialized AI behavior (e.g., custom prompts or different concurrency limits), use the interactive playbook generator. Note that this is a fully interactive terminal wizard and does not accept standard command-line arguments:
tools/mantis/bin/create_playbook
This utility will ask for your preferences and generate a new custom YAML playbook in your current directory. You can then pass it to Mantis using the --playbook flag:
tools/mantis/bin/triage -i out/mantis/run_data --playbook ./my_custom_playbook.yaml
Mantis generates reports in the output directory (e.g., out/mantis/<run_name>/):
triage_manifest.jsonAn internal JSON map separating transient flakes from regressions, used by the diagnostics engine.
triage_summary_report.mdA high-level summary of pass/fail metrics across all runs. It groups failures by test case and provides a 1-sentence AI Root Cause Analysis. Start here to prioritize fixes.
diagnose/.../triage_analysis.mdDetailed diagnostic reports for each failure ID. Each report contains:
By default, Mantis uses the OEM & Systems Integrator Compliance Playbook. When testing physical hardware or black-box devices, Mantis will:
When developing the local emulator (Pubber) or internal framework software, use the SWE (Software Engineer) Playbook instead. It changes the behavior to:
(To switch to this behavior, pass the --swe flag to the triage script.)