Failure diagnosis for agent-written tests
Glubean Cloud keeps failed runs, root causes, actual-vs-expected traces, trends, custom metrics, and run history so agents and teams can diagnose from facts instead of reading raw CI logs.
Failure evidence snapshot
same test assets, now diagnosable by the team and agents
Uploaded runs
focused evidence, not log dumps
Team context
grouped by reason.kind
typed assertion output
last request before failure
pass rate this week
VS Code
Author and debug
CI
Gate the same file
Cloud
Diagnose the failure
When a test fails, the evidence is already structured.
No more "paste the whole log." No more guessing whether the agent should change auth, assertions, context, or implementation.
Why Cloud exists
Teams do not just need to know that a run failed. They need to know whether the cause is auth setup, schema drift, implementation behavior, flaky infrastructure, or a test that should not be weakened.
Without Cloud
Local runs disappear after you close the editor.
CI logs tell you a test failed, but not what the agent should fix.
Teams end up rebuilding failure history, alerts, and dashboards around the same test assets.
With Cloud
The same test accumulates failure evidence instead of disappearing into logs.
Agents can query run status, events, and focused failure summaries.
Pass rate, latency, and flakiness show whether reliability is improving or drifting.
One artifact, three questions
The local question is "does this pass here?" The CI question is "can this merge?" The Cloud question is "what does this failure mean over time, and what should the agent do next?"
VS Code
Write or refine the test in the editor and inspect traces while the context is still fresh.
CI
Run the identical test in pull requests and scheduled jobs with the CLI. No rewrite, no export step.
Cloud
Keep failures, pass rate, latency, and notifications attached to the test the team already owns.
Team Moment
No one starts by reading hundreds of lines of logs. The run is uploaded, failures are normalized, and the agent can pull the smallest useful summary before proposing a fix.
No more dumping CI logs into chat. No more fixing tests by weakening the assertion.
Incident fanout
checkout-flow failed after deploy `2f4e1ad`
The diagnosis starts from evidence.
What teams actually get
Each capability should answer a diagnostic question: what broke, is it getting worse, what evidence should the agent read, and who needs to know?
Agents and humans get failed tests, reason kind, actual vs expected, and recent context without parsing CI logs.
Track every test over time instead of treating each CI run like an isolated event.
See whether a test is still healthy, getting slower, or starting to drift before users complain.
Track response headers, model scores, ranking quality, or any domain value. Cloud aggregates trends, detects regressions, and alerts on thresholds.
Email and webhook notifications stay attached to the exact test, run, and trace that triggered them.
Use the Open API to let agents fetch run status, events, and failures instead of copying output into chat.
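To make "getting slower" concrete, here is a minimal sketch of a P95 latency check. It is not Cloud's actual regression detector, which this page does not describe. The nearest-rank percentile method, the `isRegression` helper, and the 1.5x ratio are all assumptions for illustration. The 184 ms and 611 ms values are the figures from the incident example on this page.

```typescript
// Nearest-rank percentile: sort ascending, then take the value at rank ceil(p% of n).
// This is one common definition; Cloud's own calculation may differ.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical rule: flag a regression when the current value exceeds
// the baseline by more than `maxRatio`.
function isRegression(baselineMs: number, currentMs: number, maxRatio: number): boolean {
  return currentMs > baselineMs * maxRatio;
}

// Twenty made-up latency samples (1 ms .. 20 ms) to exercise the percentile.
const latencies: number[] = [];
for (let ms = 1; ms <= 20; ms++) {
  latencies.push(ms);
}
const p95 = percentile(latencies, 95);

// P95 before and after the deploy, taken from the incident example.
const regressed = isRegression(184, 611, 1.5);
```

A ratio-based rule is only one option; a fixed threshold in milliseconds would work the same way with a simpler comparison.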
Active incident
refund-flow
Failure started after deploy `2f4e1ad`. P95 climbed from 184ms to 611ms.
Notification fanout
Public status
Prometheus export
/metrics/glubean/p95?test=checkout-flow
Open API
/open/v1/runs/:id/failures
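As a rough sketch of how an agent might use this endpoint, the snippet below builds the failures URL and groups a failure list by `reason.kind`. Only the path pattern comes from this page. The base URL, the `Failure` field names, the reason kinds `assertion` and `auth`, and the `cart-total` test name are assumptions for illustration, not the documented response schema.

```typescript
// Hypothetical failure shape. The real Open API response may differ.
interface Failure {
  test: string;
  reason: { kind: string };
  expected?: unknown;
  actual?: unknown;
}

// Fill the :id placeholder in /open/v1/runs/:id/failures.
// `baseUrl` is a placeholder; the real host is not given on this page.
function failuresUrl(baseUrl: string, runId: string): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  return `${trimmed}/open/v1/runs/${encodeURIComponent(runId)}/failures`;
}

// Group failures by reason.kind so an agent can read one bucket at a time
// instead of scanning a whole log.
function groupByReasonKind(failures: Failure[]): Map<string, Failure[]> {
  const groups = new Map<string, Failure[]>();
  for (const failure of failures) {
    const bucket = groups.get(failure.reason.kind) ?? [];
    bucket.push(failure);
    groups.set(failure.reason.kind, bucket);
  }
  return groups;
}

// Made-up payload standing in for a fetched response body.
const sample: Failure[] = [
  { test: "checkout-flow", reason: { kind: "assertion" }, expected: 200, actual: 500 },
  { test: "refund-flow", reason: { kind: "auth" } },
  { test: "cart-total", reason: { kind: "assertion" }, expected: 42, actual: 41 },
];
const grouped = groupByReasonKind(sample);
const url = failuresUrl("https://cloud.example.test/", "run_123");
```

Fetching and authentication are left out on purpose, since this page does not say how the Open API authenticates requests.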
Security
Glubean separates credentials from configuration at the file level and redacts sensitive values before anything is uploaded. Cloud receives test results — assertions, traces, metrics — but never your API keys, tokens, or passwords.
Learn how redaction works
File-level separation
.env
BASE_URL, REGION, flags
.env.secrets
API_KEY, TOKEN — gitignored
ctx.vars vs ctx.secrets — the runner knows which values are sensitive.
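The separation can be pictured with a small sketch: two files are parsed independently and kept in two separate maps. This is not the Glubean runner's loader. `parseEnv`, the inline file contents, and the plain-object shape of `ctx` are assumptions for illustration; only the file names and the `vars`/`secrets` split come from this page.

```typescript
// Minimal KEY=VALUE parser: skips blank lines and # comments.
// Quoting and multi-line values are deliberately not handled.
function parseEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Inline stand-ins for the two files; all values are placeholders.
const envFile = "BASE_URL=https://api.example.test\nREGION=eu-west-1\n";
const secretsFile = "# gitignored\nAPI_KEY=placeholder-key\nTOKEN=placeholder-token\n";

// Hypothetical context object: sensitive and non-sensitive values never share a map,
// so code that reports or uploads can be handed `vars` alone.
const ctx = {
  vars: parseEnv(envFile),
  secrets: parseEnv(secretsFile),
};
```

Keeping the two maps apart means the sensitivity of a value is decided by which file it came from, not by guessing from its name.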
Custom redaction engine
Authorization: Bearer [REDACTED]
api_key: [REDACTED:key]
token: sk_li***_4xN
Dual-layer detection: sensitive keys + value patterns (JWT, AWS, GitHub PAT).
22 built-in rules that cannot be weakened — only extended.
Preview with `glubean redact` before uploading.
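Dual-layer detection can be sketched in a few lines: first check whether the key name is sensitive, then scan the value for known token shapes. This is an illustrative toy, not Glubean's redaction engine or its 22 built-in rules. The key-name list, the three value patterns, and the `[REDACTED:label]` labels are assumptions; the AWS key in the sample is the placeholder from AWS documentation.

```typescript
// Layer 1: key names that are treated as sensitive regardless of value.
const SENSITIVE_KEY = /authorization|api[_-]?key|token|password|secret/i;

// Layer 2: value shapes that look like credentials, whatever the key is called.
const VALUE_PATTERNS: Array<[string, RegExp]> = [
  ["jwt", /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g],
  ["aws", /AKIA[0-9A-Z]{16}/g],
  ["github", /ghp_[A-Za-z0-9]{36}/g],
];

// Return a copy of `fields` with sensitive keys and credential-shaped values masked.
function redact(fields: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(fields)) {
    if (SENSITIVE_KEY.test(key)) {
      out[key] = "[REDACTED]";
      continue;
    }
    let cleaned = fields[key];
    for (const [label, pattern] of VALUE_PATTERNS) {
      cleaned = cleaned.replace(pattern, `[REDACTED:${label}]`);
    }
    out[key] = cleaned;
  }
  return out;
}

const redacted = redact({
  authorization: "Bearer abc123",
  note: "key AKIAIOSFODNN7EXAMPLE leaked in body",
  region: "eu-west-1",
});
```

The second layer matters because a credential pasted into a field named `note` or `body` would slip past a key-name check alone.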