Failure diagnosis for agent-written tests

A test failed in CI.
Your agent already has the evidence.

Glubean Cloud keeps failed runs, root causes, actual-vs-expected traces, trends, custom metrics, and run history so agents and teams can diagnose from facts instead of reading raw CI logs.

Free tier: 400 runs / month
No credit card
Agent-readable failure summaries
Secrets redacted before upload

Failure evidence snapshot

same test assets, now diagnosable by the team and agents

Proof

Uploaded runs

focused evidence, not log dumps
Workflow       Surface   Status
checkout-flow  CI        Passed
renewal-flow   Cloud     Watching
refund-flow    Cloud     Failed
signup-flow    VS Code   Passed

Team context

Failures: 3

grouped by reason.kind

Evidence: actual/expected

typed assertion output

Trace: 401

last request before failure

Trend: 97.8%

pass rate this week

VS Code

Author and debug

CI

Gate the same file

Cloud

Diagnose the failure

When a test fails, the evidence is already structured.

No more "paste the whole log." No more guessing whether the agent should change auth, assertions, context, or implementation.

Email
Webhook
Focused failures

Why Cloud exists

Local confidence is not failure diagnosis.

Teams do not just need to know that a run failed. They need to know whether the cause is auth setup, schema drift, implementation behavior, flaky infrastructure, or a test that should not be weakened.

Without Cloud

Local runs disappear after you close the editor.

CI logs tell you a test failed, but not what the agent should fix.

Teams end up rebuilding failure history, alerts, and dashboards around the same test assets.

With Cloud

The same test accumulates failure evidence instead of disappearing into logs.

Agents can query run status, events, and focused failure summaries.

Pass rate, latency, and flakiness show whether reliability is improving or drifting.

One artifact, three questions

The artifact should not fork when the question changes.

The local question is "does this pass here?" The CI question is "can this merge?" The Cloud question is "what does this failure mean over time, and what should the agent do next?"

VS Code

Author and debug locally

Write or refine the test in the editor and inspect traces while the context is still fresh.

tests/checkout-flow.test.ts

CI

Gate the same file

Run the identical test in pull requests and scheduled jobs with the CLI. No rewrite, no export step.

tests/checkout-flow.test.ts

Cloud

Diagnose with history

Keep failures, pass rate, latency, and notifications attached to the test the team already owns.

tests/checkout-flow.test.ts

Team Moment

A test fails in CI. The agent already has evidence.

No one starts by reading hundreds of lines of logs. The run is uploaded, failures are normalized, and the agent can pull the smallest useful summary before proposing a fix.

No more dumping CI logs into chat. No more fixing tests by weakening the assertion.

Incident fanout

checkout-flow failed after deploy `2f4e1ad`

Failed
Failure summary  ready      3 assertion failures grouped
Agent API        available  /open/v1/runs/:id/failures
Dashboard        updated    pass rate down to 84%
Next action      clear      check token role before assertion edits

The diagnosis starts from evidence.

What teams actually get

Diagnostic outcomes, not just another dashboard.

Each capability should answer a diagnostic question: what broke, is it getting worse, what evidence should the agent read, and who needs to know?

Focused failure summaries

Agents and humans get failed tests, reason kind, actual vs expected, and recent context without parsing CI logs.

Run history and pass rate

Track every test over time instead of treating each CI run like an isolated event.

Latency trends and regressions

See whether a test is still healthy, getting slower, or starting to drift before users complain.
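As a rough illustration of the two numbers behind these trends, the sketch below computes a pass rate and a nearest-rank P95 from a list of runs, then flags a latency regression against a baseline. The `RunRecord` shape, the function names, and the 20% tolerance are assumptions made for this example, not how Cloud itself calculates them.

```typescript
// Hypothetical shape of one recorded run; field names are assumptions.
interface RunRecord {
  passed: boolean;
  durationMs: number;
}

// Share of passing runs, as a percentage rounded to one decimal place.
function passRate(runs: RunRecord[]): number {
  if (runs.length === 0) return 0;
  const passed = runs.filter((r) => r.passed).length;
  return Math.round((passed / runs.length) * 1000) / 10;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Flags a regression when the current P95 exceeds the baseline P95
// by more than `tolerance` (0.2 means 20%).
function latencyRegressed(
  baseline: RunRecord[],
  current: RunRecord[],
  tolerance: number = 0.2,
): boolean {
  const before = percentile(baseline.map((r) => r.durationMs), 95);
  const after = percentile(current.map((r) => r.durationMs), 95);
  return after > before * (1 + tolerance);
}
```

Comparing a window of recent runs against an earlier window is one simple way to tell "getting slower" apart from a single slow run.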

Custom metrics with ctx.metric()

Track response headers, model scores, ranking quality, or any domain value. Cloud aggregates trends, detects regressions, and alerts on thresholds.
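To make the idea concrete, here is a minimal sketch of recording a domain value from a test and checking it against a threshold. The `MetricContext` interface is a local stand-in written for this example: the page names `ctx.metric()` but does not show its signature, so the `(name, value)` form, the metric name, and the threshold check are all assumptions.

```typescript
// Hypothetical stand-in for the test context; the real ctx.metric() signature may differ.
interface MetricContext {
  metric(name: string, value: number): void;
}

interface MetricPoint {
  name: string;
  value: number;
}

// In-memory recorder used only for this sketch.
function createRecorder(): { ctx: MetricContext; points: MetricPoint[] } {
  const points: MetricPoint[] = [];
  const ctx: MetricContext = {
    metric: (name: string, value: number): void => {
      points.push({ name, value });
    },
  };
  return { ctx, points };
}

// Example test step: record a ranking-quality score alongside the assertions.
function recordRankingQuality(ctx: MetricContext, relevantInTop10: number): void {
  ctx.metric("ranking_quality_top10", relevantInTop10 / 10);
}

// Returns the names of metrics whose latest recorded value is below its threshold.
function belowThreshold(
  points: MetricPoint[],
  thresholds: Record<string, number>,
): string[] {
  const latest = new Map<string, number>();
  for (const p of points) latest.set(p.name, p.value);
  const breached: string[] = [];
  latest.forEach((value, name) => {
    if (name in thresholds && value < thresholds[name]) breached.push(name);
  });
  return breached;
}
```

The point of the sketch is the split of responsibilities: the test only reports a number, and trend aggregation and threshold alerts happen on the recorded series afterwards.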

Alerts with context

Email and webhook notifications stay attached to the exact test, run, and trace that triggered them.

Queryable run evidence

Use the Open API to let agents fetch run status, events, and failures instead of copying output into chat.
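A small sketch of what an agent might do with that endpoint: build the failures path for a run, then group the returned failures by reason kind. Only the path `/open/v1/runs/:id/failures` and the idea of grouping by `reason.kind` come from this page; the `FailureRecord` fields, the function names, and the absence of a base URL or auth header are assumptions for illustration.

```typescript
// Hypothetical failure record; the real response shape may differ.
interface FailureRecord {
  test: string;
  reason: { kind: string };
  actual: unknown;
  expected: unknown;
}

// Fills the :id segment of the failures path shown on this page.
function failuresPath(runId: string): string {
  return `/open/v1/runs/${encodeURIComponent(runId)}/failures`;
}

// Groups failures by reason kind so an agent can read one bucket at a time.
function groupByReasonKind(failures: FailureRecord[]): Map<string, FailureRecord[]> {
  const groups = new Map<string, FailureRecord[]>();
  for (const failure of failures) {
    const bucket = groups.get(failure.reason.kind) ?? [];
    bucket.push(failure);
    groups.set(failure.reason.kind, bucket);
  }
  return groups;
}
```

Grouping first keeps the agent's context small: three assertion failures that share one reason kind usually point to one fix, not three.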

Active incident

refund-flow

Failure started after deploy `2f4e1ad`. P95 climbed from 184ms to 611ms.

Failed

Notification fanout

Email          sent     on-call@team.dev
Webhook        sent     Pager event created
Public status  updated  refund-flow degraded

Public status

pass rate: 97.8%
Badge and dashboard stay in sync.

Prometheus export

/metrics/glubean/p95?test=checkout-flow

Open API

/open/v1/runs/:id/failures

Security

Your secrets never leave your machine.

Glubean separates credentials from configuration at the file level and redacts sensitive values before anything is uploaded. Cloud receives test results — assertions, traces, metrics — but never your API keys, tokens, or passwords.

Learn how redaction works

File-level separation

.env

BASE_URL, REGION, flags

.env.secrets

API_KEY, TOKEN — gitignored

ctx.vars vs ctx.secrets — the runner knows which values are sensitive.

Custom redaction engine

Authorization: Bearer [REDACTED]

api_key: [REDACTED:key]

token: sk_li***_4xN

Dual-layer detection: sensitive keys + value patterns (JWT, AWS, GitHub PAT).

22 built-in rules that cannot be weakened — only extended.

Preview with glubean redact before uploading.
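The dual-layer idea can be sketched in a few lines: one check on the key name, one on the value itself. This is an illustration under stated assumptions, not Glubean's engine. The key list, the three regular expressions, and the single `[REDACTED]` placeholder are chosen for the example; the 22 built-in rules and the partial-masking formats shown above are not reproduced here.

```typescript
// Layer 1: key names treated as sensitive (illustrative list, an assumption).
const SENSITIVE_KEYS = /^(authorization|api[_-]?key|token|password|secret)$/i;

// Layer 2: value patterns that look like credentials, whatever the key is called.
const VALUE_PATTERNS: RegExp[] = [
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, // JWT: three base64url segments
  /\bAKIA[0-9A-Z]{16}\b/g, // AWS access key ID
  /\bghp_[A-Za-z0-9]{36}\b/g, // GitHub personal access token (classic)
];

// Redacts the whole value for a sensitive key, otherwise masks any matching substrings.
function redactValue(key: string, value: string): string {
  if (SENSITIVE_KEYS.test(key)) return "[REDACTED]";
  return VALUE_PATTERNS.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED]"),
    value,
  );
}

// Applies redactValue to every entry of a flat string record.
function redactRecord(record: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(record)) {
    out[key] = redactValue(key, record[key]);
  }
  return out;
}
```

The second layer matters because secrets often appear under harmless-looking keys, for example a token pasted into an error message or a request body.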

Start diagnosing failures

Keep the test. Add failure diagnosis.

Cloud is strongest when the test already runs and the team needs failures to become diagnosable, queryable, and auditable.