Day 3: When Your E2E Tests Are Accidentally Creating Real CI Runs
Spent today building a test classifier to figure out which tests can actually run automatically. Turns out only 118 out of 202 tests are auto-testable. The other 84 assume capabilities the runner doesn't have, like multiple auth identities or rate limit simulation.
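Roughly what that classification looks like, as a minimal sketch. The capability names, the GeneratedTest shape, and the classify function are made up for illustration; they are not the real schema.

```python
from dataclasses import dataclass

# Capabilities this hypothetical runner supports (illustrative names only).
RUNNER_CAPABILITIES = frozenset({"single_auth", "http_requests"})


@dataclass(frozen=True)
class GeneratedTest:
    name: str
    requires: frozenset  # capabilities the test assumes the runner has


def classify(tests, capabilities=RUNNER_CAPABILITIES):
    """Split tests into (auto_testable, needs_manual) by required capabilities."""
    auto, manual = [], []
    for test in tests:
        missing = test.requires - capabilities
        (manual if missing else auto).append(test)
    return auto, manual


tests = [
    GeneratedTest("create project", frozenset({"single_auth", "http_requests"})),
    GeneratedTest("member cannot delete owner", frozenset({"multi_auth"})),
    GeneratedTest("burst returns 429", frozenset({"rate_limit_simulation"})),
]
auto, manual = classify(tests)
print(len(auto), "auto-testable,", len(manual), "need a human or a better runner")
```

Anything with a missing capability gets set aside before it ever reaches the runner.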
The Find
VibeProof caught a race condition in test execution that could create duplicate runs:
Race condition in test run start: execution-exists check was outside the transaction,
so concurrent requests could both see 0 rows and double-insert.
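Here is the shape of that bug's fix as a minimal sketch, assuming SQLite and a hypothetical executions table (the real schema and database may differ). The point is that the existence check and the insert happen under the same write lock, so two requests can't both see 0 rows.

```python
import sqlite3


def start_run(conn, test_id):
    """Create an execution row unless one exists. Returns True if created."""
    # BEGIN IMMEDIATE takes the write lock up front, so the check below
    # and the insert are covered by the same transaction.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM executions WHERE test_id = ?", (test_id,)
        ).fetchone()
        if count == 0:
            conn.execute("INSERT INTO executions (test_id) VALUES (?)", (test_id,))
        conn.execute("COMMIT")
        return count == 0
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/COMMIT above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE executions (test_id TEXT NOT NULL)")
print(start_run(conn, "test-1"), start_run(conn, "test-1"))
```

On other databases, a unique constraint on the key is a common extra safeguard, since it rejects the second insert even if a check slips through.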
But the weirdest bug? My E2E tests were written to expect a 429 rate limit response from POST /api/ci/run, but the requests were going through to the real endpoint instead. That created 7 ghost CI runs with 0% pass rates before I caught it.
Also found an API key role parser that excluded 'owner' from valid roles, so keys with owner permissions silently fell back to member. One-character fix.
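A minimal sketch of the role parser bug, mirroring the fallback behavior described above. The VALID_ROLES contents (other than 'owner' and 'member') and the parse_role name are assumptions for illustration.

```python
# Hypothetical role list. With 'owner' missing from it, owner keys
# quietly became member keys.
VALID_ROLES = ("owner", "admin", "member")
DEFAULT_ROLE = "member"


def parse_role(raw):
    """Normalize a role string; unknown roles fall back to DEFAULT_ROLE."""
    role = (raw or "").strip().lower()
    return role if role in VALID_ROLES else DEFAULT_ROLE


print(parse_role("owner"), parse_role("something-else"))
```

The silent fallback is what made this hard to spot: nothing errors, the key just has fewer permissions than it should.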
The Fix
Moved the execution-exists check inside the transaction. Added 'owner' to the allowed roles array. Fixed the template engine to send Content-Type: application/json on DELETE requests.
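For the DELETE fix, here is a minimal sketch using Python's standard library. The build_delete helper, the example.test URL, and the payload are hypothetical; the request is only built here, never sent.

```python
import json
import urllib.request


def build_delete(url, payload=None):
    """Build a DELETE request that always declares a JSON content type."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


req = build_delete("https://example.test/api/items/123", {"reason": "test cleanup"})
# urllib stores header names capitalized, hence "Content-type" in the lookup.
print(req.get_method(), req.get_header("Content-type"))
```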
Also did a complete rewrite of the template engine, which now has response chaining, unique test data generation, and cleanup queues.
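To make those three features concrete, here is a minimal sketch of each. The {{ step.field }} placeholder syntax, the function names, and the CleanupQueue class are assumptions for illustration, not the actual engine.

```python
import re
import uuid
from collections import deque

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    """Response chaining: fill {{ step.field }} from earlier responses."""
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


def unique_name(prefix):
    """Unique test data: a random suffix so reruns don't collide."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


class CleanupQueue:
    """Cleanup queue: undo created resources, newest first."""

    def __init__(self):
        self._pending = deque()

    def register(self, method, path):
        self._pending.appendleft((method, path))

    def drain(self):
        while self._pending:
            yield self._pending.popleft()


# Pretend an earlier "create_project" step returned {"id": 42}.
context = {"create_project": {"id": 42}}
path = render("/api/projects/{{ create_project.id }}", context)

cleanup = CleanupQueue()
cleanup.register("DELETE", path)
print(unique_name("project"), list(cleanup.drain()))
```

Draining newest-first means child resources get deleted before the parents they depend on.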
The Score
Health score: 78.5 (stable). E2E pass rate went from 24% to 27% with the new template engine.
The Takeaway
The 27% pass rate wasn't a runner problem. It was a test design problem. The AI generates tests that assume capabilities the runner doesn't have. The classifier solves this by filtering those tests out before execution instead of letting them fail during it. I'm expecting an 80%+ pass rate on the 118 cases that are actually auto-testable.
Also built a credential storage hook that auto-detects API keys in messages and stores them in Key Vault. No more losing credentials between sessions.