Day 3: When Your E2E Tests Are Accidentally Creating Real CI Runs

Spent today building a test classifier to figure out which tests can actually run automatically. Turns out only 118 of the 202 tests are auto-testable. The other 84 assume capabilities the runner doesn't have, such as multiple auth identities or rate limit simulation.
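A minimal sketch of what capability-based classification could look like. This is not VibeProof's actual classifier: the capability names, `CaseSpec`, and `classify` are all hypothetical, and it assumes each test declares what it needs from the runner.

```python
from dataclasses import dataclass, field

# Hypothetical capability names; the real runner's list is not in this post.
RUNNER_CAPABILITIES = frozenset({"single_auth", "http_requests"})


@dataclass(frozen=True)
class CaseSpec:
    """A test case plus the runner capabilities it assumes."""
    name: str
    requires: frozenset = field(default_factory=frozenset)


def classify(tests, capabilities=RUNNER_CAPABILITIES):
    """Split tests into (auto_testable, skipped) before anything executes."""
    auto, skipped = [], []
    for test in tests:
        missing = test.requires - capabilities
        if missing:
            # Keep the reason so a skipped test is explained, not just dropped.
            skipped.append((test, sorted(missing)))
        else:
            auto.append(test)
    return auto, skipped
```

Recording the missing capabilities alongside each skipped test keeps the 84 visible as a to-do list for the runner rather than a pile of silent skips.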

The Find

VibeProof caught a race condition in test execution that could create duplicate runs:

Race condition in test run start: execution-exists check was outside the transaction, 
so concurrent requests could both see 0 rows and double-insert.

But the weirdest bug? My E2E tests were accidentally getting through to the real POST /api/ci/run endpoint instead of receiving the expected 429 rate limit response. That created 7 ghost CI runs with 0% pass rates before I caught it.

Also found an API key role parser that excluded 'owner' from valid roles - keys with owner permissions silently fell back to member. One character fix.
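A rough sketch of how this kind of silent fallback happens. The role list, `parse_key_role`, and the `admin` role are assumptions for illustration; the post only confirms that `owner` was missing and that unknown roles fell back to `member`.

```python
# Hypothetical allow-list; "owner" is the entry that was missing.
VALID_ROLES = ("owner", "admin", "member")
DEFAULT_ROLE = "member"


def parse_key_role(raw, valid_roles=VALID_ROLES):
    """Return the key's role, falling back to DEFAULT_ROLE if unrecognized."""
    role = (raw or "").strip().lower()
    return role if role in valid_roles else DEFAULT_ROLE


# With the old list, an owner key is quietly downgraded:
# parse_key_role("owner", valid_roles=("admin", "member")) returns "member"
```

The fallback is what makes the bug quiet. A stricter variant could raise on an unrecognized role so that a missing list entry fails loudly instead of downgrading the key.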

The Fix

Moved the execution-exists check inside the transaction. Added 'owner' to the allowed roles array. Fixed the template engine to send Content-Type: application/json on DELETE requests.
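To illustrate the first fix, here is a sketch of a check-then-insert done inside one transaction, using Python's `sqlite3` as a stand-in. The `executions` table, its columns, and `start_run` are assumptions, and the real project's database and schema may differ.

```python
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None turns off the module's implicit transactions,
    # so the BEGIN/COMMIT statements below are issued explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS executions ("
        " run_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def start_run(conn, run_id):
    """Insert an execution row unless one exists; return True if inserted."""
    # BEGIN IMMEDIATE takes the write lock before the existence check, so a
    # second concurrent caller waits here instead of also seeing 0 rows.
    conn.execute("BEGIN IMMEDIATE")
    try:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM executions WHERE run_id = ?", (run_id,)
        ).fetchone()
        if count == 0:
            conn.execute(
                "INSERT INTO executions (run_id, status) VALUES (?, 'running')",
                (run_id,),
            )
        conn.execute("COMMIT")
        return count == 0
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A unique constraint on `run_id` would be a further backstop, since the database would then reject a duplicate row even if some code path skipped the check.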

Also completed a full rewrite of the template engine, adding response chaining, unique test data generation, and cleanup queues.
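The three features named above might fit together roughly like this. It is a sketch under assumptions, not the actual engine: the `{{ name }}` placeholder syntax, the step dictionary shape, and the `send` callback are all invented for illustration.

```python
import re
import uuid

# Hypothetical placeholder syntax: {{ unique }} or {{ step_name.field }}
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template, context):
    """Replace each placeholder with its value from context."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def run_steps(steps, send):
    """Run steps in order; `send(method, path)` must return a dict."""
    context = {"unique": uuid.uuid4().hex[:8]}  # unique test data per run
    cleanup = []                                # paths to DELETE afterwards
    try:
        for step in steps:
            response = send(step["method"], render(step["path"], context))
            # Response chaining: later steps can use "<step name>.<field>".
            for key, value in response.items():
                context[f"{step['name']}.{key}"] = value
            if "cleanup" in step:
                cleanup.append(render(step["cleanup"], context))
    finally:
        # Drain the cleanup queue in reverse order, even if a step raised.
        for path in reversed(cleanup):
            send("DELETE", path)
    return context
```

Passing `send` in as a callback means the same steps can run against a fake transport in unit tests, which is one way to avoid creating real records by accident.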

The Score

Health score: 78.5 (stable). E2E pass rate went from 24% to 27% with the new template engine.

The Takeaway

The 27% pass rate wasn't a runner problem - it was a test design problem. AI generates tests assuming capabilities the runner doesn't have. The classifier solves this by filtering before execution instead of failing during. Expecting 80%+ pass rate on the 118 actually auto-testable cases.

Built a credential storage hook that auto-detects API keys in messages and stores them in Key Vault. No more losing credentials between sessions.

Vibeproofing VibeProof — Day 3

Day 3 | Health Score: 78.5 | Issues Found: 5 | Issues Fixed: 5 | Total Test Cases: 202
