Confidence Framework

How we vet new jobs before they enter the catalog.

Brutal curation is the secret weapon. Every job in the catalog has passed the same checklist. This page documents it, publicly, so our quality bar is auditable as we scale from 76 jobs toward 200 and 500.

Why we publish this

The hardest scaling risk for a catalog like this is uneven quality across price tiers. A $4.99 receipt OCR has binary success criteria. A $99.99 grant proposal does not. As the catalog grows, the risk of quietly letting a job in at the wrong tier, or letting a weak job in at a strong tier, compounds.

Publishing the vetting process is a commitment. If we skip a step, it's visible. If a job in the catalog fails the bar described here, a buyer can point at the gap. The public-facing process is how we hold ourselves to the private one.

The vetting checklist

Every proposed job runs this eight-step gauntlet before it becomes a buyable thing. No shortcuts, no exceptions for "strategic" categories.

1

Shape-of-output is definable in one sentence

Before a job enters the catalog, we must be able to describe the output to a buyer in a single sentence they understand without further context. If we can't, the job is too ambiguous to be a vending-machine item.

2

Stress-test corpus of 20 inputs, varied in realistic ways

We assemble 20 representative inputs for the job: long and short, clean and messy, domain-specific and generalist, edge cases and happy paths. We run every one through the intended processing before the job goes public.

3

Every stress-test output meets the stated structural outline

If the job page promises an executive summary + 4 sections + appendix, every one of the 20 sample outputs must actually contain all of them. No silent omissions. No sections collapsed into prose. Structure compliance is non-negotiable.

4

Adversarial input handling

We include 3-5 adversarial inputs in the corpus: prompt injection attempts, off-topic content, moderation edge cases. The job must refuse or sanitize gracefully. It cannot silently produce garbage or echo the adversarial instruction.

5

Complexity tier is assigned honestly

Tier 1 if the stress-test outputs have zero variance in correctness. Tier 2 if outputs are reliably well-structured but flavor varies. Tier 3 if the first drafts would benefit from human refinement. We do not promote jobs up a tier for marketing reasons.

6

Refund path is defined before the job goes live

Every job has documented refund criteria: if the output misses on any of N specific structural promises, the buyer gets refunded without arbitration. We write these criteria before we publish the price.

7

Price reflects tier, not aspiration

Price is anchored to complexity tier and delivery friction. Tier 1 jobs price below tier 2. Tier 3 jobs price higher, but honestly, with validation criteria spelled out so the buyer understands they are buying a first draft, not a finished artifact.

8

Sanitized sample output published before launch

Every job at $29 or higher ships with a public sample output at /jobs/:jobId/sample. The sample is representative of the stress-test median, not the best stress-test output. No cherry-picking.
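The mechanical parts of the checklist (steps 2, 3, 4, 6 and 8) can be sketched in code. This is a minimal illustration, not our actual tooling: StressResult, vet, refund_due, median_sample and the reviewer score field are hypothetical names invented for the example. It also assumes adversarial inputs are judged on graceful handling rather than on structure, and that "median" means the lower-median output when outputs are ranked by score.

```python
from dataclasses import dataclass

CORPUS_SIZE = 20            # step 2: 20 stress-test inputs
ADVERSARIAL_RANGE = (3, 5)  # step 4: 3-5 adversarial inputs in the corpus


@dataclass
class StressResult:
    """One stress-test run. All field names are hypothetical."""
    sections: set               # section names actually found in the output
    adversarial: bool = False   # input was an injection, off-topic, or moderation probe
    handled: bool = True        # adversarial only: job refused or sanitized gracefully
    score: float = 0.0          # assumed reviewer score, used only to rank samples


def missing_sections(result, promised):
    """Promised sections that the output does not contain."""
    return set(promised) - result.sections


def refund_due(result, promised):
    """Step 6: any missed structural promise means a refund, no arbitration."""
    return bool(missing_sections(result, promised))


def vet(results, promised):
    """Return a list of human-readable failures; an empty list means the gate passed."""
    failures = []
    if len(results) != CORPUS_SIZE:
        failures.append(f"corpus has {len(results)} inputs, expected {CORPUS_SIZE}")
    low, high = ADVERSARIAL_RANGE
    n_adversarial = sum(r.adversarial for r in results)
    if not low <= n_adversarial <= high:
        failures.append(f"{n_adversarial} adversarial inputs, expected {low}-{high}")
    for index, result in enumerate(results):
        if result.adversarial:
            # Assumption: adversarial runs are checked for graceful handling only.
            if not result.handled:
                failures.append(f"input {index}: adversarial input not handled")
        elif refund_due(result, promised):
            gaps = sorted(missing_sections(result, promised))
            failures.append(f"input {index}: missing sections {gaps}")
    return failures


def median_sample(results):
    """Step 8: pick the median-scoring ordinary output, never the best one."""
    ranked = sorted((r for r in results if not r.adversarial), key=lambda r: r.score)
    return ranked[(len(ranked) - 1) // 2]
```

Using the same missing-sections check for both the launch gate and the refund rule is deliberate in this sketch: the structure a job must show before launch is the same structure a buyer can claim a refund against afterward.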

How complexity tier is decided

Tier assignment is the most common place a catalog like this quietly cheats. We've formalized it:

Tier 1: Deterministic

The stress-test corpus shows zero variance in correctness. The job either succeeds or fails in a way the buyer can immediately detect. Receipt OCR, file format conversion, extraction, binary classification.

Tier 2: Structured-generative

Stress-test outputs are reliably well-structured, but flavor varies. Buyers may want to adjust voice or tone before shipping, but the skeleton is always there. Blog posts, summaries, documentation, standardized reports.

Tier 3: Generative-nuanced

Stress-test outputs produce first-draft structural depth but require buyer refinement to ship. The machine provides scaffolding and ~80% of the prose. Expect to tighten claims and verify specifics. Grant proposals, pitch decks, SWOT, strategic frameworks.

A job does not get promoted to a lower-friction tier because the price needs to look attractive. Tier reflects what the stress-test corpus actually showed. Period.
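As a rough sketch, the tier decision can be written as a function of what the stress-test corpus showed. The Observation fields and the function names below are hypothetical, and reducing each output to three reviewer judgments is an assumption made for illustration; the real assessment involves human review of every output.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    DETERMINISTIC = 1
    STRUCTURED_GENERATIVE = 2
    GENERATIVE_NUANCED = 3


@dataclass(frozen=True)
class Observation:
    """Reviewer judgment of one stress-test output (hypothetical fields)."""
    structured: bool   # output matches the promised outline
    correct: bool      # output passes an objective right/wrong check
    ship_ready: bool   # buyer could ship it after voice or tone tweaks only


def assign_tier(observations) -> Optional[Tier]:
    """Return the tier the corpus supports, or None if the job fails the structure bar."""
    if not observations or not all(o.structured for o in observations):
        return None  # structure compliance is a precondition for any tier
    if all(o.correct for o in observations):
        return Tier.DETERMINISTIC          # zero variance in correctness
    if all(o.ship_ready for o in observations):
        return Tier.STRUCTURED_GENERATIVE  # skeleton always there, flavor varies
    return Tier.GENERATIVE_NUANCED         # first drafts that need refinement


def listing_tier(measured: Tier, requested: Tier) -> Tier:
    """Refuse a request for a lower-friction tier than the corpus supports."""
    if requested < measured:
        raise ValueError(
            f"requested tier {int(requested)} but corpus showed tier {int(measured)}"
        )
    return measured  # the listed tier is always the measured one
```

In this sketch the requested tier never changes the result; it can only trigger an error. That mirrors the rule above: tier is an output of the stress test, not an input from pricing.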

Jobs we have rejected

A catalog is defined as much by what it refuses as by what it offers. A few of the proposals that did not make it in:

rejected

Real-time market analysis report

Requires current-moment market data. The machine has no live data access, and building a live data feed would compromise the no-stored-state architecture. Rejected.

rejected

Personalized resume rewrite with career coaching

The valuable part is the career-strategy judgment, not the prose cleanup. Also requires ongoing context we intentionally don't keep. Rejected.

rejected

Medical symptom analysis

Stakes of being wrong are too high. Accountability requires a licensed professional. Rejected on principle, not capability.

rejected

Automated YouTube thumbnail design

Stress-test corpus showed acceptable structural output but poor aesthetic consistency across brands. The variance was too high to set buyer expectations honestly at any price point. Deferred, not permanently rejected.

rejected

Full-length novel draft

Tier 3 at extreme length. Output would be technically well-structured but the refinement burden on the buyer is so high that the price point would need to be low enough to insult the effort, or high enough to misrepresent what's being delivered. Rejected as wrong fit for the vending machine format.

See also "What is not in the catalog" for the categorical exclusions (therapy, original brand strategy, anything requiring sustained human judgment or ongoing context).

Our scaling commitment

We would rather have 76 jobs that every buyer trusts than 500 jobs with variable quality. As we grow the catalog, this checklist does not get more lenient; it gets more strictly enforced. Every job added between 76 and 500 passes the same gauntlet the first 76 did.

If a job in the catalog fails a buyer in a way that violates any step above, we want to hear about it. Refund paths are defined. The feedback changes the corpus for that job and every job that shares its structure.
