Private beta — waitlist open

The thematic analysis API: raw text in, defensible themes out.

One POST turns transcripts, survey open-ends, and reviews into codes, themes, quotes, sentiment, and confidence scores — structured JSON, grounded in real qualitative methodology, ready to render.

request.sh
curl https://api.thematicanalysis.ai/v1/analyses \
  -H "Authorization: Bearer ta_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      { "id": "exit-014", "text": "I loved the team, but after my manager left there was no career conversation for nine months..." },
      { "id": "exit-015", "text": "Pay was fine. What wore me down was changing priorities every sprint..." }
    ],
    "options": { "sentiment": true, "summary": true }
  }'
response.json
{
  "themes": [
    {
      "name": "Career stagnation after manager change",
      "description": "Leavers report development conversations stopping when managers rotate.",
      "prevalence": 162,
      "sentiment": "negative",
      "confidence": 0.91,
      "codes": ["no growth path", "manager turnover"],
      "quotes": [
        {
          "document_id": "exit-014",
          "span": "no career conversation for nine months"
        }
      ]
    }
  ],
  "status": "complete"
}
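If you call the endpoint from Python rather than curl, the same request could look roughly like this with the standard library. This is a sketch, not an official client: the helper name `build_analysis_request` is hypothetical and the key is a placeholder, while the URL, headers, and body fields are copied from the curl example above. The request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.thematicanalysis.ai/v1/analyses"  # taken from request.sh


def build_analysis_request(api_key, documents, sentiment=True, summary=True):
    """Build (but do not send) the POST shown in request.sh.

    Hypothetical helper: only the URL, headers, and body fields come from
    the curl example; everything else is illustrative.
    """
    payload = {
        "documents": documents,
        "options": {"sentiment": sentiment, "summary": summary},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_analysis_request(
    "ta_live_...",  # placeholder key, as in the curl example
    [{"id": "exit-015", "text": "Pay was fine. What wore me down was changing priorities every sprint..."}],
)
# Sending it would be: urllib.request.urlopen(request)
```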

You have ten thousand answers. Your product needs five themes.

Open-ended text is where the real signal lives — and where pipelines go to die. A homemade prompt works on fifty reviews, then drifts on five thousand: theme names change between runs, counts don't reconcile, and nobody can point to the quote behind a finding. When a customer asks “why is this a theme?”, “the model said so” is not an answer you can ship.

How it works

Theme extraction in three steps, not three months

01

Send your text

POST your documents — transcripts, open-ends, reviews, tickets. Up to 50,000 per batch, with optional metadata that flows through to the output.

step-1-request.jsonc
POST /v1/analyses
{
  "documents": [
    { "id": "rev-2201", "text": "Checkout kept timing out on mobile..." },
    { "id": "rev-2202", "text": "Love the product, hate the delivery updates..." }
    // ...up to 50,000 documents per batch
  ],
  "codebook_id": "cb_q2_reviews",   // optional: reuse your codebook
  "webhook_url": "https://yourapp.com/hooks/themes"
}
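If your corpus is larger than one batch, you would split it client-side. A minimal sketch, assuming the 50,000-document limit stated above and assuming that several batches may share one `codebook_id`; the helper name `make_batches` is hypothetical:

```python
MAX_BATCH = 50_000  # per-batch document limit stated above


def make_batches(documents, codebook_id=None, webhook_url=None, max_batch=MAX_BATCH):
    """Yield request bodies, each holding at most max_batch documents."""
    for start in range(0, len(documents), max_batch):
        body = {"documents": documents[start:start + max_batch]}
        # Optional fields are only included when the caller supplies them.
        if codebook_id is not None:
            body["codebook_id"] = codebook_id
        if webhook_url is not None:
            body["webhook_url"] = webhook_url
        yield body


docs = [{"id": f"rev-{i}", "text": "..."} for i in range(120_000)]
batches = list(make_batches(docs, codebook_id="cb_q2_reviews"))
print([len(b["documents"]) for b in batches])  # [50000, 50000, 20000]
```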
02

The engine codes and clusters

The engine follows the six phases of reflexive thematic analysis: familiarization, coding, clustering codes into candidate themes, reviewing those themes, defining and naming them, and writing up the summary. Poll for status or just give us a webhook.

step-2-status.json
{
  "id": "an_8c2f91",
  "status": "processing",
  "phase": "clustering_codes",
  "progress": { "documents_coded": 18440, "documents_total": 50000 }
}
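If you poll instead of using a webhook, a backoff loop keeps the request count reasonable. The sketch below assumes only the two status values shown on this page ("processing" and "complete") and treats anything other than "processing" as final. `fetch_status` is a function you supply to perform `GET /v1/analyses/{id}`; its name, the intervals, and the timeout are all illustrative.

```python
import time


def wait_for_analysis(fetch_status, analysis_id, interval=2.0, max_interval=60.0,
                      timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the status is no longer "processing", doubling the wait each time.

    fetch_status is caller-supplied (hypothetical): it takes an analysis id and
    returns the decoded JSON body of GET /v1/analyses/{id} as a dict.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status(analysis_id)
        if body["status"] != "processing":
            return body
        if clock() >= deadline:
            raise TimeoutError(f"{analysis_id} still processing after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)


# Stand-in fetcher that completes on the third call, so the loop can be tried offline.
responses = iter([
    {"id": "an_8c2f91", "status": "processing", "phase": "clustering_codes"},
    {"id": "an_8c2f91", "status": "processing", "phase": "clustering_codes"},
    {"id": "an_8c2f91", "status": "complete"},
])
result = wait_for_analysis(lambda _id: next(responses), "an_8c2f91", sleep=lambda s: None)
print(result["status"])  # complete
```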
03

Get structured themes back

Named themes with descriptions, prevalence counts, supporting quotes, sentiment, and confidence — plus the codebook, persisted so your next batch is coded consistently.

step-3-result.jsonc
{
  "id": "an_8c2f91",
  "status": "complete",
  "themes": [ ... ],          // named, described, counted
  "codes": [ ... ],           // with supporting text spans
  "summary": { ... },         // ready to render
  "codebook_id": "cb_q2_reviews"
}
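To keep the next batch consistent with this one, you would carry the returned `codebook_id` into the next request. A small sketch using the field names from the step 1 request and step 3 result; the helper name `next_batch_body` and the sample document are hypothetical:

```python
def next_batch_body(previous_result, documents):
    """Build the next request body, reusing the codebook from a finished analysis."""
    if previous_result.get("status") != "complete":
        raise ValueError("previous analysis has not completed")
    return {
        "documents": documents,
        "codebook_id": previous_result["codebook_id"],
    }


previous = {"id": "an_8c2f91", "status": "complete", "codebook_id": "cb_q2_reviews"}
body = next_batch_body(previous, [{"id": "rev-3001", "text": "..."}])  # rev-3001 is made up
print(body["codebook_id"])  # cb_q2_reviews
```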

What you get back

A response schema that defends itself

Every claim in the output is traceable to the text that supports it. This is the difference between a theme and a guess.

GET /v1/analyses/an_8c2f91
{
  "themes": [
    {
      "name": "Delivery updates erode trust",
      "description": "Customers describe tracking emails that contradict the courier, making support feel evasive.",
      "prevalence": 412,
      "share": 0.18,
      "sentiment": { "label": "negative", "score": -0.72 },
      "confidence": 0.88,
      "codes": [
        {
          "label": "tracking mismatch",
          "spans": [
            {
              "document_id": "rev-2202",
              "text": "the app said delivered while the courier said delayed",
              "char_start": 41,
              "char_end": 94
            }
          ]
        }
      ],
      "representative_quotes": [
        {
          "document_id": "rev-2202",
          "text": "hate the delivery updates — the app said delivered while the courier said delayed"
        }
      ]
    }
  ],
  "document_sentiment": { "negative": 0.46, "neutral": 0.31, "positive": 0.23 },
  "summary": {
    "headline": "Delivery communication is the dominant negative driver this quarter",
    "sections": [ ... ]
  }
}
  • themes[].name + description

    Human-readable, dashboard-ready. Stable naming across runs when you reuse a codebook.

  • prevalence + share

    How many documents express the theme, and what fraction of the corpus — counts that reconcile.

  • codes[].spans

    Every code points to exact character spans in your source text. Findings stay auditable.

  • representative_quotes

    Real excerpts chosen to illustrate the theme — drop them straight into a report or UI.

  • sentiment

    Theme-level and document-level, so you know not just what people talk about but how they feel.

  • confidence

    A calibrated score on every theme and code. Filter, flag, or route low-confidence output for review.
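The span and confidence fields above lend themselves to two small client-side checks: confirming that each span really appears at its offsets in your source text, and routing low-confidence themes to review. The sketch below is illustrative only. It assumes `char_start` and `char_end` are 0-based, end-exclusive offsets into the text you submitted, the 0.8 threshold is arbitrary, and the sample document and result are invented for the example.

```python
def verify_spans(result, documents_by_id):
    """Return (document_id, code label) pairs whose span text does not match the source.

    Assumes 0-based, end-exclusive offsets; adjust the slice if the API
    documents a different convention.
    """
    mismatches = []
    for theme in result["themes"]:
        for code in theme["codes"]:
            for span in code["spans"]:
                source = documents_by_id[span["document_id"]]
                if source[span["char_start"]:span["char_end"]] != span["text"]:
                    mismatches.append((span["document_id"], code["label"]))
    return mismatches


def split_by_confidence(themes, threshold=0.8):
    """Partition themes into (accepted, needs_review) by confidence score."""
    accepted = [t for t in themes if t["confidence"] >= threshold]
    needs_review = [t for t in themes if t["confidence"] < threshold]
    return accepted, needs_review


# Invented sample data, shaped like the response above.
documents_by_id = {"doc-1": "Support was slow but the refund was fast."}
result = {"themes": [{
    "name": "Slow support",
    "confidence": 0.62,
    "codes": [{
        "label": "slow response",
        "spans": [{"document_id": "doc-1", "text": "Support was slow",
                   "char_start": 0, "char_end": 16}],
    }],
}]}

print(verify_spans(result, documents_by_id))  # []
accepted, needs_review = split_by_confidence(result["themes"])
print([t["name"] for t in needs_review])  # ['Slow support']
```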

Use cases

One endpoint, many products

If your users generate open-ended text, you can ship customer feedback theme analysis as a feature — without building the engine.

people-analytics

HR feedback & exit interviews

Build retention dashboards that turn exit interviews and engagement surveys into themes HR can act on — with quotes to back every finding.

voice-of-customer

Customer review analysis

Ship a review-intelligence feature that tells e-commerce teams why ratings move, not just that they moved.

civic-tech

Government & public consultation

Analyze thousands of citizen responses consistently and publish defensible summaries with a clear evidence trail.

research-ops

UX & product research

Pipe interview transcripts and usability sessions into your research repo and get coded, comparable findings across studies.

ed-tech

Education & course feedback

Give instructors themed end-of-course feedback at faculty scale, instead of a spreadsheet of ten thousand comments.

insights

Market research transcripts

Turn focus groups and depth interviews into client-ready theme reports in hours, with the methodology section already written.

The honest question

“Couldn't I just prompt a model myself?”

Yes — and for a demo, you should. The gap appears when the demo becomes a product. A production theme feature needs evaluation, schema stability, consistency across runs, and comparison across time. That is what you're buying: not access to a model, but the engineering around it that you'd otherwise spend a quarter building.

Comparison between prompting a foundation model yourself and using the thematic analysis API
| Capability | DIY prompting | thematicanalysis.ai |
| --- | --- | --- |
| Consistent output schema | Every prompt tweak risks breaking your parser | Versioned JSON schema, stable across runs and releases |
| Codebook persistence | Each run reinvents the codes; Q3 isn't comparable to Q2 | Codebooks persist and apply across batches and time |
| Evaluation | You build (and maintain) the eval harness yourself | Continuously evaluated against human-coded datasets |
| Confidence & evidence | Fluent output, no way to know what to trust | Calibrated confidence scores and exact supporting spans |
| Scale | Context limits, chunking strategy, retries, rate limits are yours to solve | Async batches to 50k documents, webhooks, automatic chunking |
| Cross-dataset comparison | Manual theme matching between runs | Theme alignment across datasets and time periods, built in |

Pricing at launch

Pay per document, not per quarter of engineering

This is the planned shape: final numbers will be published at launch, and waitlist members lock in launch pricing.

Sandbox

Free

Full schema, sample datasets, generous dev limits. Build and demo before you pay.

Builder

Usage-based

For shipping your first feature: production keys, webhooks, standard batches.

Growth

Usage-based

Larger batches, codebook persistence, cross-dataset comparison, priority throughput.

Scale

Volume pricing

High-volume pipelines, configurable retention, SLAs, dedicated support.

Enterprise

Custom

White-label output, custom evaluation on your domain, security review, DPA.

Ship your themes feature in days, not quarters

Join the waitlist for early sandbox access and launch pricing. Telling us what you'd build helps us prioritize your invite.

FAQ

Questions developers actually ask

How do I know the themes are accurate?

Every theme ships with a confidence score, supporting codes, and the exact text spans behind them, so you can verify output rather than trust it blindly. The engine is evaluated continuously against human-coded datasets across domains like HR feedback, product reviews, and consultation responses, and we publish evaluation summaries to waitlist members before launch. Low-confidence themes are flagged instead of silently included.

What input formats are supported?

Plain text per document is the core input: interview transcripts, survey open-ends, reviews, tickets, anything textual. Send up to 50,000 documents per batch as JSON, or upload CSV and JSONL files to the batch endpoint. Each document takes an optional ID and metadata object that flows through to the response, so joining themes back to your own records is trivial.
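If you would rather build the JSON body from your own CSV than upload the file, the conversion and the join back to your records can be sketched as follows. The `metadata` key name is an assumption (this page mentions a metadata object but does not show its field name), and the helper names, column names, and sample row are invented for illustration.

```python
import csv
import io


def csv_to_documents(csv_text, id_column, text_column):
    """Turn CSV rows into document objects; remaining columns become metadata.

    The "metadata" key is an assumed field name, not confirmed by the API.
    """
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        metadata = {k: v for k, v in row.items() if k not in (id_column, text_column)}
        documents.append({"id": row[id_column], "text": row[text_column], "metadata": metadata})
    return documents


def attach_records(theme, records_by_id):
    """Pair each representative quote with your own record for that document."""
    return [(q["text"], records_by_id[q["document_id"]])
            for q in theme["representative_quotes"]]


# Invented sample row; "region" stands in for any extra column you carry along.
csv_text = "review_id,comment,region\nrev-2201,Checkout kept timing out on mobile...,EU\n"
documents = csv_to_documents(csv_text, "review_id", "comment")
print(documents[0]["id"], documents[0]["metadata"])
```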

What happens to my data?

Your data is yours. Submitted text is processed to produce your results, retained only for the window you configure (default 30 days, configurable to zero retention), and never used to train models. You can delete any analysis or document immediately via the API, and account deletion purges everything. Data is encrypted in transit and at rest.

Which languages are supported?

English is fully supported at launch, with Spanish, French, German, Portuguese, and Dutch in beta. You can mix languages in one batch; each document is coded in its own language and themes can be returned in the language of your choice. Tell us your language needs in the waitlist form; it directly shapes the rollout order.

How does pricing work?

Usage-based: you pay per document analyzed, with volume discounts as you scale. The free sandbox covers development and small pilots; paid tiers add larger batches, webhooks, codebook persistence, and white-label options. Waitlist members lock in launch pricing. Final numbers are published at launch, and the tiers on this page show the intended shape.

Can I use my own codebook?

Yes. You can run fully inductive analysis (the engine derives codes from your data), supply your own codebook for deductive coding, or combine both. Codebooks persist across analyses, so a Q3 run is coded consistently with Q2 and themes are comparable across time, which is the hardest thing to get from ad-hoc prompting.
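Once two periods are coded against the same codebook, comparing them can be as simple as a per-theme difference in `share`. The sketch below assumes theme names stay stable between the two runs (as described for reused codebooks) and matches themes by name. The helper name `share_changes` is hypothetical, and the Q3 figure is invented for the example.

```python
def share_changes(earlier, later):
    """Map theme name -> change in share between two analyses.

    Themes that appear in only one analysis are treated as share 0.0 in the other.
    """
    before = {t["name"]: t["share"] for t in earlier["themes"]}
    after = {t["name"]: t["share"] for t in later["themes"]}
    return {
        name: round(after.get(name, 0.0) - before.get(name, 0.0), 4)
        for name in sorted(before.keys() | after.keys())
    }


q2 = {"themes": [{"name": "Delivery updates erode trust", "share": 0.18}]}
q3 = {"themes": [{"name": "Delivery updates erode trust", "share": 0.25}]}  # invented figure
print(share_changes(q2, q3))  # {'Delivery updates erode trust': 0.07}
```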

Is there a free tier?

Yes. There is a free sandbox with generous limits, sample datasets, and the full response schema, so you can integrate and demo before you ever pay. Waitlist members get sandbox access first, in invite order.

When does it launch?

Early access opens to the waitlist in waves, followed by general availability. Joining the waitlist gets you sandbox access in the first waves, launch pricing, and a direct line to the team while the API is still being shaped.