people-analytics
HR feedback & exit interviews
Build retention dashboards that turn exit interviews and engagement surveys into themes HR can act on — with quotes to back every finding.
Private beta — waitlist open
One POST turns transcripts, survey open-ends, and reviews into codes, themes, quotes, sentiment, and confidence scores — structured JSON, grounded in real qualitative methodology, ready to render.
curl https://api.thematicanalysis.ai/v1/analyses \
-H "Authorization: Bearer ta_live_..." \
-H "Content-Type: application/json" \
-d '{
"documents": [
{ "id": "exit-014", "text": "I loved the team, but after my manager left there was no career conversation for nine months..." },
{ "id": "exit-015", "text": "Pay was fine. What wore me down was changing priorities every sprint..." }
],
"options": { "sentiment": true, "summary": true }
}'

{
"themes": [
{
"name": "Career stagnation after manager change",
"description": "Leavers report development conversations stopping when managers rotate.",
"prevalence": 162,
"sentiment": "negative",
"confidence": 0.91,
"codes": ["no growth path", "manager turnover"],
"quotes": [
{
"document_id": "exit-014",
"span": "no career conversation for nine months"
}
]
}
],
"status": "complete"
}
Open-ended text is where the real signal lives, and where pipelines go to die. A homemade prompt works on fifty reviews, then drifts on five thousand: theme names change between runs, counts don't reconcile, and nobody can point to the quote behind a finding. When a customer asks “why is this a theme?”, “the model said so” is not an answer you can ship.
How it works
POST your documents — transcripts, open-ends, reviews, tickets. Up to 50,000 per batch, with optional metadata that flows through to the output.
POST /v1/analyses
{
"documents": [
{ "id": "rev-2201", "text": "Checkout kept timing out on mobile..." },
{ "id": "rev-2202", "text": "Love the product, hate the delivery updates..." }
// ...up to 50,000 documents per batch
],
"codebook_id": "cb_q2_reviews", // optional: reuse your codebook
"webhook_url": "https://yourapp.com/hooks/themes"
}
The engine works through the six phases of reflexive thematic analysis: familiarization, coding, clustering codes into candidate themes, reviewing those themes, defining and naming them, and writing up the results. Poll for status or just give us a webhook.
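The poll-for-status path might be handled with a loop like the sketch below. It consumes status payloads shaped like the example that follows. The page does not document the status route, so the HTTP call is left to a caller-supplied `fetch_status` function (hypothetical), and the `"failed"` status name is an assumption; only `"processing"` and `"complete"` appear on this page.

```python
import time
from typing import Callable

# "complete" appears on this page; "failed" is an assumed terminal status.
TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_analysis(
    fetch_status: Callable[[str], dict],
    analysis_id: str,
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the analysis reaches a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        payload = fetch_status(analysis_id)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        progress = payload.get("progress", {})
        done = progress.get("documents_coded")
        total = progress.get("documents_total")
        if done is not None and total:
            print(f"{payload.get('phase', 'unknown')}: {done}/{total} documents coded")
        sleep(interval_s)
    raise TimeoutError(f"analysis {analysis_id} still running after {max_polls} polls")


# Offline demo: a fake fetcher replays canned payloads instead of calling the API.
_canned = iter([
    {"id": "an_8c2f91", "status": "processing", "phase": "clustering_codes",
     "progress": {"documents_coded": 18440, "documents_total": 50000}},
    {"id": "an_8c2f91", "status": "complete", "themes": [], "codebook_id": "cb_q2_reviews"},
])
result = wait_for_analysis(lambda _id: next(_canned), "an_8c2f91", sleep=lambda s: None)
print(result["status"])
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access. In production you would swap in a real HTTP call, or skip polling entirely and rely on the webhook.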
{
"id": "an_8c2f91",
"status": "processing",
"phase": "clustering_codes",
"progress": { "documents_coded": 18440, "documents_total": 50000 }
}
You get back named themes with descriptions, prevalence counts, supporting quotes, sentiment, and confidence, plus the codebook, persisted so your next batch is coded consistently.
{
"id": "an_8c2f91",
"status": "complete",
"themes": [ ... ], // named, described, counted
"codes": [ ... ], // with supporting text spans
"summary": { ... }, // ready to render
"codebook_id": "cb_q2_reviews"
}
What you get back
Every claim in the output is traceable to the text that supports it. This is the difference between a theme and a guess.
{
"themes": [
{
"name": "Delivery updates erode trust",
"description": "Customers describe tracking emails that contradict the courier, making support feel evasive.",
"prevalence": 412,
"share": 0.18,
"sentiment": { "label": "negative", "score": -0.72 },
"confidence": 0.88,
"codes": [
{
"label": "tracking mismatch",
"spans": [
{
"document_id": "rev-2202",
"text": "the app said delivered while the courier said delayed",
"char_start": 41,
"char_end": 94
}
]
}
],
"representative_quotes": [
{
"document_id": "rev-2202",
"text": "hate the delivery updates — the app said delivered while the courier said delayed"
}
]
}
],
"document_sentiment": { "negative": 0.46, "neutral": 0.31, "positive": 0.23 },
"summary": {
"headline": "Delivery communication is the dominant negative driver this quarter",
"sections": [ ... ]
}
}
themes[].name + description
Human-readable, dashboard-ready. Stable naming across runs when you reuse a codebook.
prevalence + share
How many documents express the theme, and what fraction of the corpus — counts that reconcile.
codes[].spans
Every code points to exact character spans in your source text. Findings stay auditable.
representative_quotes
Real excerpts chosen to illustrate the theme — drop them straight into a report or UI.
sentiment
Theme-level and document-level, so you know not just what people talk about but how they feel.
confidence
A calibrated score on every theme and code. Filter, flag, or route low-confidence output for review.
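Because `codes[].spans` carry character offsets, a client can re-check each span against its own copy of the source text before rendering a finding. The sketch below is one way to do that. It assumes `char_end` is exclusive (Python slice semantics), which this page does not state, and the sample document text is hypothetical.

```python
def audit_spans(result: dict, documents: dict[str, str]) -> list[dict]:
    """Return every code span whose offsets do not reproduce its quoted text."""
    mismatches = []
    for theme in result.get("themes", []):
        for code in theme.get("codes", []):
            for span in code.get("spans", []):
                source = documents.get(span["document_id"], "")
                # Assumption: char_end is exclusive, as in Python slicing.
                found = source[span["char_start"]:span["char_end"]]
                if found != span["text"]:
                    mismatches.append({
                        "theme": theme["name"],
                        "code": code["label"],
                        "span": span,
                        "found": found,
                    })
    return mismatches


# Hypothetical source document and a response fragment that quotes it.
docs = {
    "rev-2202": "Love the product, hate the delivery updates: "
                "the app said delivered while the courier said delayed."
}
quote = "the app said delivered while the courier said delayed"
start = docs["rev-2202"].index(quote)

good = {"themes": [{
    "name": "Delivery updates erode trust",
    "codes": [{"label": "tracking mismatch", "spans": [{
        "document_id": "rev-2202", "text": quote,
        "char_start": start, "char_end": start + len(quote),
    }]}],
}]}
# Same span with offsets shifted by one character, to show a detected mismatch.
bad = {"themes": [{
    "name": "Delivery updates erode trust",
    "codes": [{"label": "tracking mismatch", "spans": [{
        "document_id": "rev-2202", "text": quote,
        "char_start": start + 1, "char_end": start + len(quote) + 1,
    }]}],
}]}

print(len(audit_spans(good, docs)), len(audit_spans(bad, docs)))
```

A check like this is cheap to run on every response and catches both offset-convention misunderstandings and documents that changed after submission.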
Use cases
If your users generate open-ended text, you can ship customer feedback theme analysis as a feature — without building the engine.
people-analytics
Build retention dashboards that turn exit interviews and engagement surveys into themes HR can act on — with quotes to back every finding.
voice-of-customer
Ship a review-intelligence feature that tells e-commerce teams why ratings move, not just that they moved.
civic-tech
Analyze thousands of citizen responses consistently and publish defensible summaries with a clear evidence trail.
research-ops
Pipe interview transcripts and usability sessions into your research repo and get coded, comparable findings across studies.
ed-tech
Give instructors themed end-of-course feedback at faculty scale, instead of a spreadsheet of ten thousand comments.
insights
Turn focus groups and depth interviews into client-ready theme reports in hours, with the methodology section already written.
The honest question
Could you build this with your own prompts? Yes, and for a demo, you should. The gap appears when the demo becomes a product. A production theme feature needs evaluation, schema stability, consistency across runs, and comparison across time. That is what you're buying: not access to a model, but the engineering around it that you'd otherwise spend a quarter building.
| Capability | DIY prompting | thematicanalysis.ai |
|---|---|---|
| Consistent output schema | Every prompt tweak risks breaking your parser | Versioned JSON schema, stable across runs and releases |
| Codebook persistence | Each run reinvents the codes; Q3 isn't comparable to Q2 | Codebooks persist and apply across batches and time |
| Evaluation | You build (and maintain) the eval harness yourself | Continuously evaluated against human-coded datasets |
| Confidence & evidence | Fluent output, no way to know what to trust | Calibrated confidence scores and exact supporting spans |
| Scale | Context limits, chunking strategy, retries, rate limits — yours to solve | Async batches to 50k documents, webhooks, automatic chunking |
| Cross-dataset comparison | Manual theme matching between runs | Theme alignment across datasets and time periods, built in |
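To illustrate what comparable runs make possible on the client side, here is a minimal sketch that reports how each theme's `share` moved between two completed analyses. It assumes both runs reused one codebook so that theme names match exactly. The matching-by-name step is a simplification for illustration, not the service's built-in alignment, and the sample themes and shares beyond those shown on this page are hypothetical.

```python
def share_deltas(previous: dict, current: dict) -> list[tuple[str, float]]:
    """Change in share per theme name, largest absolute movement first."""
    prev = {t["name"]: t.get("share", 0.0) for t in previous.get("themes", [])}
    curr = {t["name"]: t.get("share", 0.0) for t in current.get("themes", [])}
    # A theme missing from one run is treated as having a share of 0.0 there.
    deltas = [
        (name, round(curr.get(name, 0.0) - prev.get(name, 0.0), 4))
        for name in prev.keys() | curr.keys()
    ]
    return sorted(deltas, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical Q2 and Q3 results coded against the same codebook.
q2 = {"themes": [
    {"name": "Delivery updates erode trust", "share": 0.18},
    {"name": "Checkout timeouts on mobile", "share": 0.10},
]}
q3 = {"themes": [
    {"name": "Delivery updates erode trust", "share": 0.25},
    {"name": "Unclear return policy", "share": 0.05},
]}

for name, delta in share_deltas(q2, q3):
    print(f"{delta:+.2f}  {name}")
```

Sorting by absolute change puts the themes that moved most at the top of a quarter-over-quarter report, whether they grew, shrank, appeared, or vanished.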
Pricing at launch
This is the planned shape. Final numbers will be published at launch, and waitlist members lock in launch pricing.
Free
Full schema, sample datasets, generous dev limits. Build and demo before you pay.
Usage-based
For shipping your first feature: production keys, webhooks, standard batches.
Usage-based
Larger batches, codebook persistence, cross-dataset comparison, priority throughput.
Volume pricing
High-volume pipelines, configurable retention, SLAs, dedicated support.
Custom
White-label output, custom evaluation on your domain, security review, DPA.
Join the waitlist for early sandbox access and launch pricing. Telling us what you'd build helps us prioritize your invite.
FAQ
Every theme ships with a confidence score, supporting codes, and the exact text spans behind them, so you can verify output rather than trust it blindly. The engine is evaluated continuously against human-coded datasets across domains like HR feedback, product reviews, and consultation responses, and we publish evaluation summaries to waitlist members before launch. Low-confidence themes are flagged instead of silently included.
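On the consuming side, the confidence score can drive a simple review gate. The sketch below splits themes into those shown directly and those held for a human check. The 0.8 threshold is an arbitrary assumption to tune per product, and the third sample theme is hypothetical.

```python
def route_by_confidence(
    themes: list[dict], threshold: float = 0.8
) -> tuple[list[dict], list[dict]]:
    """Split themes into (accepted, needs_review) using the confidence field."""
    accepted, needs_review = [], []
    for theme in themes:
        # A theme with no confidence value is treated as needing review.
        if theme.get("confidence", 0.0) >= threshold:
            accepted.append(theme)
        else:
            needs_review.append(theme)
    return accepted, needs_review


sample_themes = [
    {"name": "Career stagnation after manager change", "confidence": 0.91},
    {"name": "Delivery updates erode trust", "confidence": 0.88},
    {"name": "Onboarding confusion", "confidence": 0.42},  # hypothetical theme
]
accepted, needs_review = route_by_confidence(sample_themes)
print([t["name"] for t in accepted])
print([t["name"] for t in needs_review])
```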
Plain text per document is the core input — interview transcripts, survey open-ends, reviews, tickets, anything textual. Send up to 50,000 documents per batch as JSON, or upload CSV and JSONL files to the batch endpoint. Each document takes an optional ID and metadata object that flows through to the response, so joining themes back to your own records is trivial.
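If your source data lives in a spreadsheet export, a small adapter can shape it into the JSON body before submission. This sketch turns CSV rows into document objects and splits them into batches of at most 50,000. The `metadata` key name and the column names are assumptions, since the page describes a metadata object without naming the field.

```python
import csv
import io
from typing import Iterator

MAX_BATCH = 50_000  # per-batch document limit stated on this page


def rows_to_documents(csv_text: str, id_col: str, text_col: str) -> list[dict]:
    """Turn CSV rows into document objects; leftover columns become metadata."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = (row.pop(text_col) or "").strip()
        if not text:
            continue  # skip rows with an empty response
        # "metadata" is an assumed key name for the pass-through object.
        documents.append({"id": row.pop(id_col), "text": text, "metadata": row})
    return documents


def batches(documents: list[dict], size: int = MAX_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


# Hypothetical export with two usable responses and one empty one.
sample_csv = (
    "id,team,tenure_months,comment\n"
    "exit-014,platform,31,No career conversation for nine months\n"
    "exit-015,growth,12,\n"
    "exit-016,growth,8,Priorities changed every sprint\n"
)
documents = rows_to_documents(sample_csv, id_col="id", text_col="comment")
payloads = [{"documents": chunk} for chunk in batches(documents, size=1)]
print(len(documents), len(payloads))
```

Keeping your own record ID in `id` is what makes the join back to your records straightforward once themes and spans come back.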
Your data is yours. Submitted text is processed to produce your results, retained only for the window you configure (default 30 days, configurable to zero retention), and never used to train models. You can delete any analysis or document immediately via the API, and account deletion purges everything. Data is encrypted in transit and at rest.
English is fully supported at launch, with Spanish, French, German, Portuguese, and Dutch in beta. You can mix languages in one batch; each document is coded in its own language and themes can be returned in the language of your choice. Tell us your language needs in the waitlist form — it directly shapes the rollout order.
Usage-based: you pay per document analyzed, with volume discounts as you scale. The free sandbox covers development and small pilots; paid tiers add larger batches, webhooks, codebook persistence, and white-label options. Waitlist members lock in launch pricing. Final numbers are published at launch; the tiers on this page show the intended shape.
Yes, you can bring your own codebook. You can run fully inductive analysis (the engine derives codes from your data), supply your own codebook for deductive coding, or combine both. Codebooks persist across analyses, so a Q3 run is coded consistently with Q2 and themes are comparable across time, which is the hardest thing to get from ad-hoc prompting.
Yes, there is a free tier: a sandbox with generous limits, sample datasets, and the full response schema, so you can integrate and demo before you ever pay. Waitlist members get sandbox access first, in invite order.
Early access opens to the waitlist in waves, followed by general availability. Joining the waitlist gets you sandbox access in the first waves, launch pricing, and a direct line to the team while the API is still being shaped.