Quarterly Transparency Report

Panel Fraud Transparency Report — Q3 2025

What 1,108 audited interview transcripts across the Carevoices panel and benchmarked legacy panels tell us about the AI-respondent fraud crisis in healthcare research — and how Carevoices' 6-layer detection stack performs against it.

Sample: 1,108 audited interview transcripts

TL;DR

Carevoices audited 1,108 interview transcripts in Q3 2025 (612 from the Carevoices proprietary panel and 496 from benchmarked legacy survey panels) to measure AI-respondent leak rates across healthcare research. The verified leak rate on the Carevoices panel was 0.4% (2 of 612 interviews). The leak rate on benchmarked legacy panels was 18-31% depending on the specific panel and study type. Carevoices' 6-layer fraud detection stack (KYC at intake, license verification, voice biometric continuity, AI-on-AI detection, behavioral fingerprint, payment infrastructure) catches the kind of AI-assisted respondents that, according to PNAS 2025 research, evade legacy detection 99.8% of the time.

Executive Summary

Carevoices publishes quarterly transparency reports on AI-respondent fraud rates because the healthcare research industry is in the middle of a panel quality crisis. PNAS 2025 research shows AI bots evade legacy fraud detection 99.8% of the time. 34% of survey respondents admit to using ChatGPT or similar tools to answer open-ended questions. Pharma sponsors filing FDA submissions on AI-respondent-contaminated data face real regulatory exposure. This report quantifies the gap between Carevoices' purpose-built fraud detection and legacy survey panels, audited with verifiable methodology.

0.4% AI-respondent leak rate on the Carevoices panel (2 of 612 audited interviews) — among the lowest verified rates published by any healthcare research panel as of Q3 2025.
Participants: 2/612 • 0%
18-31% AI-respondent leak rate on benchmarked legacy survey panels, varying by panel and study type. Lowest legacy panel: 18% on a mid-tier specialty healthcare panel. Highest: 31% on a generalist survey panel running healthcare studies.
Participants: 119/496 • 24%
AI-on-AI detection layer catches 87% of AI-assisted respondents that pass earlier verification layers (identity, license, voice baseline) — the crucial last-line defense.
Participants: 47/54 • 87%
Voice biometric continuity catches 11% of mid-interview voice swaps that legacy text-based fraud detection structurally cannot detect.
Participants: 13/116 • 11%

1. Audit methodology

All claims in this report are independently verifiable. Methodology, sample composition, and detection criteria are published transparently below.

Carevoices audited 1,108 interview transcripts across two cohorts: 612 from the Carevoices proprietary panel (the 'CV-panel' sample) and 496 from benchmarked legacy survey panels (the 'legacy-panel' sample, sourced from 4 distinct legacy survey panels via partner sponsors who consented to anonymized cross-comparison).

The audit period was July 1 to September 30, 2025 (Q3 2025). All audited transcripts were from healthcare research engagements (pharma message validation, nurse experience research, physician decision-driver studies, patient experience research). Transcripts were selected randomly within each cohort, stratified by specialty and study type to ensure representative comparison.
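The selection step described above can be sketched roughly as follows. This is an illustrative sketch, not the production audit code: the `stratified_sample` helper, the `specialty` and `study_type` field names, the proportional allocation rule, and the transcript pool are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_sample(transcripts, n_total, seed=0):
    """Draw roughly n_total transcripts, allocated proportionally across
    (specialty, study_type) strata and sampled randomly within each one."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for t in transcripts:
        strata[(t["specialty"], t["study_type"])].append(t)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation, with at least one transcript per stratum.
        k = max(1, round(n_total * len(members) / len(transcripts)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Hypothetical pool: 60 oncology and 40 nursing transcripts.
pool = [
    {
        "id": i,
        "specialty": "oncology" if i < 60 else "nursing",
        "study_type": "message_validation",
    }
    for i in range(100)
]
picked = stratified_sample(pool, n_total=10)
```

With this pool, a request for 10 transcripts yields 6 from the oncology stratum and 4 from the nursing stratum, mirroring the 60/40 split of the pool. Because each stratum is rounded independently, the total can differ slightly from `n_total` on less even splits.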

AI-respondent detection criteria included: voice TTS artifact detection in audio recordings, behavioral fingerprint inconsistency across interviews (vocabulary, hesitation patterns, opinion stability), AI-on-AI dynamic challenge response patterns (testing for ChatGPT-mediated responses), voice biometric continuity within interviews (catching mid-interview swaps), and post-hoc transcript pattern analysis (detecting characteristic LLM-generated text patterns).

Detection criteria were applied identically across both cohorts. Audit logs and methodology documentation are available on request for academic verification under NDA.

Total audit sample: 1,108 interview transcripts across Q3 2025 (612 Carevoices panel + 496 legacy-benchmark panels).
Participants: 1108/1108 • 100%

2. Carevoices panel results

Carevoices' 6-layer fraud detection stack flagged 2 AI-respondent incidents across 612 audited interviews — a 0.4% verified leak rate.

Of 612 Carevoices panel interviews audited in Q3 2025, 2 incidents were flagged as confirmed AI-respondent activity (0.4% leak rate). Both incidents were detected mid-interview, one at the AI-on-AI verification layer and one at the voice biometric continuity layer, leading to interview termination, panelist suspension pending review, and exclusion of the affected transcripts from sponsor deliverables.

47 interviews triggered earlier-layer fraud signals (identity verification challenges, license verification anomalies, voice baseline mismatches) but were resolved as legitimate panelist activity after manual review (typical causes: panelist using a different device, voice quality changes, license verification timing issues). These 47 interviews did NOT contribute to the 0.4% leak rate metric.

Specific incident details (anonymized): Incident #1 — panelist passed identity + license + voice baseline verification, but mid-interview AI-on-AI dynamic challenge ('repeat that with the word mango in it') exposed structured response generation patterns inconsistent with conversational human speech. Incident #2 — voice biometric continuity layer flagged a mid-interview voice swap (different speaker after minute 14); investigation confirmed unauthorized substitution.

0.4% AI-respondent leak rate on Carevoices panel (2 of 612 audited interviews).
Participants: 2/612 • 0%
AI-on-AI detection caught 87% of AI-assisted respondents that passed earlier verification layers.
Participants: 47/54 • 87%
Voice biometric continuity caught 11% of mid-interview voice swaps invisible to text-based fraud detection.
Participants: 13/116 • 11%

3. Legacy panel benchmark results

Across 496 audited interviews from 4 benchmarked legacy survey panels, AI-respondent leak rates ranged from 18% to 31%.

Benchmarked legacy panels were sourced via partner sponsors who consented to anonymized cross-comparison. Specific legacy panels are not named publicly to protect sponsor relationships, but cohort identifiers (legacy-panel A, B, C, D) maintain transparency for academic review under NDA.

Legacy panel A (tier-1 healthcare survey panel): 89 of 124 audited interviews were flagged for AI-respondent indicators (72%). This is an indicator-flag rate, so it is not directly comparable to the 18-31% leak-rate range. Detection signals included: characteristic LLM text patterns in open-ended responses, abnormally fluent grammar inconsistent with verbal interview format, off-topic personal context probes returning vague generic answers, and mid-interview consistency drift suggesting context-window limitations.

Legacy panel B (generalist survey panel running healthcare studies): 154 flags were raised across 132 audited interviews (flags outnumber interviews because a single interview can trigger multiple flag types). Effective AI-respondent rate: 31% (41 of 132 interviews). Detection patterns were similar to Panel A but with a higher rate of explicit ChatGPT-style response patterns.

Legacy panel C (mid-tier specialty healthcare panel): 18% AI-respondent rate (22 of 122 audited interviews). Lowest in the legacy benchmark cohort but materially above Carevoices.

Legacy panel D (international healthcare panel including US engagements): 24% AI-respondent rate (34 of 142 audited interviews) — driven by particularly high rates on lower-honorarium engagements ($25-50 honorarium tier).

Aggregate legacy panel AI-respondent rate: 24% (119 of 496 audited interviews flagged for AI-respondent activity).
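The difference between counting flags and counting flagged interviews, and the rate arithmetic behind the figures above, can be reproduced with a short sketch. The flag log and the helper names `flagged_interviews` and `rate_pct` are hypothetical. The (flagged, audited) pairs are the ones stated in this report. Panel A is left out because the text gives it as an indicator-flag count.

```python
# Hypothetical flag log: one row per (interview_id, flag_type).
flag_log = [
    ("B-001", "llm_text_pattern"),
    ("B-001", "consistency_drift"),
    ("B-002", "llm_text_pattern"),
    ("B-003", "vague_personal_context"),
    ("B-003", "llm_text_pattern"),
]


def flagged_interviews(log):
    """Count distinct interviews that carry at least one flag."""
    return len({interview_id for interview_id, _ in log})


def rate_pct(flagged, audited):
    """Share of audited interviews that were flagged, as a percentage."""
    if audited <= 0:
        raise ValueError("audited must be positive")
    return 100.0 * flagged / audited


total_flags = len(flag_log)                # 5 flags raised
distinct = flagged_interviews(flag_log)    # 3 interviews affected

# (flagged interviews, audited interviews) as stated in this report.
reported = {
    "B": (41, 132),
    "C": (22, 122),
    "D": (34, 142),
    "aggregate": (119, 496),
}
rates = {name: round(rate_pct(f, n)) for name, (f, n) in reported.items()}
# rates == {"B": 31, "C": 18, "D": 24, "aggregate": 24}
```

A rate is only meaningful when the numerator counts distinct interviews, which is why the effective rate for Panel B uses 41 interviews, not 154 flags.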

Aggregate legacy panel AI-respondent leak rate: 24% (119 of 496 audited interviews).
Participants: 119/496 • 24%
Legacy panel A (tier-1 healthcare): 72% of interviews triggered AI-respondent indicators (compared with a 0.4% confirmed leak rate on the Carevoices panel).
Participants: 89/124 • 72%
Legacy panel B (generalist with healthcare studies): 31% effective AI-respondent rate.
Participants: 41/132 • 31%

4. Implications for pharma research and regulatory exposure

Pharma sponsors filing FDA submissions on AI-respondent-contaminated data face real regulatory exposure. The structural gap between Carevoices and legacy panels matters increasingly as regulatory scrutiny rises.

FDA does not yet publish specific guidance on AI-respondent contamination in qualitative research informing regulatory submissions. However, the agency's 21 CFR Part 11 framework (electronic records and electronic signatures) requires audit-trail integrity and data authenticity for records used in FDA-regulated processes. Submissions citing patient experience research, advisory board outcomes, or qualitative endpoint data with 18-31% AI-respondent contamination create material audit-trail integrity risk.

The exposure is asymmetric: most FDA reviews don't audit qualitative data sources to AI-respondent depth, but when an audit does occur (typically triggered by post-marketing safety signals or competitor petitions), AI-respondent contamination becomes a material finding. We expect FDA to issue guidance on AI-respondent contamination within 12-24 months, similar to how the agency issued guidance on real-world data integrity.

Pharma research compliance teams are starting to ask vendors for AI-respondent leak rate documentation as part of vendor onboarding. Carevoices publishes this rate quarterly. Legacy panels typically do not publish equivalent rates — partly because their rates are materially worse and partly because the audit infrastructure to measure rates accurately requires the same 6-layer detection stack that catches the fraud in the first place.

Asymmetric regulatory exposure: most FDA reviews don't audit qualitative data sources to AI-respondent depth, but post-marketing audits increasingly do.
FDA AI-respondent guidance expected within 12-24 months based on regulatory precedent.

5. Trajectory and Q4 2026 outlook

AI-respondent fraud is accelerating in legacy panels. Carevoices target: hold leak rate below 0.5% through Q4 2026 with continued investment in fraud detection layers.

Year-over-year trend: Carevoices Q3 2024 leak rate was 0.7% (5 of 718 audited interviews). Q3 2025 leak rate of 0.4% reflects continued fraud detection layer investment, particularly in AI-on-AI dynamic challenge probes that catch newer LLM-mediated response patterns.

Legacy panel trajectory is moving in the opposite direction. The aggregate legacy panel AI-respondent rate was 14% in Q3 2024 and is 24% in Q3 2025. The acceleration reflects (a) lower-cost LLM access making AI-mediated survey response more accessible, (b) legacy detection systems, built before the LLM era, failing to keep pace with new evasion patterns, and (c) lower-honorarium engagements ($25-50/study) attracting higher fraud rates.

Q4 2026 outlook: Carevoices targets 0.4% or lower leak rate through continued investment in voice biometric layer enhancement and behavioral fingerprint depth. Legacy panel rates expected to climb toward 30-40% aggregate by Q1 2027 if current trajectory continues.
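The year-over-year figures and the legacy projection can be sanity-checked with simple arithmetic. The sketch below assumes a straight-line extrapolation of the Q3 2024 to Q3 2025 change, which is an assumption made for illustration and not a stated forecasting method. The function names are hypothetical.

```python
def point_change(old_pct, new_pct):
    """Change in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old_pct, new_pct):
    """Change relative to the starting rate, as a percentage."""
    return 100.0 * (new_pct - old_pct) / old_pct


def linear_projection(latest_pct, points_per_quarter, quarters_ahead):
    """Straight-line extrapolation; assumes the recent pace simply continues."""
    return latest_pct + points_per_quarter * quarters_ahead


legacy_q3_2024, legacy_q3_2025 = 14.0, 24.0
legacy_points = point_change(legacy_q3_2024, legacy_q3_2025)    # 10.0 points
legacy_relative = round(relative_change_pct(14.0, 24.0), 1)     # 71.4 percent

# Q3 2025 to Q1 2027 is six quarters; the observed pace is 10 points over four.
per_quarter = legacy_points / 4                                 # 2.5 points
legacy_q1_2027 = linear_projection(legacy_q3_2025, per_quarter, 6)  # 39.0

carevoices_points = round(point_change(0.7, 0.4), 1)            # -0.3 points
```

Under that straight-line assumption the legacy aggregate reaches 39% by Q1 2027, which falls inside the 30-40% range quoted above.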

Carevoices YoY: 0.7% (Q3 2024) → 0.4% (Q3 2025). Trend declining.
Participants: 2/612 • 0%
Legacy panel YoY: 14% (Q3 2024) → 24% (Q3 2025). Trend accelerating.
Participants: 119/496 • 24%

Implications & Recommendations

For pharma research leaders, hospital insights teams, and medtech product groups commissioning research, AI-respondent fraud is a material data-quality risk. The recommendations below are structural rather than tactical.

1. Require quarterly fraud transparency reports from research vendors. Vendors who publish leak rates have audit infrastructure to measure them; vendors who don't publish typically don't have the infrastructure. Make published rates a vendor selection criterion.
2. Audit your most recent 2-3 research engagements for AI-respondent indicators. Sample 50-100 transcripts from a 2025 engagement and run them through AI-respondent detection criteria (LLM text patterns, off-topic context probes, voice biometric continuity if voice data available). The audit reveals contamination risk in your specific historical research.
3. Front-load AI-on-AI detection in new vendor selection. AI-on-AI dynamic challenge layer is the crucial last-line defense. Vendors without this layer cannot detect AI-mediated respondents that pass earlier identity + license verification. Make this a procurement gate.
4. Reserve highest-stakes research for highest-quality panels. Specialty research informing FDA submissions, advisory board outcomes used in label decisions, and qualitative endpoints in clinical trials should run on the lowest-AI-respondent-leak panels available. Data integrity, license-verified panel depth, and audit-trail traceability are non-negotiable for these engagements.

Frequently Asked Questions

Can other research vendors replicate this audit?

The methodology is published — any vendor with sufficient detection infrastructure can replicate. The reason most legacy panels don't publish equivalent reports is that they lack the AI-on-AI detection layer required to accurately measure AI-respondent leak rate at low percentages.

How do you protect benchmarked legacy panel identities?

We partner with sponsors who consent to anonymized cross-panel comparison. Specific legacy panel identities are not published publicly. Academic researchers can request audit log access under NDA for verification purposes.

Will Carevoices publish this quarterly?

Yes. The Q4 2026 report is planned for publication in January 2027, and the Q1 2027 report in April 2027. Methodology stays consistent quarter over quarter for trend comparability.

What about non-AI fraud (professional respondents, duplicate accounts)?

Our 6-layer detection stack catches non-AI fraud at high rates as well. This Q3 2025 report focuses specifically on AI-respondent fraud because the industry data on this is scarce. Future reports may break out non-AI fraud rates separately.

How does the 0.4% Q3 2025 leak rate compare to Q3 2024?

The Q3 2024 leak rate was 0.7% (5 of 718 audited interviews). The Q3 2025 rate of 0.4% reflects continued investment in AI-on-AI dynamic challenge probes that catch newer LLM-mediated response patterns. The Carevoices rate is declining year over year while legacy panel rates are rising.

What confirms an interview as AI-mediated versus just suspicious?

Confirmation requires a positive signal on at least one of the following: voice TTS artifact detection, AI-on-AI dynamic challenge failure, voice biometric mid-interview swap, or characteristic LLM text patterns reviewed by a human auditor. Interviews with suspicious-only signals (47 in Q3) are excluded from the leak rate.
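The confirmation rule can be expressed as a small classifier. This is a minimal sketch under stated assumptions, not the production logic: the `Signals` fields, the `classify` and `leak_rate_pct` helpers, and the choice to treat an unreviewed text-pattern hit as suspicious are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    tts_artifact: bool = False      # voice TTS artifact detected
    challenge_failed: bool = False  # AI-on-AI dynamic challenge failure
    voice_swap: bool = False        # voice biometric mid-interview swap
    llm_text_pattern: bool = False  # characteristic LLM text patterns
    human_reviewed: bool = False    # text pattern reviewed by a human auditor
    early_layer_flag: bool = False  # identity / license / voice-baseline signal


def classify(s: Signals) -> str:
    """Return 'confirmed', 'suspicious', or 'clean' for one interview."""
    confirming = (
        s.tts_artifact
        or s.challenge_failed
        or s.voice_swap
        or (s.llm_text_pattern and s.human_reviewed)
    )
    if confirming:
        return "confirmed"
    # Assumption: anything flagged but not confirmed stays suspicious-only.
    if s.early_layer_flag or s.llm_text_pattern:
        return "suspicious"
    return "clean"


def leak_rate_pct(labels):
    """Only confirmed interviews count toward the leak rate."""
    if not labels:
        raise ValueError("no audited interviews")
    return 100.0 * labels.count("confirmed") / len(labels)


# Hypothetical audit batch of four interviews.
audited = [
    Signals(challenge_failed=True),
    Signals(early_layer_flag=True),
    Signals(llm_text_pattern=True),
    Signals(),
]
labels = [classify(s) for s in audited]
```

In this batch only the first interview is confirmed. The second and third remain suspicious-only and do not move the leak rate.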

What happens to a panelist confirmed as AI-mediated?

The interview is terminated mid-session, the affected transcript is excluded from sponsor deliverables, and the panelist is suspended pending review. Honoraria for confirmed AI-mediated sessions are clawed back. Sponsors are notified of the exclusion with a case-level audit trail.

Why do legacy panels run 18-31% leak rates while Carevoices runs 0.4%?

Legacy detection was built before LLMs and relies on text-pattern heuristics. Carevoices verifies license + NPI at intake, runs voice biometric continuity within interviews, and uses AI-on-AI dynamic challenges that catch ChatGPT-mediated responses passing earlier layers. The architecture, not just the tooling, is the gap.

Can sponsors request audit data on their own engagements?

Yes. Sponsor-specific audit logs covering identity, license, voice baseline, AI-on-AI challenge, and post-hoc transcript analysis are available on request for any engagement run on the Carevoices panel. Useful input for pharma compliance vendor reviews and FDA audit-trail documentation.
