AI-on-AI Fraud Detection: Catching LLM-Mediated Respondents

By Kevin, Founder & CEO

Why does AI-on-AI fraud detection matter for healthcare research panels?

Legacy panel fraud detection was built for the pre-LLM era. The detection methodology — checking response timing, vocabulary fluency, screener consistency, IP address fingerprinting — assumed humans gave imperfect responses and bot-generated responses had detectable patterns (broken grammar, repetitive structures, clear time-stamp anomalies).

Large language models broke the methodology. LLM-mediated respondents give answers that are too polished for legacy detection to flag: well-formed grammar, on-topic content, internally consistent positions, plausible response timing. The Proceedings of the National Academy of Sciences published research in 2025 demonstrating AI bots evade legacy survey panel fraud detection 99.8% of the time. The detection tools designed to catch obvious fraud cannot keep pace with sophisticated AI mediation.

The structural problem requires a structural solution: AI-on-AI Detection. Only an AI moderator with conversational reasoning capability can deploy the probes that expose LLM-mediated respondents. Legacy text-based survey fraud detection cannot match the methodology because it lacks the conversational AI capability needed.

Carevoices’ Q3 2025 Panel Fraud Transparency Report quantified the gap: 0.4% AI-respondent leak rate on the Carevoices panel using the AI-on-AI Detection Layer (2 of 612 audited interviews) versus 18-31% on benchmarked legacy survey panels (119 of 496 audited interviews).

The four primary detection methods

1. Dynamic challenges

The AI moderator introduces unexpected response constraints mid-interview that LLMs handle poorly because they require deviation from the LLM’s preferred response generation patterns. Examples:

  • Word-injection challenges — “Repeat that sentence with the word ‘mango’ in it.” LLM-mediated respondents typically continue with their generated answer or paraphrase awkwardly because the injection constraint disrupts response generation. Human respondents either successfully inject the word or laugh at the strange request.

  • Format reversal challenges — “Tell me that backwards” or “List the same points in reverse order.” LLMs handle these inconsistently; human respondents typically engage with the meta-task or politely decline.

  • Specificity challenges — “Give me a 5-word version of that answer.” LLMs trained on verbose explanations struggle with extreme compression; human respondents either compress confidently or push back on the constraint.

The challenges are calibrated to feel conversationally appropriate (not interrogative) so legitimate respondents engage naturally while LLM-mediated responses produce detectable inconsistencies.
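
This guide does not describe how Carevoices scores these challenges. As a minimal sketch, assuming the respondent's replies are already transcribed to text, two of the challenges reduce to simple string tests. The function names and the returned dictionary are hypothetical.

```python
import re


def contains_injected_word(response: str, word: str) -> bool:
    """True if `word` appears as a whole word in the response (case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None


def within_word_limit(response: str, limit: int) -> bool:
    """True if the response has at most `limit` whitespace-separated words."""
    return len(response.split()) <= limit


def challenge_signals(injection_reply: str, word: str,
                      compressed_reply: str, limit: int) -> dict:
    """Collect pass/fail signals for two challenges (hypothetical structure)."""
    return {
        "injected_word_present": contains_injected_word(injection_reply, word),
        "compression_respected": within_word_limit(compressed_reply, limit),
    }
```

A failed check is only a signal. A human may laugh at the request or push back on the constraint, so results like these would feed review rather than decide anything on their own.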

2. Off-topic personal context probes

LLMs cannot fabricate personal context consistently because the context-window structure breaks down across multi-turn interactions when the participant lacks real lived experience. The AI moderator inserts conversational probes that require personal context:

  • Temporal probes — “What did you have for breakfast today?” / “What’s the weather like where you are?” — LLM-mediated respondents either generate plausible-but-generic responses (no specific details) or contradict earlier responses across the interview.

  • Workspace probes — “Where in your office are you taking this call?” / “What’s hanging on the wall behind you?” — Voice/video modality makes these probes more powerful; LLM-mediated respondents cannot cross-validate against the visible environment.

  • Memory probes — “When you said earlier that [X], you mentioned [Y]. Can you walk me through that again?” — LLM context-window limitations cause cross-reference inconsistencies that human respondents handle naturally.

The probes are designed to feel like normal conversational interest while exposing LLM context limitations.
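
One way to picture the memory-probe check is as a log of what the respondent has claimed, keyed by topic, that is scanned for conflicting values. This is an illustrative sketch only: it assumes an upstream step (not shown, and not described in this guide) has already extracted `(turn, topic, value)` triples from the transcript, and all names are hypothetical.

```python
from collections import defaultdict


def find_contradictions(statements):
    """Return topics for which the respondent gave more than one distinct value.

    `statements` is a list of (turn, topic, value) tuples. Values are compared
    after trimming whitespace and lowercasing, which is a deliberately crude
    notion of "same answer".
    """
    seen = defaultdict(list)
    for turn, topic, value in statements:
        seen[topic].append((turn, value.strip().lower()))

    conflicts = {}
    for topic, entries in seen.items():
        distinct_values = {value for _, value in entries}
        if len(distinct_values) > 1:
            conflicts[topic] = entries
    return conflicts


# Example log: the practice setting is stable, the breakfast answer is not.
log = [
    (3, "practice_setting", "Outpatient clinic"),
    (5, "breakfast", "oatmeal"),
    (8, "practice_setting", "outpatient clinic "),
    (11, "breakfast", "eggs"),
]
print(find_contradictions(log))
```

Exact string matching will miss paraphrases and will flag harmless rewordings, so a real system would need a more tolerant comparison than this.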

3. Real-time TTS artifact detection

For voice modality interviews, real-time text-to-speech (TTS) artifact detection examines audio for signatures of synthesized voice generation. The AI moderator analyzes:

  • Spectral patterns characteristic of TTS voice synthesis (subtle frequency artifacts that elude human perception but show in spectral analysis)
  • Prosody patterns that differ from natural human speech (TTS often has slightly mechanical pitch contours)
  • Breath and pause patterns that natural conversation includes but TTS frequently omits or generates artificially

The detection runs continuously during the interview. Voice biometric continuity (a parallel layer) catches mid-interview voice swaps that pure TTS detection might miss.
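
The guide does not say how pause patterns are measured. As a rough illustration of the idea, the sketch below takes per-frame audio energies (assumed to be computed elsewhere), finds runs of quiet frames, and reports how much the pause lengths vary. Treating absent or unusually uniform pauses as a signal is an assumption made for this example, as are the function names and the threshold parameter.

```python
from statistics import mean, pstdev


def pause_lengths(frame_energies, silence_threshold):
    """Lengths, in frames, of each run of consecutive frames below the threshold."""
    lengths = []
    run = 0
    for energy in frame_energies:
        if energy < silence_threshold:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # audio ended during a pause
        lengths.append(run)
    return lengths


def pause_variability(lengths):
    """Coefficient of variation of pause lengths, or None with fewer than two pauses."""
    if len(lengths) < 2:
        return None
    return pstdev(lengths) / mean(lengths)
```

A value near zero means every pause lasted about the same number of frames. Spectral and prosody analysis would need real signal-processing tooling and are not sketched here.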

4. Conversational coherence drift analysis

Across multi-turn interviews, LLM-mediated respondents exhibit coherence drift patterns that differ from human respondents. The AI moderator analyzes:

  • Position consistency — Does the respondent maintain the same position across the interview, or drift in ways characteristic of LLM context-window pressure?
  • Specificity gradient — Do answers get more or less specific over the interview? LLMs trained on broad text often produce decreasing specificity as context window saturates.
  • Cross-question contradiction — Are answers in question 8 consistent with answers in question 3? Humans hold positions; LLMs sometimes contradict earlier answers when context-window pressure rises.
  • Personal pronoun usage — Are first-person references consistent and authentic? LLM-mediated responses often use first-person pronouns artificially or inconsistently.

The drift analysis runs post-interview and triggers manual review for flagged interviews.
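
The specificity gradient can be illustrated with a simple trend fit. The sketch below scores each answer with a crude stand-in for specificity (the share of tokens containing a digit) and fits a least-squares slope across turns. Both the proxy and the function names are assumptions for illustration; the guide does not state how specificity is actually scored.

```python
def digit_token_share(answer: str) -> float:
    """Share of whitespace-separated tokens that contain at least one digit."""
    tokens = answer.split()
    if not tokens:
        return 0.0
    with_digit = sum(1 for token in tokens if any(ch.isdigit() for ch in token))
    return with_digit / len(tokens)


def specificity_slope(answers):
    """Least-squares slope of the specificity proxy against turn index."""
    scores = [digit_token_share(answer) for answer in answers]
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two answers to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    numerator = sum((i - x_mean) * (s - y_mean) for i, s in enumerate(scores))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator
```

A negative slope means answers became less specific as the interview went on, which is the pattern the specificity-gradient bullet describes. On its own it would only justify a closer look, since humans also tire over a long interview.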

What happens when AI-on-AI detection flags a respondent?

Flagging is silent during the interview: the respondent is not told they have been flagged. The interview either continues to its natural conclusion or, if the AI moderator’s confidence threshold is exceeded mid-interview, terminates gracefully.

Post-interview, flagged sessions route to manual review:

  • Confirmed AI-mediated: Permanent ban from the Carevoices panel; payment refused; affected transcripts excluded from sponsor deliverables; sponsor notified of contamination scope (without disclosing specific participant identity to maintain panel privacy norms).
  • Suspected but not confirmed: Additional verification required on subsequent engagements; payment processed for the current interview but flagged for monitoring.
  • False positive: Sometimes legitimate participants trigger one or more flags due to environmental factors, language patterns, or unusual response styles. Manual review distinguishes false positives from confirmed AI-mediated activity.
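
The three review outcomes map onto a small decision table. The sketch below encodes the actions listed above; the type and field names are hypothetical. The guide does not spell out what happens after a false positive, so handling it as an ordinary paid interview is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewOutcome(Enum):
    CONFIRMED_AI = "confirmed_ai_mediated"
    SUSPECTED = "suspected_not_confirmed"
    FALSE_POSITIVE = "false_positive"


@dataclass(frozen=True)
class Disposition:
    ban_panelist: bool
    pay_panelist: bool
    exclude_transcript: bool
    notify_sponsor: bool
    extra_verification: bool


def disposition_for(outcome: ReviewOutcome) -> Disposition:
    """Translate a manual-review outcome into follow-up actions."""
    if outcome is ReviewOutcome.CONFIRMED_AI:
        return Disposition(ban_panelist=True, pay_panelist=False,
                           exclude_transcript=True, notify_sponsor=True,
                           extra_verification=False)
    if outcome is ReviewOutcome.SUSPECTED:
        return Disposition(ban_panelist=False, pay_panelist=True,
                           exclude_transcript=False, notify_sponsor=False,
                           extra_verification=True)
    # False positive: assumed to be treated as a normal, paid interview.
    return Disposition(ban_panelist=False, pay_panelist=True,
                       exclude_transcript=False, notify_sponsor=False,
                       extra_verification=False)
```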

Q3 2025 audit data: 47 of 612 Carevoices interviews triggered earlier-layer fraud signals (identity verification challenges, license verification anomalies, voice baseline mismatches) but were resolved as legitimate panelist activity after manual review. These false positives did NOT contribute to the 0.4% leak rate metric — only confirmed AI-mediated activity counts.

How does AI-on-AI detection fit the broader 6-layer fraud detection stack?

The AI-on-AI Detection Layer is the fourth of six layers in The 6-Layer Fraud Detection Stack that Carevoices deploys at panel intake and across every interview:

  1. Identity at intake (KYC-grade verification — Persona / Stripe Identity / Onfido)
  2. License verification (state board database cross-check, NPI Registry validation)
  3. Voice baseline + biometric continuity (10-second voice baseline at intake; continuity check across interviews)
  4. AI-on-AI Detection Layer (this guide)
  5. Behavioral fingerprint (pattern analysis across panelist’s interview history)
  6. Payment infrastructure (Stripe Connect / Plaid bank verification ties identity to real money)

The layers compound: most fraudulent intent fails at Layer 1 (KYC). Sophisticated LLM-mediated respondents that pass Layers 1-3 typically fail Layer 4. Mid-interview voice swaps that slip past Layer 4 are caught by Layer 3 (biometric continuity). The redundancy is intentional: no single layer is sufficient against the full attack surface.
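
The compounding effect can be sketched as an ordered list of checks in which a session is stopped by the first layer it fails. This is a simplification for illustration: the real layers run at different times (intake versus every interview), some run in parallel, and each `lambda` below is a hypothetical stand-in that only reads a precomputed flag.

```python
from typing import Callable, Optional

Check = Callable[[dict], bool]

# Hypothetical stand-ins for the six layers, in the order listed above.
LAYERS: list[tuple[str, Check]] = [
    ("1. Identity at intake", lambda s: s.get("kyc_passed", False)),
    ("2. License verification", lambda s: s.get("license_valid", False)),
    ("3. Voice baseline + biometric continuity", lambda s: s.get("voice_continuous", False)),
    ("4. AI-on-AI Detection Layer", lambda s: s.get("ai_probe_clean", False)),
    ("5. Behavioral fingerprint", lambda s: s.get("behavior_consistent", False)),
    ("6. Payment infrastructure", lambda s: s.get("payout_verified", False)),
]


def first_failing_layer(layers: list[tuple[str, Check]], session: dict) -> Optional[str]:
    """Return the name of the first layer whose check fails, or None if all pass."""
    for name, check in layers:
        if not check(session):
            return name
    return None


# A session that clears identity, license, and voice checks but not the AI probes.
session = {
    "kyc_passed": True,
    "license_valid": True,
    "voice_continuous": True,
    "ai_probe_clean": False,
    "behavior_consistent": True,
    "payout_verified": True,
}
print(first_failing_layer(LAYERS, session))
```

Missing flags default to failing, so a session with no evidence for a layer is never treated as having passed it.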

What this means for healthcare research buyers

Pharma research compliance teams evaluating research vendors should ask specifically about AI-on-AI Detection capability. Vendors that rely on legacy text-based fraud detection structurally cannot match the methodology — they lack the conversational AI capability needed to deploy the probes.

Specific evaluation questions:

  • Does the vendor’s AI moderator deploy dynamic challenge probes mid-interview? If not, the vendor has not built AI-on-AI Detection.
  • Does the vendor publish quarterly fraud transparency reports with verified AI-respondent leak rates? Vendors that don’t publish typically lack the audit infrastructure to measure leak rates, and a rate that isn’t measured can’t be managed.
  • Does the vendor run voice biometric continuity in addition to text-based detection? Voice modality fraud (mid-interview voice swaps) requires a biometric layer to catch.
  • What’s the trend in the vendor’s AI-respondent leak rate? Carevoices’ rate fell year over year, from 0.7% (Q3 2024) to 0.4% (Q3 2025). Legacy panel rates are moving in the opposite direction (14% → 24% YoY).

The AI-on-AI Detection methodology is the differentiator that purpose-built healthcare research vendors invest in and that legacy survey panels cannot retrofit without an architectural rebuild. For research informing FDA submissions, hospital workforce decisions, or medtech product roadmaps, the methodology investment is the difference between defensible data and contaminated data.


This guide is a deeper read of the AI-on-AI Detection Layer in Carevoices’ Panel Fraud Transparency Report Q3 2025.

Frequently Asked Questions

The detection methodology requires AI capability: only an AI moderator can deploy dynamic challenges, parse TTS artifacts in real time, and analyze conversational coherence at the depth required. Legacy text-based survey fraud detection cannot match the methodology because it lacks the conversational AI capability needed to deploy the probes.

Some detection methods will be evaded over time as LLMs improve. The detection layer is updated quarterly with new probes informed by adversarial testing. Voice biometric continuity (a parallel layer) catches voice-cloning attempts that text-based detection misses regardless of LLM capability.

Respondents are not told when they are flagged: flagging is silent during the interview. Flagged interviews are routed to manual review post-fielding. Confirmed AI-mediated participants are quietly removed from the panel; suspected but not confirmed cases get additional verification on subsequent engagements.