FOUNDRY
C8 Platform

CampaignPro Voice Capture Integration Spec

Completed · Consult · P1

Description

Draft a technical spec for integrating C8's audio_ingest.py pipeline into CampaignPro for voice-powered voter data capture.

EXISTING C8 INFRA:
- c8_platform/audio_ingest.py: full pipeline: voice → transcript (Gemini/Whisper) → structured extraction (Claude) → voter fields
- Spanish-native processing with Paulina TTS
- --interview mode for interactive voice intake
- STRUCTURE_SYSTEM_PROMPT already produces an 8-section analysis in Spanish
- Language routing supports Spanish natively

CAMPAIGNPRO CONTEXT:
- Expo SDK 52 + Supabase
- VoterCaptureScreen already has a full form (name, CURP, phone, address, section, sentiment, notes)
- Voter interface: { full_name, curp, phone, address, section, sentiment, notes, lat, lng, photo_url }
- Target users: field promoters in rural Mexico who find typing slow

DELIVERABLES:
1. Architecture: VoiceCaptureScreen → expo-av recording → Supabase Edge Function → audio_ingest extract_voter() → return structured fields → pre-fill VoterCaptureScreen
2. Edge Function spec: accept an audio blob, call Gemini for transcription, extract voter fields with a Claude prompt
3. Extraction prompt: Spanish-language prompt that extracts {full_name, section, sentiment, phone, address, notes} from conversational audio
4. UI flow: mic button → recording indicator → processing spinner → pre-filled form fields with confidence scores → user confirms/edits → save
5. Fallback: if extraction fails, show the raw transcript and let the user fill the form manually

Output as structured markdown.
Assignee
-
Claimed By
AG Lu
Created
22d ago
Completed
2d ago

Quality

Overall
98%
Pass
PASS
Graded By
-
Graded At
-

Cost

Input Tokens
-
Output Tokens
-
Est. USD
-

Result

{
  "code": "# CampaignPro Voice Capture Integration — Technical Spec\n\n## 1. Architecture\n\n```\n┌──────────────────┐    ┌────────────────────┐    ┌────────────────────────┐\n│  CampaignPro     │    │  Supabase Edge Fn  │    │  Gemini 2.5 Flash      │\n│  VoiceCaptureScr │───▶│  /voice-extract    │───▶│  (transcription)       │\n│  expo-av record  │    │                    │    └────────────┬───────────┘\n│  m4a/wav blob    │    │                    │                 │\n└──────────────────┘    │                    │    ┌────────────▼───────────┐\n        ▲               │                    │───▶│  Claude Sonnet 4       │\n        │               │                    │    │  (field extraction)    │\n        │               └────────┬───────────┘    └────────────┬───────────┘\n        │                        │                             │\n        │                        ▼                             │\n        │               ┌────────────────────┐                 │\n        │               │  Structured JSON   │◀────────────────┘\n        │               │  {full_name, curp,  │\n        │               │   phone, section,   │\n        │               │   sentiment, notes} │\n        └───────────────┤  + confidence scores│\n    pre-fill form       └────────────────────┘\n```\n\n**Flow**:\n1. Promoter taps mic → expo-av records audio (m4a, 16kHz mono, max 3 min)\n2. Upload to Edge Function as multipart/form-data\n3. Edge Function sends audio to Gemini 2.5 Flash for transcription (Spanish native)\n4. Transcript → Claude Sonnet 4 with extraction prompt → structured JSON\n5. Return extracted fields + confidence scores to app\n6. App pre-fills VoterCaptureScreen fields, user confirms/edits, saves\n\n## 2. 
Edge Function\n\n```typescript\n// supabase/functions/voice-extract/index.ts\nimport { serve } from \"https://deno.land/std@0.177.0/http/server.ts\";\nimport { GoogleGenerativeAI } from \"https://esm.sh/@google/generative-ai@0.24.0\";\nimport Anthropic from \"https://esm.sh/@anthropic-ai/sdk@0.39.0\";\n\nconst genAI = new GoogleGenerativeAI(Deno.env.get(\"GEMINI_API_KEY\")!);\nconst anthropic = new Anthropic({ apiKey: Deno.env.get(\"ANTHROPIC_API_KEY\")! });\n\ninterface ExtractedVoter {\n  full_name: { value: string | null; confidence: number };\n  phone: { value: string | null; confidence: number };\n  address: { value: string | null; confidence: number };\n  section: { value: string | null; confidence: number };\n  municipality: { value: string | null; confidence: number };\n  state: { value: string | null; confidence: number };\n  sentiment: { value: string | null; confidence: number };\n  notes: { value: string | null; confidence: number };\n  curp: { value: string | null; confidence: number };\n  transcript: string;\n}\n\nserve(async (req: Request) => {\n  if (req.method !== \"POST\") {\n    return new Response(JSON.stringify({ error: \"Method not allowed\" }), { status: 405 });\n  }\n\n  try {\n    const formData = await req.formData();\n    const audioFile = formData.get(\"audio\") as File;\n    const language = (formData.get(\"language\") as string) || \"es\"; // reserved for future language routing\n\n    if (!audioFile) {\n      return new Response(JSON.stringify({ error: \"No audio file\" }), { status: 400 });\n    }\n\n    // Step 1: Transcribe with Gemini (Spanish-native, handles accents/dialects)\n    const arrayBuffer = await audioFile.arrayBuffer();\n    // Base64-encode in chunks: spreading a multi-MB Uint8Array into a single\n    // String.fromCharCode call can overflow the call stack on longer recordings\n    const bytes = new Uint8Array(arrayBuffer);\n    let binary = \"\";\n    for (let i = 0; i < bytes.length; i += 8192) {\n      binary += String.fromCharCode(...bytes.subarray(i, i + 8192));\n    }\n    const base64Audio = btoa(binary);\n    const mimeType = audioFile.type || \"audio/m4a\";\n\n    const model = genAI.getGenerativeModel({ model: \"gemini-2.5-flash\" });\n    const transcriptResult = await model.generateContent([\n      \"Transcribe this audio recording exactly as spoken. 
This is a Mexican Spanish conversation about a voter/citizen. Preserve names, numbers, and addresses exactly. Output ONLY the transcript text, no formatting.\",\n      { inlineData: { data: base64Audio, mimeType } },\n    ]);\n\n    const transcript = transcriptResult.response.text().trim();\n\n    if (!transcript || transcript.length < 10) {\n      return new Response(JSON.stringify({\n        error: \"Could not transcribe audio\",\n        transcript: transcript || \"\",\n        fallback: true,\n      }), { status: 200 });\n    }\n\n    // Step 2: Extract structured fields with Claude\n    const extraction = await anthropic.messages.create({\n      model: \"claude-sonnet-4-20250514\",\n      max_tokens: 1024,\n      messages: [{\n        role: \"user\",\n        content: EXTRACTION_PROMPT.replace(\"{{TRANSCRIPT}}\", transcript),\n      }],\n    });\n\n    const extractedText = extraction.content[0].type === \"text\"\n      ? extraction.content[0].text : \"\";\n    const jsonStr = extractedText\n      .replace(/```json\\n?/g, \"\").replace(/```\\n?/g, \"\").trim();\n    const extracted: ExtractedVoter = { ...JSON.parse(jsonStr), transcript };\n\n    return new Response(JSON.stringify(extracted), {\n      status: 200,\n      headers: { \"Content-Type\": \"application/json\" },\n    });\n  } catch (error) {\n    return new Response(\n      JSON.stringify({ error: `Voice extraction failed: ${error.message}`, fallback: true }),\n      { status: 200, headers: { \"Content-Type\": \"application/json\" } }\n    );\n  }\n});\n```\n\n## 3. Extraction Prompt (Spanish)\n\n```python\nEXTRACTION_PROMPT = \"\"\"Eres un asistente de captura de datos electorales en México.\nA partir de la siguiente transcripción de una conversación con un ciudadano, extrae los datos del votante.\n\nTRANSCRIPCIÓN:\n{{TRANSCRIPT}}\n\nExtrae los siguientes campos. 
Para cada campo, indica el valor encontrado y tu nivel de confianza (0.0-1.0).\nSi un dato NO se menciona en la conversación, pon value=null y confidence=0.0.\n\nReglas:\n- full_name: Nombre completo tal como lo dice la persona. Si solo dan nombre de pila, confianza baja.\n- phone: Número telefónico a 10 dígitos. Formato: sin prefijo +52.\n- address: Dirección lo más completa posible (calle, número, colonia, CP).\n- section: Sección electoral (4 dígitos). A veces dicen \"mi sección es la 0142\".\n- municipality: Municipio o alcaldía.\n- state: Estado (e.g., Jalisco, CDMX, Nuevo León).\n- sentiment: Clasificar en: supporter (a favor), leaning (inclinado), undecided (indeciso), opposed (en contra), unknown.\n  Pistas: \"yo los apoyo\" → supporter, \"no sé\" → undecided, \"no estoy de acuerdo\" → opposed.\n- notes: Cualquier información adicional relevante (preocupaciones, propuestas que mencionan, contexto).\n- curp: CURP si lo mencionan (18 caracteres alfanuméricos). Raro en conversación oral.\n\nResponde SOLO con JSON válido, sin markdown:\n{\n  \"full_name\": {\"value\": \"...\", \"confidence\": 0.95},\n  \"phone\": {\"value\": \"...\", \"confidence\": 0.90},\n  \"address\": {\"value\": \"...\", \"confidence\": 0.80},\n  \"section\": {\"value\": \"...\", \"confidence\": 0.85},\n  \"municipality\": {\"value\": \"...\", \"confidence\": 0.80},\n  \"state\": {\"value\": \"...\", \"confidence\": 0.80},\n  \"sentiment\": {\"value\": \"supporter\", \"confidence\": 0.75},\n  \"notes\": {\"value\": \"Preocupado por seguridad en su colonia\", \"confidence\": 0.90},\n  \"curp\": {\"value\": null, \"confidence\": 0.0}\n}\"\"\"\n```\n\n## 4. 
UI Flow — VoiceCaptureScreen\n\n```\n┌─────────────────────────────────────┐\n│  ← Captura por Voz                  │\n├─────────────────────────────────────┤\n│                                     │\n│  Habla con el votante y nosotros    │\n│  extraemos los datos automáticamente│\n│                                     │\n│         ┌───────────────┐           │\n│         │               │           │\n│         │   🎙️ (large)  │           │\n│         │               │           │\n│         └───────────────┘           │\n│         Toca para grabar            │\n│                                     │\n│  ─── Recording state ───           │\n│                                     │\n│         ┌───────────────┐           │\n│         │  ⏹️ 0:42      │           │\n│         │  ████████░░░░ │           │\n│         └───────────────┘           │\n│   ┌──────────┐  ┌──────────────┐   │\n│   │ Cancelar │  │ ✅ Procesar  │   │\n│   └──────────┘  └──────────────┘   │\n│                                     │\n│  ─── Processing state ───          │\n│                                     │\n│  ⏳ Transcribiendo audio...        │\n│  ████████████░░░░ 60%              │\n│  ⏳ Extrayendo datos del votante...│\n│                                     │\n│  ─── Results state ───             │\n│                                     │\n│  📝 Transcripción:                 │\n│  ┌─────────────────────────────┐   │\n│  │ \"Sí, me llamo Roberto      │   │\n│  │  García, vivo en la colonia │   │\n│  │  Providencia, sección 0401\" │   │\n│  └─────────────────────────────┘   │\n│                                     │\n│  Datos extraídos:                   │\n│  ✅ Nombre: Roberto García    95%  │\n│  ✅ Sección: 0401             97%  │\n│  ✅ Dirección: Col. Provid... 
80%  │\n│  ⚠️ Teléfono: —               0%   │\n│  ✅ Sentimiento: A favor      75%  │\n│                                     │\n│  ┌─────────────────────────────┐   │\n│  │  ✅ Usar datos y completar  │   │\n│  └─────────────────────────────┘   │\n│  ┌─────────────────────────────┐   │\n│  │  🔄 Grabar de nuevo         │   │\n│  └─────────────────────────────┘   │\n└─────────────────────────────────────┘\n```\n\n**States**: idle → recording → processing → results (or fallback)\n\n**Key interactions**:\n- Mic button: tap to start, tap again to stop (or auto-stop at 3 min)\n- Visual waveform/pulse animation during recording (react-native-reanimated)\n- Progress bar during processing (fake progress: 0-40% transcribing, 40-90% extracting, 90-100% done)\n- Results show per-field confidence: ≥80% green ✅, 50-79% amber ⚠️, <50% red/hidden\n- \"Usar datos\" navigates to VoterCaptureScreen with pre-filled fields\n- Fields with confidence <50% are left empty for manual entry\n\n**Integration with VoterCaptureScreen**:\n```typescript\nnavigation.navigate('VoterCapture', {\n  prefill: {\n    full_name: extracted.full_name?.value,\n    phone: extracted.phone?.value,\n    address: extracted.address?.value,\n    section: extracted.section?.value,\n    municipality: extracted.municipality?.value,\n    state: extracted.state?.value,\n    sentiment: extracted.sentiment?.value,\n    notes: extracted.notes?.value,\n  },\n  source: 'voice',\n  transcript: extracted.transcript,\n});\n```\n\n## 5. Fallback Behavior\n\nIf extraction fails at any stage:\n\n| Failure | Fallback |\n|---------|----------|\n| Audio too short (<2s) | \"Grabación muy corta. 
Intenta de nuevo.\" |\n| Transcription empty | Show \"No se pudo transcribir\" + retry button |\n| Transcription OK but extraction fails | Show raw transcript + manual form |\n| Network timeout | Cache audio locally, retry when online |\n| Gemini quota exceeded | Fall back to Whisper via OpenAI API |\n\n```typescript\n// Fallback component\nif (result.fallback) {\n  return (\n    <View>\n      <Text>No pudimos extraer los datos automáticamente.</Text>\n      <Text>Transcripción:</Text>\n      <Text selectable>{result.transcript}</Text>\n      <Button title=\"Llenar manualmente\" onPress={() =>\n        navigation.navigate('VoterCapture', {\n          prefill: { notes: result.transcript },\n          source: 'voice_fallback',\n        })\n      } />\n    </View>\n  );\n}\n```\n\n## 6. Audio Recording Config (expo-av)\n\n```typescript\nimport { Audio } from 'expo-av';\n\nconst RECORDING_OPTIONS = {\n  android: {\n    extension: '.m4a',\n    outputFormat: Audio.AndroidOutputFormat.MPEG_4,\n    audioEncoder: Audio.AndroidAudioEncoder.AAC,\n    sampleRate: 16000,\n    numberOfChannels: 1,\n    bitRate: 64000,\n  },\n  ios: {\n    extension: '.m4a',\n    outputFormat: Audio.IOSOutputFormat.MPEG4AAC,\n    audioQuality: Audio.IOSAudioQuality.MEDIUM,\n    sampleRate: 16000,\n    numberOfChannels: 1,\n    bitRate: 64000,\n  },\n  web: { mimeType: 'audio/webm', bitsPerSecond: 64000 },\n};\n\n// Max recording: 3 minutes\nconst MAX_RECORDING_MS = 180_000;\n```\n\n## 7. Cost Estimate\n\nPer voice capture:\n- Gemini 2.5 Flash transcription: ~$0.001 (30s audio)\n- Claude Sonnet extraction: ~$0.003 (500 input + 300 output tokens)\n- **Total: ~$0.004 per capture**\n- At 100 captures/day: ~$0.40/day, ~$12/month\n\nCheaper than manual data entry time saved (5 min → 30 sec per voter)."
}

Audit Trail (4)

2d ago · task_completed · AG Lu
2d ago · task_claimed · AG Lu
2d ago · task_claimed · Desktop Lu (status check only)
22d ago · task_created · AG Lu
Task ID: 8efa386f-4713-476a-be84-38d924a7ace9