CampaignPro Voice Capture Integration Spec
completed · consult · P1
Description
Draft a technical spec for integrating C8's audio_ingest.py pipeline into CampaignPro for voice-powered voter data capture.
EXISTING C8 INFRA:
- c8_platform/audio_ingest.py: Full pipeline: voice → transcript (Gemini/Whisper) → structured extraction (Claude) → voter fields
- Has Spanish-native processing with Paulina TTS
- Has --interview mode for interactive voice intake
- STRUCTURE_SYSTEM_PROMPT already produces 8-section analysis in Spanish
- Language routing supports Spanish natively
CAMPAIGNPRO CONTEXT:
- Expo SDK 52 + Supabase
- VoterCaptureScreen already has full form (name, CURP, phone, address, section, sentiment, notes)
- Voter interface: { full_name, curp, phone, address, section, sentiment, notes, lat, lng, photo_url }
- Target: field promoters in rural Mexico who find typing slow
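The Voter interface above can be rendered as a TypeScript type for reference. Field optionality and the sentiment union are assumptions; the source lists field names only.

```typescript
// Voter record shape used by VoterCaptureScreen, per the interface listed above.
// Which fields are nullable is an assumption; the source lists names only.
interface Voter {
  full_name: string;
  curp: string | null;
  phone: string | null;
  address: string | null;
  section: string | null;
  sentiment: 'supporter' | 'leaning' | 'undecided' | 'opposed' | 'unknown' | null;
  notes: string | null;
  lat: number | null;
  lng: number | null;
  photo_url: string | null;
}

// Example record as a promoter might capture it in the field (illustrative data).
const example: Voter = {
  full_name: 'Roberto García',
  curp: null,
  phone: '3312345678',
  address: 'Col. Providencia',
  section: '0401',
  sentiment: 'supporter',
  notes: 'Preocupado por seguridad',
  lat: 20.6736,
  lng: -103.344,
  photo_url: null,
};
```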
DELIVERABLES:
1. Architecture: VoiceCaptureScreen → expo-av recording → Supabase Edge Function → audio_ingest extract_voter() → return structured fields → pre-fill VoterCaptureScreen
2. Edge Function spec: accept audio blob, call Gemini for transcription, extract voter fields using Claude prompt
3. Extraction prompt: Spanish-language prompt that extracts {full_name, section, sentiment, phone, address, notes} from conversational audio
4. UI flow: mic button → recording indicator → processing spinner → pre-filled form fields with confidence scores → user confirms/edits → save
5. Fallback: if extraction fails, show raw transcript and let user fill form manually
Output as structured markdown.
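The confidence gating in deliverable 4 (fields below 50% confidence left empty for manual entry) can be sketched as a pure helper. The names `ExtractedField` and `buildPrefill` are illustrative, not part of an existing CampaignPro module.

```typescript
// Hypothetical helper: turn per-field extraction results into a prefill object,
// dropping any field whose confidence falls below the 0.5 threshold from the spec.
interface ExtractedField { value: string | null; confidence: number }

function buildPrefill(
  extracted: Record<string, ExtractedField>,
  threshold = 0.5,
): Record<string, string> {
  const prefill: Record<string, string> = {};
  for (const [key, field] of Object.entries(extracted)) {
    // Keep a field only when a value was found and confidence clears the bar.
    if (field.value !== null && field.confidence >= threshold) {
      prefill[key] = field.value;
    }
  }
  return prefill;
}

// Example: phone was not mentioned in the audio, so it is left for manual entry.
const prefill = buildPrefill({
  full_name: { value: 'Roberto García', confidence: 0.95 },
  section: { value: '0401', confidence: 0.97 },
  phone: { value: null, confidence: 0.0 },
  sentiment: { value: 'supporter', confidence: 0.75 },
});
// prefill → { full_name: 'Roberto García', section: '0401', sentiment: 'supporter' }
```

Low-confidence fields simply never reach the form state, so VoterCaptureScreen renders them empty without any special casing.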
Assignee: -
Claimed By: AG Lu
Created: 22d ago
Completed: 2d ago
Quality
Overall: 98%
Pass: PASS
Graded By: -
Graded At: -
Cost
Input Tokens: -
Output Tokens: -
Est. USD: -
Result
{
"code": "# CampaignPro Voice Capture Integration — Technical Spec\n\n## 1. Architecture\n\n```\n┌──────────────────┐ ┌────────────────────┐ ┌────────────────────────┐\n│ CampaignPro │ │ Supabase Edge Fn │ │ Gemini 2.5 Flash │\n│ VoiceCaptureScr │───▶│ /voice-extract │───▶│ (transcription) │\n│ expo-av record │ │ │ └────────────┬───────────┘\n│ m4a/wav blob │ │ │ │\n└──────────────────┘ │ │ ┌────────────▼───────────┐\n ▲ │ │───▶│ Claude Sonnet 4 │\n │ │ │ │ (field extraction) │\n │ └────────┬───────────┘ └────────────┬───────────┘\n │ │ │\n │ ▼ │\n │ ┌────────────────────┐ │\n │ │ Structured JSON │◀────────────────┘\n │ │ {full_name, curp, │\n │ │ phone, section, │\n │ │ sentiment, notes} │\n └───────────────┤ + confidence scores│\n pre-fill form └────────────────────┘\n```\n\n**Flow**:\n1. Promoter taps mic → expo-av records audio (m4a, 16kHz mono, max 3 min)\n2. Upload to Edge Function as multipart/form-data\n3. Edge Function sends audio to Gemini 2.5 Flash for transcription (Spanish native)\n4. Transcript → Claude Sonnet 4 with extraction prompt → structured JSON\n5. Return extracted fields + confidence scores to app\n6. App pre-fills VoterCaptureScreen fields, user confirms/edits, saves\n\n## 2. Edge Function\n\n```typescript\n// supabase/functions/voice-extract/index.ts\nimport { serve } from \"https://deno.land/std@0.177.0/http/server.ts\";\nimport { GoogleGenerativeAI } from \"https://esm.sh/@google/generative-ai@0.24.0\";\nimport Anthropic from \"https://esm.sh/@anthropic-ai/sdk@0.39.0\";\n\nconst genAI = new GoogleGenerativeAI(Deno.env.get(\"GEMINI_API_KEY\")!);\nconst anthropic = new Anthropic({ apiKey: Deno.env.get(\"ANTHROPIC_API_KEY\")! 
});\n\ninterface ExtractedVoter {\n full_name: { value: string | null; confidence: number };\n phone: { value: string | null; confidence: number };\n address: { value: string | null; confidence: number };\n section: { value: string | null; confidence: number };\n municipality: { value: string | null; confidence: number };\n state: { value: string | null; confidence: number };\n sentiment: { value: string | null; confidence: number };\n notes: { value: string | null; confidence: number };\n curp: { value: string | null; confidence: number };\n transcript: string;\n}\n\nserve(async (req: Request) => {\n if (req.method !== \"POST\") {\n return new Response(JSON.stringify({ error: \"Method not allowed\" }), { status: 405 });\n }\n\n try {\n const formData = await req.formData();\n const audioFile = formData.get(\"audio\") as File;\n const language = (formData.get(\"language\") as string) || \"es\";\n\n if (!audioFile) {\n return new Response(JSON.stringify({ error: \"No audio file\" }), { status: 400 });\n }\n\n // Step 1: Transcribe with Gemini (Spanish-native, handles accents/dialects)\n const arrayBuffer = await audioFile.arrayBuffer();\n const base64Audio = btoa(String.fromCharCode(...new Uint8Array(arrayBuffer)));\n const mimeType = audioFile.type || \"audio/m4a\";\n\n const model = genAI.getGenerativeModel({ model: \"gemini-2.5-flash\" });\n const transcriptResult = await model.generateContent([\n \"Transcribe this audio recording exactly as spoken. This is a Mexican Spanish conversation about a voter/citizen. Preserve names, numbers, and addresses exactly. 
Output ONLY the transcript text, no formatting.\",\n { inlineData: { data: base64Audio, mimeType } },\n ]);\n\n const transcript = transcriptResult.response.text().trim();\n\n if (!transcript || transcript.length < 10) {\n return new Response(JSON.stringify({\n error: \"Could not transcribe audio\",\n transcript: transcript || \"\",\n fallback: true,\n }), { status: 200 });\n }\n\n // Step 2: Extract structured fields with Claude\n const extraction = await anthropic.messages.create({\n model: \"claude-sonnet-4-20250514\",\n max_tokens: 1024,\n messages: [{\n role: \"user\",\n content: EXTRACTION_PROMPT.replace(\"{{TRANSCRIPT}}\", transcript),\n }],\n });\n\n const extractedText = extraction.content[0].type === \"text\"\n ? extraction.content[0].text : \"\";\n const jsonStr = extractedText\n .replace(/```json\\n?/g, \"\").replace(/```\\n?/g, \"\").trim();\n const extracted: ExtractedVoter = { ...JSON.parse(jsonStr), transcript };\n\n return new Response(JSON.stringify(extracted), {\n status: 200,\n headers: { \"Content-Type\": \"application/json\" },\n });\n } catch (error) {\n return new Response(\n JSON.stringify({ error: `Voice extraction failed: ${error.message}`, fallback: true }),\n { status: 200, headers: { \"Content-Type\": \"application/json\" } }\n );\n }\n});\n```\n\n## 3. Extraction Prompt (Spanish)\n\n```python\nEXTRACTION_PROMPT = \"\"\"Eres un asistente de captura de datos electorales en México.\nA partir de la siguiente transcripción de una conversación con un ciudadano, extrae los datos del votante.\n\nTRANSCRIPCIÓN:\n{{TRANSCRIPT}}\n\nExtrae los siguientes campos. Para cada campo, indica el valor encontrado y tu nivel de confianza (0.0-1.0).\nSi un dato NO se menciona en la conversación, pon value=null y confidence=0.0.\n\nReglas:\n- full_name: Nombre completo tal como lo dice la persona. Si solo dan nombre de pila, confianza baja.\n- phone: Número telefónico a 10 dígitos. 
Formato: sin prefijo +52.\n- address: Dirección lo más completa posible (calle, número, colonia, CP).\n- section: Sección electoral (4 dígitos). A veces dicen \"mi sección es la 0142\".\n- municipality: Municipio o alcaldía.\n- state: Estado (e.g., Jalisco, CDMX, Nuevo León).\n- sentiment: Clasificar en: supporter (a favor), leaning (inclinado), undecided (indeciso), opposed (en contra), unknown.\n Pistas: \"yo los apoyo\" → supporter, \"no sé\" → undecided, \"no estoy de acuerdo\" → opposed.\n- notes: Cualquier información adicional relevante (preocupaciones, propuestas que mencionan, contexto).\n- curp: CURP si lo mencionan (18 caracteres alfanuméricos). Raro en conversación oral.\n\nResponde SOLO con JSON válido, sin markdown:\n{\n \"full_name\": {\"value\": \"...\", \"confidence\": 0.95},\n \"phone\": {\"value\": \"...\", \"confidence\": 0.90},\n \"address\": {\"value\": \"...\", \"confidence\": 0.80},\n \"section\": {\"value\": \"...\", \"confidence\": 0.85},\n \"municipality\": {\"value\": \"...\", \"confidence\": 0.80},\n \"state\": {\"value\": \"...\", \"confidence\": 0.80},\n \"sentiment\": {\"value\": \"supporter\", \"confidence\": 0.75},\n \"notes\": {\"value\": \"Preocupado por seguridad en su colonia\", \"confidence\": 0.90},\n \"curp\": {\"value\": null, \"confidence\": 0.0}\n}\"\"\"\n```\n\n## 4. UI Flow — VoiceCaptureScreen\n\n```\n┌─────────────────────────────────────┐\n│ ← Captura por Voz │\n├─────────────────────────────────────┤\n│ │\n│ Habla con el votante y nosotros │\n│ extraemos los datos automáticamente│\n│ │\n│ ┌───────────────┐ │\n│ │ │ │\n│ │ 🎙️ (large) │ │\n│ │ │ │\n│ └───────────────┘ │\n│ Toca para grabar │\n│ │\n│ ─── Recording state ─── │\n│ │\n│ ┌───────────────┐ │\n│ │ ⏹️ 0:42 │ │\n│ │ ████████░░░░ │ │\n│ └───────────────┘ │\n│ ┌──────────┐ ┌──────────────┐ │\n│ │ Cancelar │ │ ✅ Procesar │ │\n│ └──────────┘ └──────────────┘ │\n│ │\n│ ─── Processing state ─── │\n│ │\n│ ⏳ Transcribiendo audio... 
│\n│ ████████████░░░░ 60% │\n│ ⏳ Extrayendo datos del votante...│\n│ │\n│ ─── Results state ─── │\n│ │\n│ 📝 Transcripción: │\n│ ┌─────────────────────────────┐ │\n│ │ \"Sí, me llamo Roberto │ │\n│ │ García, vivo en la colonia │ │\n│ │ Providencia, sección 0401\" │ │\n│ └─────────────────────────────┘ │\n│ │\n│ Datos extraídos: │\n│ ✅ Nombre: Roberto García 95% │\n│ ✅ Sección: 0401 97% │\n│ ✅ Dirección: Col. Provid... 80% │\n│ ⚠️ Teléfono: — 0% │\n│ ✅ Sentimiento: A favor 75% │\n│ │\n│ ┌─────────────────────────────┐ │\n│ │ ✅ Usar datos y completar │ │\n│ └─────────────────────────────┘ │\n│ ┌─────────────────────────────┐ │\n│ │ 🔄 Grabar de nuevo │ │\n│ └─────────────────────────────┘ │\n└─────────────────────────────────────┘\n```\n\n**States**: idle → recording → processing → results (or fallback)\n\n**Key interactions**:\n- Mic button: tap to start, tap again to stop (or auto-stop at 3 min)\n- Visual waveform/pulse animation during recording (react-native-reanimated)\n- Progress bar during processing (fake progress: 0-40% transcribing, 40-90% extracting, 90-100% done)\n- Results show per-field confidence: ≥80% green ✅, 50-79% amber ⚠️, <50% red/hidden\n- \"Usar datos\" navigates to VoterCaptureScreen with pre-filled fields\n- Fields with confidence <50% are left empty for manual entry\n\n**Integration with VoterCaptureScreen**:\n```typescript\nnavigation.navigate('VoterCapture', {\n prefill: {\n full_name: extracted.full_name?.value,\n phone: extracted.phone?.value,\n address: extracted.address?.value,\n section: extracted.section?.value,\n municipality: extracted.municipality?.value,\n state: extracted.state?.value,\n sentiment: extracted.sentiment?.value,\n notes: extracted.notes?.value,\n },\n source: 'voice',\n transcript: extracted.transcript,\n});\n```\n\n## 5. Fallback Behavior\n\nIf extraction fails at any stage:\n\n| Failure | Fallback |\n|---------|----------|\n| Audio too short (<2s) | \"Grabación muy corta. 
Intenta de nuevo.\" |\n| Transcription empty | Show \"No se pudo transcribir\" + retry button |\n| Transcription OK but extraction fails | Show raw transcript + manual form |\n| Network timeout | Cache audio locally, retry when online |\n| Gemini quota exceeded | Fall back to Whisper via OpenAI API |\n\n```typescript\n// Fallback component\nif (result.fallback) {\n return (\n <View>\n <Text>No pudimos extraer los datos automáticamente.</Text>\n <Text>Transcripción:</Text>\n <Text selectable>{result.transcript}</Text>\n <Button title=\"Llenar manualmente\" onPress={() =>\n navigation.navigate('VoterCapture', {\n prefill: { notes: result.transcript },\n source: 'voice_fallback',\n })\n } />\n </View>\n );\n}\n```\n\n## 6. Audio Recording Config (expo-av)\n\n```typescript\nimport { Audio } from 'expo-av';\n\nconst RECORDING_OPTIONS = {\n android: {\n extension: '.m4a',\n outputFormat: Audio.AndroidOutputFormat.MPEG_4,\n audioEncoder: Audio.AndroidAudioEncoder.AAC,\n sampleRate: 16000,\n numberOfChannels: 1,\n bitRate: 64000,\n },\n ios: {\n extension: '.m4a',\n outputFormat: Audio.IOSOutputFormat.MPEG4AAC,\n audioQuality: Audio.IOSAudioQuality.MEDIUM,\n sampleRate: 16000,\n numberOfChannels: 1,\n bitRate: 64000,\n },\n web: { mimeType: 'audio/webm', bitsPerSecond: 64000 },\n};\n\n// Max recording: 3 minutes\nconst MAX_RECORDING_MS = 180_000;\n```\n\n## 7. Cost Estimate\n\nPer voice capture:\n- Gemini 2.5 Flash transcription: ~$0.001 (30s audio)\n- Claude Sonnet extraction: ~$0.003 (500 input + 300 output tokens)\n- **Total: ~$0.004 per capture**\n- At 100 captures/day: ~$0.40/day, ~$12/month\n\nCheaper than manual data entry time saved (5 min → 30 sec per voter).",
}
Audit Trail (4)
2d ago · task_completed · AG Lu
2d ago · task_claimed · AG Lu
2d ago · task_claimed · Desktop Lu (status check only)
22d ago · task_created · AG Lu
Task ID: 8efa386f-4713-476a-be84-38d924a7ace9