FOUNDRY
C8 Platform

Stela Intelligence Layer — Anomaly Detection Pipeline

completed · consult · P1

Description

Design and prototype the anomaly detection pipeline that populates the Intelligence Layer (§7) of Stela files.

Components:
1. Value-weight anomaly detector — flag products whose declared value/kg deviates significantly from the market average for that HTS code
2. Routing anomaly detector — flag shipments that pass through unusual FTZ chains
3. Vendor network graph builder — build a graph of vendor→broker→carrier relationships and detect unusual patterns
4. Document authenticity scorer — compare invoice/certificate metadata for consistency
5. Behavioral embedding generator — train c8/trade-behavior-v1 on labeled trade data

Input: AskRosetta classification results + corridor-engine traffic + existing HTS database
Output: IntelligenceLayer JSON block ready to embed in Stela files

This is the proprietary intelligence that makes the format valuable to law enforcement. Handle with appropriate sensitivity — no raw data in task outputs, only architecture and pseudocode.
Assignee
-
Claimed By
AG Lu
Created
19d ago
Completed
2d ago

Quality

Overall
79%
Pass
PASS
Graded By
-
Graded At
-

Cost

Input Tokens
-
Output Tokens
-
Est. USD
-

Result

{
  "code": "## Stela Intelligence Layer — Anomaly Detection Pipeline Architecture\n\n### Overview\nThe Intelligence Layer (§7) is the proprietary core of the Rosetta Stela format. It transforms commodity-level classification data into risk signals that are credential-gated for law enforcement and customs authorities.\n\n---\n\n### Component 1: Value-Weight Anomaly Detector\n\n**Input**: AskRosetta classification (HTS code, declared value, weight) + `hts_ontology` market averages\n\n**Algorithm**:\n```\nfor each shipment_line:\n    declared_vpk = declared_value / weight_kg\n    market_vpk = hts_ontology[hts_code].avg_value_per_kg\n    z_score = (declared_vpk - market_vpk) / market_vpk_stddev\n    \n    if z_score < -2.0:  # Significantly undervalued\n        flag(severity=\"high\", type=\"undervaluation\", z=z_score)\n    if z_score > 3.0:   # Significantly overvalued (transfer pricing)\n        flag(severity=\"medium\", type=\"overvaluation\", z=z_score)\n```\n\n**Data source**: Build market averages from `river_classifications` + `census_trade_data` (already in BQ). 
The `hts_ontology` table stores per-code statistics.\n\n**Output field**: `intelligence.anomalies.value_weight = {z_score, market_avg, declared, flag_level}`\n\n---\n\n### Component 2: Routing Anomaly Detector\n\n**Input**: ISF data (country_of_origin, country_of_export, ports_of_loading, FTZ indicators)\n\n**Logic**:\n```\nknown_routes = load_corridor_patterns(hts_chapter)\n# e.g., Chapter 61 (apparel): Bangladesh→Singapore→LA normal\n#        Chapter 61: Vietnam→Panama→Miami = unusual\n\nroute_hash = hash(origin, export_country, port_chain)\nif route_hash not in known_routes.top_90_pct:\n    deviation_score = 1 - known_routes.get_percentile(route_hash)\n    if deviation_score > 0.95:\n        flag(type=\"routing_anomaly\", chain=port_chain, score=deviation_score)\n```\n\n**Data source**: `cross_rulings` (historical routing), `corridor_engine` session data, `bts_trade_statistics`.\n\n**Sensitive note**: FTZ chain analysis specifically detects transshipment schemes — architecture only, no specific country patterns in output.\n\n---\n\n### Component 3: Vendor Network Graph\n\n**Architecture**:\n```\nNodes: {vendors, brokers, carriers, consignees}\nEdges: {shipped_via, brokered_by, consigned_to, related_to}\n\ngraph = build_from_sources([\n    (\"corridor_engine\", \"sessions\"),      # vendor → broker → carrier from classification sessions\n    (\"product_digital_twins\", \"vendor\"),   # vendor catalog\n    (\"cross_rulings\", \"parties\"),          # historical party relationships\n])\n\n# Detect patterns:\n#   - Star patterns: one broker, many shell vendors (concentration risk)\n#   - Chain patterns: A→B→C→D→A (circular invoicing)\n#   - New-entity injection: previously unseen vendor using established broker\n```\n\n**Storage**: Graph adjacency stored as JSONB in Stela `intelligence.network` field. 
Full graph in Neo4j or Supabase JSONB (depending on scale).\n\n---\n\n### Component 4: Document Authenticity Scorer\n\n**Input**: Invoice metadata, certificate of origin, packing list\n\n**Checks**:\n```\nconsistency_checks = [\n    (\"invoice.total\", \"==\", \"sum(line_items)\"),           # arithmetic\n    (\"invoice.date\", \"<=\", \"bill_of_lading.date\"),        # temporal\n    (\"cert_origin.country\", \"==\", \"isf.country_of_origin\"), # geographic\n    (\"invoice.currency\", \"in\", EXPECTED_CURRENCIES[origin]), # plausibility\n    (\"invoice.incoterm\", \"matches\", \"shipping_method\"),    # logical\n]\n\npenalties = 0\nfor check in consistency_checks:\n    if not evaluate(check):\n        penalties += check.weight\n\nauthenticity_score = max(0, 1.0 - penalties)\n```\n\n**Output field**: `intelligence.document_authenticity = {score, checks_passed, checks_failed, flags[]}`\n\n---\n\n### Component 5: Behavioral Embedding Generator\n\n**Model**: `c8/trade-behavior-v1` (fine-tuned from NVIDIA Omni-Embed-Nemotron-3B)\n\n**Training data** (pseudocode — no raw data):\n```\ntraining_samples = [\n    # Each sample is a trade transaction feature vector:\n    {\n        \"hts_chapter\": int,\n        \"value_per_kg\": float,\n        \"origin_region\": categorical,\n        \"route_complexity\": int,      # number of transit ports\n        \"broker_centrality\": float,   # from network graph\n        \"doc_consistency\": float,     # from authenticity scorer\n        \"historical_compliance\": float, # past entries by same vendor\n        \"label\": \"compliant\" | \"flagged\" | \"seized\"\n    }\n]\n\n# Fine-tune embedding model to cluster similar risk profiles\n# Matryoshka truncatable: 1024d full → 256d for lightweight\n```\n\n**Output**: 256d embedding stored in `intelligence.behavior_embedding` — enables similarity search (\"find transactions that look like this seized shipment\").\n\n---\n\n### IntelligenceLayer JSON Schema (Stela §7)\n\n```json\n{\n  \"intelligence\": {\n    
\"version\": \"0.1.0\",\n    \"generated_at\": \"2026-02-25T...\",\n    \"generator\": \"c8/anomaly-pipeline-v1\",\n    \"credential_gate\": \"law_enforcement_l2\",\n    \"anomalies\": {\n      \"value_weight\": {\"z_score\": -2.3, \"flag\": \"undervaluation\", \"market_avg_vpk\": 12.50},\n      \"routing\": {\"deviation_score\": 0.97, \"unusual_segments\": [\"transit_via_FTZ_42\"]},\n      \"network\": {\"broker_concentration\": 0.85, \"circular_refs\": 0},\n      \"document\": {\"authenticity_score\": 0.72, \"failed_checks\": [\"temporal_mismatch\"]}\n    },\n    \"risk_score\": 0.78,\n    \"behavior_embedding\": [0.12, -0.34, ...],  // 256d Matryoshka\n    \"selective_disclosure\": {\n      \"public\": [\"risk_score\"],\n      \"customs_l1\": [\"anomalies.value_weight\", \"anomalies.routing\"],\n      \"law_enforcement_l2\": [\"anomalies.network\", \"anomalies.document\", \"behavior_embedding\"]\n    }\n  }\n}\n```\n\n### Selective Disclosure (BBS+ Signatures)\n- **Public**: Overall risk_score only (0-1 float)\n- **Customs L1**: Value-weight and routing anomalies (actionable for targeting)\n- **Law Enforcement L2**: Full network graph, document authenticity, behavioral embedding\n- Each disclosure level has its own BBS+ proof — holder can selectively reveal\n\n### Implementation Path\n1. Value-weight detector: Pure Python, uses existing `hts_ontology` + `census_trade_data`\n2. Routing detector: Extend `corridor_engine` session data with port chain tracking\n3. Network graph: Build from existing `product_digital_twins.vendor` + `cross_rulings`\n4. Document scorer: Rule-based, no ML needed initially\n5. Behavioral embedding: Requires labeled training data — start with synthetic from La Roja benchmark"
}
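The Component 1 pseudocode in the result above can be grounded as a runnable sketch. `MARKET_STATS` and `value_weight_flags` are hypothetical names, and the per-code statistics shown are illustrative stand-ins for what the architecture keeps in `hts_ontology`.

```python
# Hypothetical stand-in for per-HTS-code stats from the `hts_ontology` table.
MARKET_STATS = {
    "6109.10.00": {"avg_vpk": 12.50, "stddev_vpk": 2.10},  # illustrative numbers
}

def value_weight_flags(hts_code, declared_value, weight_kg,
                       under_z=-2.0, over_z=3.0):
    """Return anomaly flags for one shipment line, or [] if unremarkable."""
    stats = MARKET_STATS.get(hts_code)
    if stats is None or weight_kg <= 0:
        return []  # no market baseline, or unusable weight
    declared_vpk = declared_value / weight_kg
    z = (declared_vpk - stats["avg_vpk"]) / stats["stddev_vpk"]
    flags = []
    if z < under_z:  # significantly undervalued
        flags.append({"severity": "high", "type": "undervaluation", "z": z})
    if z > over_z:   # significantly overvalued (transfer pricing)
        flags.append({"severity": "medium", "type": "overvaluation", "z": z})
    return flags
```

The asymmetric thresholds mirror the result's design choice: undervaluation is flagged earlier (z < -2.0) than overvaluation (z > 3.0).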
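Component 2's `top_90_pct` / `get_percentile` pseudocode leaves the statistic unspecified; one simple interpretation is sketched below, assuming routes are hashable tuples. The function name and the coverage/threshold parameters are assumptions, not part of the spec.

```python
from collections import Counter

def routing_flag(route, history, coverage=0.90, threshold=0.95):
    """Flag `route` if it lies outside the route set covering `coverage` of
    historical traffic. `history` is a list of (origin, export, port_chain)
    tuples; the deviation score is the traffic share NOT on this route."""
    counts = Counter(history)
    total = sum(counts.values())
    if total == 0:
        return {"type": "routing_anomaly", "score": 1.0}  # no baseline at all
    # Build the set of routes that together cover `coverage` of traffic.
    known, cum = set(), 0
    for r, c in counts.most_common():
        if cum / total >= coverage:
            break
        known.add(r)
        cum += c
    if route in known:
        return None
    deviation = 1.0 - counts[route] / total
    return ({"type": "routing_anomaly", "score": deviation}
            if deviation > threshold else None)
```

A production version would precompute the known-route set per HTS chapter, as `load_corridor_patterns(hts_chapter)` implies, rather than rescanning history per call.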
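Component 3's star-pattern check maps to the `broker_concentration` field in the §7 schema. A minimal sketch, assuming the graph is available as a (vendor, broker) edge list (the edge-list shape and function name are assumptions):

```python
from collections import defaultdict

def broker_concentration(edges):
    """Star-pattern metric: the largest share of distinct vendors routed
    through any single broker. `edges` is a list of (vendor, broker) pairs."""
    vendors_by_broker = defaultdict(set)
    all_vendors = set()
    for vendor, broker in edges:
        vendors_by_broker[broker].add(vendor)
        all_vendors.add(vendor)
    if not all_vendors:
        return 0.0
    return max(len(v) / len(all_vendors) for v in vendors_by_broker.values())
```

A value near 1.0 means one broker fronts nearly all vendors, the "one broker, many shell vendors" pattern the architecture calls concentration risk; circular-invoicing detection would additionally need a cycle search over the directed party graph.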
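Component 4's weighted-check loop can be made concrete as below, implementing a subset of the listed checks. The check weights and the document dictionary layout are illustrative assumptions.

```python
def score_documents(docs, checks):
    """Run weighted consistency checks over a document bundle.
    `checks` is a list of (name, weight, predicate) triples."""
    penalties = 0.0
    failed = []
    for name, weight, predicate in checks:
        if not predicate(docs):
            penalties += weight
            failed.append(name)
    return {"score": max(0.0, 1.0 - penalties), "failed_checks": failed}

# Illustrative weights; a subset of the five checks named in the architecture.
CHECKS = [
    ("arithmetic", 0.3,   # invoice total must equal the sum of line items
     lambda d: abs(d["invoice"]["total"] - sum(d["invoice"]["line_items"])) < 0.01),
    ("temporal", 0.2,     # invoice cannot postdate the bill of lading
     lambda d: d["invoice"]["date"] <= d["bill_of_lading"]["date"]),
    ("geographic", 0.3,   # certificate of origin must match the ISF origin
     lambda d: d["cert_origin"]["country"] == d["isf"]["country_of_origin"]),
]
```

Because the scorer is pure rules, it fits the implementation path's "no ML needed initially" stance; the currency-plausibility and incoterm checks would slot in as two more triples.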
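Component 5's "1024d full → 256d for lightweight" note relies on Matryoshka-style embeddings, where leading dimensions carry the coarse structure, so truncation is slicing plus re-normalization. A minimal sketch (function name assumed, and it presumes the model was trained with a Matryoshka objective):

```python
import math

def truncate_embedding(vec, dim=256):
    """Matryoshka-style truncation: keep the leading `dim` components and
    re-normalize to unit length so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0  # avoid div-by-zero
    return [x / norm for x in head]
```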
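The `selective_disclosure` map in the §7 schema implies a field-filtering step before proof generation. The sketch below shows only that filtering, under two assumptions not stated in the result: levels are cumulative (customs L1 also sees public fields, L2 sees everything), and the BBS+ proof construction itself happens downstream.

```python
def disclose(intelligence, level):
    """Filter an IntelligenceLayer block to the dotted field paths the
    given credential level may see; levels assumed cumulative."""
    order = ["public", "customs_l1", "law_enforcement_l2"]
    allowed = []
    for lvl in order[: order.index(level) + 1]:
        allowed.extend(intelligence["selective_disclosure"][lvl])
    view = {}
    for path in allowed:
        node = intelligence
        for key in path.split("."):  # walk dotted paths like "anomalies.routing"
            node = node[key]
        view[path] = node
    return view
```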

Audit Trail (3)

2d ago · task_completed · AG Lu (Claude Opus)
2d ago · task_claimed · AG Lu
19d ago · task_created · AG Lu
Task ID: 41fed0ab-973e-4680-8ced-27c966e2f719