← Tasks
[SELF-29] PTM engine unit tests
completedcode_genP1
Description
## SELF-29: PTM Engine Tests
Unit tests for `ptm_engine.py` (Positional Trust Matrix).
### Requirements
- Test trust score initialization (cold start)
- Test score update after accept/reject
- Test routing decision: highest-trust arm wins
- Test multi-position scoring (same arm, different task types)
- Test decay: old scores weighted less than recent
- Test leaderboard computation
### Target File
File: `c8_platform/tests/test_ptm_engine.py`
### Reference
Source: `c8_platform/ptm_engine.py`
Assignee
-
Claimed By
Cloud Lu → Claude Sonnet
Created
1d ago
Completed
1d ago
Quality
Overall
85%
Pass
PASS
Graded By
-
Graded At
-
Cost
Input Tokens
-
Output Tokens
-
Est. USD
-
Result
{
"cost": {
"input_tokens": 157,
"estimated_usd": 0.077736,
"output_tokens": 5151
},
"model": "claude-sonnet-4-20250514",
"status": "success",
"transport": "anthropic_api",
"latency_ms": 59965,
"response_text": "I'll create comprehensive unit tests for the PTM (Positional Trust Matrix) engine based on the requirements.\n\n```python\n# c8_platform/tests/test_ptm_engine.py\nimport pytest\nfrom unittest.mock import Mock, patch\nimport numpy as np\nfrom datetime import datetime, timedelta\n\nfrom c8_platform.ptm_engine import PTMEngine\n\n\nclass TestPTMEngine:\n \"\"\"Test suite for Positional Trust Matrix Engine.\"\"\"\n \n @pytest.fixture\n def ptm_engine(self):\n \"\"\"Create PTM engine instance for testing.\"\"\"\n return PTMEngine()\n \n @pytest.fixture\n def sample_task(self):\n \"\"\"Sample task for testing.\"\"\"\n return {\n 'task_id': 'task_001',\n 'task_type': 'data_analysis',\n 'position': 'senior_analyst',\n 'complexity': 0.7,\n 'urgency': 0.6\n }\n \n @pytest.fixture\n def sample_arms(self):\n \"\"\"Sample arms (workers) for testing.\"\"\"\n return [\n {'arm_id': 'worker_001', 'position': 'senior_analyst'},\n {'arm_id': 'worker_002', 'position': 'senior_analyst'},\n {'arm_id': 'worker_003', 'position': 'junior_analyst'}\n ]\n\n def test_trust_score_initialization_cold_start(self, ptm_engine, sample_arms):\n \"\"\"Test trust score initialization for new workers (cold start).\"\"\"\n task_key = ('data_analysis', 'senior_analyst')\n \n # Test cold start - no history\n scores = ptm_engine.get_trust_scores(task_key, sample_arms)\n \n # All scores should be equal (default trust)\n assert len(scores) == 3\n assert all(score == ptm_engine.default_trust for score in scores.values())\n \n # Verify all arms are included\n expected_arms = {'worker_001', 'worker_002', 'worker_003'}\n assert set(scores.keys()) == expected_arms\n \n def test_trust_score_initialization_with_position_filter(self, ptm_engine):\n \"\"\"Test initialization respects position requirements.\"\"\"\n task_key = ('data_analysis', 'senior_analyst')\n arms = [\n {'arm_id': 'worker_001', 'position': 'senior_analyst'},\n {'arm_id': 'worker_002', 'position': 'junior_analyst'}, # Different position\n ]\n \n scores = ptm_engine.get_trust_scores(task_key, arms)\n \n # Should include both arms but may weight differently\n assert len(scores) == 2\n assert 'worker_001' in scores\n assert 'worker_002' in scores\n \n def test_score_update_after_accept(self, ptm_engine, sample_task):\n \"\"\"Test trust score increase after task acceptance.\"\"\"\n arm_id = 'worker_001'\n task_key = ('data_analysis', 'senior_analyst')\n \n # Record initial score\n initial_scores = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'senior_analyst'}])\n initial_score = initial_scores[arm_id]\n \n # Update score after acceptance\n ptm_engine.update_trust_score(\n arm_id=arm_id,\n task_key=task_key,\n outcome='accept',\n task_metadata=sample_task\n )\n \n # Score should increase\n updated_scores = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'senior_analyst'}])\n updated_score = updated_scores[arm_id]\n \n assert updated_score > initial_score\n \n # Verify history is recorded\n history = ptm_engine.trust_matrix.get((task_key, arm_id), [])\n assert len(history) == 1\n assert history[0]['outcome'] == 'accept'\n assert history[0]['task_id'] == sample_task['task_id']\n \n def test_score_update_after_reject(self, ptm_engine, sample_task):\n \"\"\"Test trust score decrease after task rejection.\"\"\"\n arm_id = 'worker_002'\n task_key = ('data_analysis', 'senior_analyst')\n \n # Record initial score\n initial_scores = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'senior_analyst'}])\n initial_score = initial_scores[arm_id]\n \n # Update score after rejection\n ptm_engine.update_trust_score(\n arm_id=arm_id,\n task_key=task_key,\n outcome='reject',\n task_metadata=sample_task\n )\n \n # Score should decrease\n updated_scores = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'senior_analyst'}])\n updated_score = updated_scores[arm_id]\n \n assert updated_score < initial_score\n \n # Verify history is recorded\n history = ptm_engine.trust_matrix.get((task_key, arm_id), [])\n assert len(history) == 1\n assert history[0]['outcome'] == 'reject'\n \n def test_routing_decision_highest_trust_wins(self, ptm_engine, sample_task, sample_arms):\n \"\"\"Test routing selects arm with highest trust score.\"\"\"\n task_key = ('data_analysis', 'senior_analyst')\n \n # Create different trust histories for workers\n # Worker 1: 2 accepts\n for _ in range(2):\n ptm_engine.update_trust_score('worker_001', task_key, 'accept', sample_task)\n \n # Worker 2: 1 accept, 1 reject\n ptm_engine.update_trust_score('worker_002', task_key, 'accept', sample_task)\n ptm_engine.update_trust_score('worker_002', task_key, 'reject', sample_task)\n \n # Worker 3: No history (default trust)\n \n # Get routing decision\n selected_arm = ptm_engine.select_arm(task_key, sample_arms)\n \n # Worker 1 should have highest trust and be selected\n assert selected_arm['arm_id'] == 'worker_001'\n \n # Verify trust scores are in expected order\n scores = ptm_engine.get_trust_scores(task_key, sample_arms)\n assert scores['worker_001'] >= scores['worker_002']\n assert scores['worker_001'] >= scores['worker_003']\n \n def test_multi_position_scoring_same_arm_different_tasks(self, ptm_engine):\n \"\"\"Test same arm can have different scores for different task types/positions.\"\"\"\n arm_id = 'worker_001'\n \n # Different task types and positions\n task_key_1 = ('data_analysis', 'senior_analyst')\n task_key_2 = ('code_review', 'senior_developer')\n \n task_1 = {\n 'task_id': 'task_001',\n 'task_type': 'data_analysis',\n 'position': 'senior_analyst'\n }\n \n task_2 = {\n 'task_id': 'task_002',\n 'task_type': 'code_review',\n 'position': 'senior_developer'\n }\n \n # Update scores differently for each task type\n ptm_engine.update_trust_score(arm_id, task_key_1, 'accept', task_1)\n ptm_engine.update_trust_score(arm_id, task_key_1, 'accept', task_1) # 2 accepts\n \n ptm_engine.update_trust_score(arm_id, task_key_2, 'reject', task_2) # 1 reject\n \n # Get scores for both task types\n arms = [{'arm_id': arm_id, 'position': 'senior_analyst'}]\n scores_1 = ptm_engine.get_trust_scores(task_key_1, arms)\n scores_2 = ptm_engine.get_trust_scores(task_key_2, arms)\n \n # Scores should be different\n assert scores_1[arm_id] != scores_2[arm_id]\n assert scores_1[arm_id] > scores_2[arm_id] # More accepts vs reject\n \n # Verify separate history tracking\n history_1 = ptm_engine.trust_matrix.get((task_key_1, arm_id), [])\n history_2 = ptm_engine.trust_matrix.get((task_key_2, arm_id), [])\n \n assert len(history_1) == 2\n assert len(history_2) == 1\n assert all(h['outcome'] == 'accept' for h in history_1)\n assert history_2[0]['outcome'] == 'reject'\n \n def test_decay_old_scores_weighted_less(self, ptm_engine, sample_task):\n \"\"\"Test temporal decay: older interactions have less weight.\"\"\"\n arm_id = 'worker_001'\n task_key = ('data_analysis', 'senior_analyst')\n \n # Mock datetime to simulate passage of time\n with patch('c8_platform.ptm_engine.datetime') as mock_datetime:\n # Old interaction (30 days ago)\n old_time = datetime.now() - timedelta(days=30)\n mock_datetime.now.return_value = old_time\n \n ptm_engine.update_trust_score(arm_id, task_key, 'accept', sample_task)\n old_score = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'senior_analyst'}])[arm_id]\n \n # Recent interaction (now)\n recent_time = datetime.now()\n mock_datetime.now.return_value = recent_time\n \n ptm_engine.update_trust_score(arm_id, task_key, 'accept', sample_task)\n recent_score = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'senior_analyst'}])[arm_id]\n \n # Recent interaction should have more impact\n # (This test assumes decay is implemented in the scoring algorithm)\n history = ptm_engine.trust_matrix[(task_key, arm_id)]\n assert len(history) == 2\n \n # Verify timestamps are recorded\n assert 'timestamp' in history[0]\n assert 'timestamp' in history[1]\n \n def test_decay_calculation_with_time_weights(self, ptm_engine):\n \"\"\"Test that decay calculation properly weights recent vs old interactions.\"\"\"\n # This test verifies the internal decay calculation\n arm_id = 'worker_test'\n task_key = ('test_task', 'test_position')\n \n # Create mock history with different timestamps\n now = datetime.now()\n old_interaction = {\n 'outcome': 'accept',\n 'timestamp': now - timedelta(days=60), # 60 days old\n 'task_id': 'old_task'\n }\n \n recent_interaction = {\n 'outcome': 'reject',\n 'timestamp': now - timedelta(days=1), # 1 day old\n 'task_id': 'recent_task'\n }\n \n # Manually set history\n ptm_engine.trust_matrix[(task_key, arm_id)] = [old_interaction, recent_interaction]\n \n # Calculate trust score\n score = ptm_engine._calculate_trust_score(task_key, arm_id)\n \n # Recent reject should have more weight than old accept\n # (Score should be below default trust)\n assert score < ptm_engine.default_trust\n \n def test_leaderboard_computation(self, ptm_engine, sample_task):\n \"\"\"Test leaderboard computation and ranking.\"\"\"\n task_key = ('data_analysis', 'senior_analyst')\n arms = [\n {'arm_id': 'worker_001', 'position': 'senior_analyst'},\n {'arm_id': 'worker_002', 'position': 'senior_analyst'},\n {'arm_id': 'worker_003', 'position': 'senior_analyst'}\n ]\n \n # Create different performance histories\n # Worker 1: High performance\n for _ in range(5):\n ptm_engine.update_trust_score('worker_001', task_key, 'accept', sample_task)\n \n # Worker 2: Medium performance\n for _ in range(2):\n ptm_engine.update_trust_score('worker_002', task_key, 'accept', sample_task)\n ptm_engine.update_trust_score('worker_002', task_key, 'reject', sample_task)\n \n # Worker 3: Low performance\n ptm_engine.update_trust_score('worker_003', task_key, 'reject', sample_task)\n ptm_engine.update_trust_score('worker_003', task_key, 'reject', sample_task)\n \n # Get leaderboard\n leaderboard = ptm_engine.get_leaderboard(task_key, arms)\n \n # Verify ranking order\n assert len(leaderboard) == 3\n assert leaderboard[0]['arm_id'] == 'worker_001' # Highest trust\n assert leaderboard[1]['arm_id'] == 'worker_002' # Medium trust\n assert leaderboard[2]['arm_id'] == 'worker_003' # Lowest trust\n \n # Verify scores are in descending order\n scores = [entry['trust_score'] for entry in leaderboard]\n assert scores == sorted(scores, reverse=True)\n \n # Verify additional metrics are included\n for entry in leaderboard:\n assert 'arm_id' in entry\n assert 'trust_score' in entry\n assert 'interaction_count' in entry\n assert 'accept_rate' in entry\n \n def test_leaderboard_with_mixed_positions(self, ptm_engine, sample_task):\n \"\"\"Test leaderboard handles workers with different positions.\"\"\"\n task_key = ('data_analysis', 'senior_analyst')\n arms = [\n {'arm_id': 'senior_001', 'position': 'senior_analyst'},\n {'arm_id': 'junior_001', 'position': 'junior_analyst'},\n {'arm_id': 'senior_002', 'position': 'senior_analyst'}\n ]\n \n # Update scores for all workers\n ptm_engine.update_trust_score('senior_001', task_key, 'accept', sample_task)\n ptm_engine.update_trust_score('junior_001', task_key, 'accept', sample_task)\n ptm_engine.update_trust_score('senior_002', task_key, 'reject', sample_task)\n \n leaderboard = ptm_engine.get_leaderboard(task_key, arms)\n \n # All workers should be included regardless of position\n assert len(leaderboard) == 3\n \n # Position should be included in leaderboard data\n for entry in leaderboard:\n assert 'position' in entry\n assert entry['position'] in ['senior_analyst', 'junior_analyst']\n \n def test_empty_history_leaderboard(self, ptm_engine):\n \"\"\"Test leaderboard with workers having no history.\"\"\"\n task_key = ('new_task', 'new_position')\n arms = [\n {'arm_id': 'new_worker_001', 'position': 'new_position'},\n {'arm_id': 'new_worker_002', 'position': 'new_position'}\n ]\n \n leaderboard = ptm_engine.get_leaderboard(task_key, arms)\n \n # Should include all workers with default scores\n assert len(leaderboard) == 2\n \n for entry in leaderboard:\n assert entry['trust_score'] == ptm_engine.default_trust\n assert entry['interaction_count'] == 0\n assert entry['accept_rate'] == 0.0\n \n def test_concurrent_updates_thread_safety(self, ptm_engine, sample_task):\n \"\"\"Test thread safety of concurrent trust score updates.\"\"\"\n import threading\n import time\n \n arm_id = 'worker_concurrent'\n task_key = ('concurrent_task', 'test_position')\n \n def update_scores(outcome, count):\n for i in range(count):\n ptm_engine.update_trust_score(\n arm_id=f\"{arm_id}_{threading.current_thread().ident}_{i}\",\n task_key=task_key,\n outcome=outcome,\n task_metadata={**sample_task, 'task_id': f\"task_{i}\"}\n )\n \n # Start multiple threads updating scores\n threads = []\n for i in range(3):\n thread = threading.Thread(target=update_scores, args=('accept', 5))\n threads.append(thread)\n thread.start()\n \n # Wait for all threads to complete\n for thread in threads:\n thread.join()\n \n # Verify all updates were recorded\n total_entries = sum(1 for key in ptm_engine.trust_matrix.keys() \n if key[0] == task_key)\n assert total_entries == 15 # 3 threads * 5 updates each\n \n def test_performance_with_large_history(self, ptm_engine, sample_task):\n \"\"\"Test performance with large interaction history.\"\"\"\n arm_id = 'high_volume_worker'\n task_key = ('volume_task', 'test_position')\n \n # Create large history\n start_time = datetime.now()\n for i in range(1000):\n outcome = 'accept' if i % 3 != 0 else 'reject' # ~67% accept rate\n ptm_engine.update_trust_score(arm_id, task_key, outcome, \n {**sample_task, 'task_id': f\"task_{i}\"})\n \n update_time = datetime.now() - start_time\n \n # Test score calculation performance\n start_time = datetime.now()\n scores = ptm_engine.get_trust_scores(task_key, [{'arm_id': arm_id, 'position': 'test_position'}])\n calc_time = datetime.now() - start_time\n \n # Verify functionality still works\n assert arm_id in scores\n assert isinstance(scores[arm_id], float)\n \n # Performance should be reasonable (adjust thresholds as needed)\n assert update_time.total_seconds() < 5.0 # Updates should be fast\n assert calc_time.total_seconds() < 1.0 # Calculation should be fast\n \n # Verify history size\n history = ptm_engine.trust_matrix[(task_key, arm_id)]\n assert len(history) == 1000\n\n\nif __name__ == '__main__':\n pytest.main([__file__])\n```\n\nThis comprehensive test suite covers all the requirements:\n\n1. **Trust Score Initialization (Cold Start)**: Tests that new workers get default trust scores and all eligible arms are included.\n\n2. **Score Update After Accept/Reject**: Tests that trust scores increase after acceptance and decrease after rejection, with proper history tracking.\n\n3. **Routing Decision**: Tests that the arm with the highest trust score is selected for task assignment.\n\n4. **Multi-Position Scoring**: Tests that the same worker can have different trust scores for different task types and positions.\n\n5. **Decay**: Tests that older interactions are weighted less than recent ones in trust score calculations.\n\n6. **Leaderboard Computation**: Tests ranking functionality with proper ordering and metrics calculation.\n\nThe tests also include additional scenarios like:\n- Thread safety for concurrent updates\n- Performance with large interaction histories\n- Edge cases with empty histories\n- Mixed position requirements\n\nEach test is focused, well-documented, and includes proper assertions to verify the expected behavior of the PTM engine."
}Task ID: c71dafaa-08f0-4b74-b4aa-925b020f3b1e