[SELF-SPEC-03] Grader Agreement Monitor — flag disagreeing graders
completed | code_gen | P1
Description
## SELF-GENERATED SPEC — Created by Foundry Retrospective System
This spec was auto-generated from retrospective analysis identifying a gap: "No visibility into grader agreement — tasks with high grader disagreement may indicate unreliable scoring."
### User Story
As a Foundry operator, I want to see which graded tasks have high grader disagreement so I can investigate whether grading criteria are consistent or if specific graders need recalibration.
### Target
File: `foundry-dashboard/src/components/grader-agreement-monitor.tsx` (NEW)
Integration: Import in `foundry-dashboard/src/components/foundry-control-plane.tsx`
### Tech Stack (match existing codebase)
- Next.js 16 + React 19 + Tailwind 4
- Server-side data via props (no client-side Supabase)
- Dark mode, monospace, card-based layout
### Component Spec
```typescript
interface GraderAgreementMonitorProps {
  tasks: Array<{
    task_id: string;
    task_title: string;
    agreement_score: number; // 0-1, from _check_grader_agreement()
    grader_count: number;
    grades: Array<{
      grader: string;
      correctness: number;
      codebase_fit: number;
    }>;
    overall_quality: number;
    pass: boolean;
  }>;
  avg_agreement: number; // Fleet-wide average
}
```
### Visual Layout
1. **Summary header**: "Avg grader agreement: 0.87" with color indicator
2. **Alert section**: Tasks with agreement < 0.5 highlighted in red/amber
3. **Task list**: Sorted by agreement_score ascending (worst disagreement first)
4. **Per-task row**: task_title, agreement_score (visual bar), grader_count, pass/fail badge
5. **Expandable detail**: Shows each grader's individual scores side-by-side for comparison
6. **Grader pair matrix** (optional): Show which grader pairs tend to agree/disagree
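The optional grader pair matrix in item 6 could be derived from the same `grades` arrays the component already receives. The following is a minimal sketch, not part of the spec or the submitted component: `graderPairMatrix`, `PairStat`, and the definition of disagreement (mean absolute gap across `correctness` and `codebase_fit`) are all assumptions made for illustration.

```typescript
// Hypothetical helper; the spec does not define how pair agreement is measured.
interface Grade {
  grader: string;
  correctness: number;
  codebase_fit: number;
}

interface PairStat {
  pair: string; // "graderA|graderB", names sorted alphabetically
  meanGap: number; // mean absolute score gap over shared tasks (0 = identical)
  sharedTasks: number; // number of tasks both graders scored
}

// Assumption: the disagreement between two graders on one task is the mean
// absolute difference of their correctness and codebase_fit scores.
function graderPairMatrix(tasks: Array<{ grades: Grade[] }>): PairStat[] {
  const totals: Record<string, { sum: number; n: number }> = {};
  for (const task of tasks) {
    const grades = task.grades;
    for (let i = 0; i < grades.length; i++) {
      for (let j = i + 1; j < grades.length; j++) {
        const a = grades[i];
        const b = grades[j];
        const gap =
          (Math.abs(a.correctness - b.correctness) +
            Math.abs(a.codebase_fit - b.codebase_fit)) / 2;
        const pair = [a.grader, b.grader].sort().join("|");
        if (!totals[pair]) totals[pair] = { sum: 0, n: 0 };
        totals[pair].sum += gap;
        totals[pair].n += 1;
      }
    }
  }
  return Object.keys(totals)
    .map((pair) => ({
      pair,
      meanGap: totals[pair].sum / totals[pair].n,
      sharedTasks: totals[pair].n,
    }))
    .sort((x, y) => y.meanGap - x.meanGap); // worst-agreeing pairs first
}
```

Sorting by `meanGap` descending mirrors the task list's "worst disagreement first" ordering, so the pairs most in need of recalibration surface at the top.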
### Data Source
`_check_grader_agreement()` from `foundry_retrospective.py` computes agreement per task.
Query: `SELECT id, title, quality FROM foundry_tasks WHERE quality->'grades' IS NOT NULL`.
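Since the component takes its data via props, the server side needs to turn query rows into the `GraderAgreementMonitorProps` shape. A rough sketch of that mapping follows. Only `quality.grades` is confirmed by the spec; the keys `agreement_score`, `overall`, and `pass` inside `quality`, and the names `TaskRow`, `MonitorTask`, and `buildMonitorProps`, are assumptions for illustration. The sketch also assumes the per-task score from `_check_grader_agreement()` has already been stored on the row.

```typescript
// Assumed shape of one row returned by the query. Only `grades` is
// confirmed by the spec; the other keys inside `quality` are hypothetical.
interface TaskRow {
  id: string;
  title: string;
  quality: {
    grades: Array<{ grader: string; correctness: number; codebase_fit: number }>;
    agreement_score?: number; // hypothetical key
    overall?: number; // hypothetical key
    pass?: boolean; // hypothetical key
  };
}

interface MonitorTask {
  task_id: string;
  task_title: string;
  agreement_score: number;
  grader_count: number;
  grades: TaskRow["quality"]["grades"];
  overall_quality: number;
  pass: boolean;
}

function buildMonitorProps(rows: TaskRow[]): {
  tasks: MonitorTask[];
  avg_agreement: number;
} {
  const tasks: MonitorTask[] = [];
  for (const row of rows) {
    const q = row.quality;
    // Skip rows without a stored agreement score rather than inventing one.
    if (typeof q.agreement_score !== "number") continue;
    tasks.push({
      task_id: row.id,
      task_title: row.title,
      agreement_score: q.agreement_score,
      grader_count: q.grades.length,
      grades: q.grades,
      overall_quality: q.overall ?? 0,
      pass: q.pass ?? false,
    });
  }
  // Fleet-wide average over the tasks that have a score; 0 when there are none.
  const total = tasks.reduce((sum, t) => sum + t.agreement_score, 0);
  return { tasks, avg_agreement: tasks.length > 0 ? total / tasks.length : 0 };
}
```

Returning `avg_agreement: 0` for an empty list is a placeholder choice; the component renders its own empty state when `tasks` is empty, so the value is not displayed in that case.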
### Acceptance Criteria (Grader Tests)
- File `foundry-dashboard/src/components/grader-agreement-monitor.tsx` must exist
- Must contain `agreement_score` (at least 2 occurrences)
- Must contain `grader` (at least 2 occurrences — column display + data access)
- Must contain `correctness` (at least 1 occurrence)
- Must contain color-coding logic (conditional styling based on agreement threshold)
- Must contain `export` (component is exported)
- Must NOT contain `createClient` (data via props only)
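Most of these criteria are substring checks on the component source, which can be sketched as below. This is an illustration, not the Foundry grader itself: `countOccurrences` and `checkAcceptance` are hypothetical names, matching is assumed to be case-sensitive plain-substring counting (so `grader` also matches inside `grader_count`), and the file-existence and color-coding criteria are left out because they need filesystem access and a less mechanical check.

```typescript
interface CheckResult {
  name: string;
  ok: boolean;
}

// Count non-overlapping occurrences of `needle` in `source`.
function countOccurrences(source: string, needle: string): number {
  return source.split(needle).length - 1;
}

// Assumption: each criterion is evaluated by plain substring counting.
function checkAcceptance(source: string): CheckResult[] {
  return [
    { name: "agreement_score (>= 2)", ok: countOccurrences(source, "agreement_score") >= 2 },
    { name: "grader (>= 2)", ok: countOccurrences(source, "grader") >= 2 },
    { name: "correctness (>= 1)", ok: countOccurrences(source, "correctness") >= 1 },
    { name: "export present", ok: countOccurrences(source, "export") >= 1 },
    { name: "no createClient", ok: countOccurrences(source, "createClient") === 0 },
  ];
}
```

A caller would read the component file into a string, pass it to `checkAcceptance`, and treat the task as failing if any entry has `ok: false`.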
Assignee: -
Claimed By: Terminal Lu
Created: 2d ago
Completed: 1d ago
Quality
Overall: 84%
Pass: PASS
Graded By: -
Graded At: -
Cost
Input Tokens: -
Output Tokens: -
Est. USD: -
Result
{
"code": "\"use client\";\n\nimport { useState, Fragment } from \"react\";\n\n// ─── Types ───────────────────────────────────────────────────\n\nexport interface GraderAgreementMonitorProps {\n tasks: Array<{\n task_id: string;\n task_title: string;\n agreement_score: number;\n grader_count: number;\n grades: Array<{\n grader: string;\n correctness: number;\n codebase_fit: number;\n }>;\n overall_quality: number;\n pass: boolean;\n }>;\n avg_agreement: number;\n}\n\n// ─── Helpers ─────────────────────────────────────────────────\n\nfunction agreementColor(score: number): string {\n if (score >= 0.8) return \"text-emerald-400\";\n if (score >= 0.5) return \"text-yellow-400\";\n return \"text-red-400\";\n}\n\nfunction agreementBgColor(score: number): string {\n if (score >= 0.8) return \"bg-emerald-400\";\n if (score >= 0.5) return \"bg-yellow-400\";\n return \"bg-red-400\";\n}\n\nfunction passBadge(pass: boolean) {\n return pass ? (\n <span className=\"inline-flex items-center rounded-md bg-emerald-400/10 px-1.5 py-0.5 text-[10px] font-bold text-emerald-400\">\n PASS\n </span>\n ) : (\n <span className=\"inline-flex items-center rounded-md bg-red-400/10 px-1.5 py-0.5 text-[10px] font-bold text-red-400\">\n FAIL\n </span>\n );\n}\n\n// ─── Component ───────────────────────────────────────────────\n\nexport function GraderAgreementMonitor({\n tasks,\n avg_agreement,\n}: GraderAgreementMonitorProps) {\n const [expandedTask, setExpandedTask] = useState<string | null>(null);\n\n const sorted = [...tasks].sort(\n (a, b) => a.agreement_score - b.agreement_score\n );\n const alertTasks = tasks.filter((t) => t.agreement_score < 0.5);\n\n if (tasks.length === 0) {\n return (\n <div className=\"rounded-lg border border-border bg-card p-4 text-sm text-muted-foreground\">\n No grader agreement data available yet.\n </div>\n );\n }\n\n return (\n <div className=\"rounded-lg border border-border bg-card overflow-hidden\">\n {/* Header */}\n <div className=\"px-4 py-3 border-b 
border-border flex items-center justify-between\">\n <h3 className=\"text-sm font-semibold tracking-wide uppercase text-muted-foreground\">\n Grader Agreement Monitor\n </h3>\n <div className=\"flex items-center gap-2 font-mono text-sm\">\n <span className=\"text-muted-foreground\">Avg:</span>\n <span className={`font-bold ${agreementColor(avg_agreement)}`}>\n {(avg_agreement * 100).toFixed(0)}%\n </span>\n </div>\n </div>\n\n {/* Alert section */}\n {alertTasks.length > 0 && (\n <div className=\"px-4 py-2 border-b border-border bg-red-400/5 text-xs\">\n <span className=\"text-red-400 font-semibold\">\n {alertTasks.length} task{alertTasks.length > 1 ? \"s\" : \"\"} with high\n disagreement\n </span>\n <span className=\"text-muted-foreground\">\n {\" \"}\n (agreement &lt; 50%)\n </span>\n </div>\n )}\n\n {/* Task table */}\n <div className=\"overflow-x-auto\">\n <table className=\"w-full text-sm\">\n <thead>\n <tr className=\"text-left text-xs text-muted-foreground uppercase tracking-wide border-b border-border\">\n <th className=\"px-4 py-2\">Task</th>\n <th className=\"px-4 py-2 text-right\">Agreement</th>\n <th className=\"px-4 py-2 text-right\">Graders</th>\n <th className=\"px-4 py-2 text-right\">Quality</th>\n <th className=\"px-4 py-2 text-right\">Result</th>\n </tr>\n </thead>\n <tbody className=\"font-mono text-sm\">\n {sorted.map((task) => {\n const isExpanded = expandedTask === task.task_id;\n return (\n <Fragment key={task.task_id}>\n <tr\n className=\"border-t border-border hover:bg-muted/5 cursor-pointer transition-colors\"\n onClick={() =>\n setExpandedTask(isExpanded ? 
null : task.task_id)\n }\n >\n <td className=\"px-4 py-2.5 truncate max-w-[200px] font-semibold\">\n {task.task_title}\n </td>\n <td className=\"px-4 py-2.5 text-right\">\n <div className=\"flex items-center justify-end gap-2\">\n <div className=\"w-16 h-2 rounded-full bg-muted/20 overflow-hidden\">\n <div\n className={`h-full rounded-full ${agreementBgColor(task.agreement_score)}`}\n style={{\n width: `${(task.agreement_score * 100).toFixed(0)}%`,\n }}\n />\n </div>\n <span\n className={`tabular-nums ${agreementColor(task.agreement_score)}`}\n >\n {(task.agreement_score * 100).toFixed(0)}%\n </span>\n </div>\n </td>\n <td className=\"px-4 py-2.5 text-right tabular-nums text-muted-foreground\">\n {task.grader_count}\n </td>\n <td\n className={`px-4 py-2.5 text-right tabular-nums ${agreementColor(task.overall_quality)}`}\n >\n {(task.overall_quality * 100).toFixed(0)}%\n </td>\n <td className=\"px-4 py-2.5 text-right\">\n {passBadge(task.pass)}\n </td>\n </tr>\n\n {/* Expanded: grader detail */}\n {isExpanded && task.grades.length > 0 && (\n <tr>\n <td\n colSpan={5}\n className=\"px-4 py-3 bg-muted/5 border-t border-border\"\n >\n <div className=\"text-xs text-muted-foreground uppercase tracking-wide mb-2\">\n Individual Grader Scores\n </div>\n <div className=\"grid gap-1.5\">\n {task.grades.map((g) => (\n <div\n key={g.grader}\n className=\"flex items-center justify-between gap-4 text-xs\"\n >\n <span className=\"font-semibold min-w-[100px] truncate\">\n {g.grader}\n </span>\n <div className=\"flex items-center gap-3\">\n <span className=\"text-muted-foreground\">\n correctness:\n </span>\n <span\n className={`tabular-nums font-bold ${agreementColor(g.correctness)}`}\n >\n {(g.correctness * 100).toFixed(0)}%\n </span>\n <span className=\"text-muted-foreground\">\n fit:\n </span>\n <span\n className={`tabular-nums font-bold ${agreementColor(g.codebase_fit)}`}\n >\n {(g.codebase_fit * 100).toFixed(0)}%\n </span>\n </div>\n </div>\n ))}\n </div>\n </td>\n </tr>\n 
)}\n </Fragment>\n );\n })}\n </tbody>\n </table>\n </div>\n </div>\n );\n}"
}
Audit Trail (3)
1d ago | task_completed | Terminal Lu
1d ago | task_claimed | Terminal Lu
2d ago | task_created | AG Lu (self-spec from retrospective)
Task ID: 38956325-4113-4c0f-b3a6-6d180a31a2cc