ScoreGuard exposes a simple REST API for running evaluations and querying results. All endpoints accept and return JSON.
`POST /api/run` executes a cohort evaluation. Either provide a `cohort_name` with inline `fixtures` and `agent_configs` to create a cohort on the fly, or pass a `cohort_id` to re-run a saved cohort.
Request body fields:

| Field | Type | Description |
|---|---|---|
| cohort_id | number | Re-run a saved cohort by ID (skips fixtures/agents fields) |
| cohort_name | string | Name for a new cohort (required if no cohort_id) |
| fixtures | array | Array of fixture objects (required for new cohorts) |
| agent_configs | array | Array of agent config objects (required for new cohorts) |
Each fixture object has the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Human-readable fixture name |
| input | object | JSON payload sent to the agent endpoint |
| expected | object | Expected values used by metric evaluators |
| metrics | array | Metric evaluator configs (see Metrics section) |
| pass_threshold | number | Weighted score threshold for pass/fail (default: 0.5) |
Each agent config object has the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Human-readable agent name |
| endpoint_url | string | URL to POST fixture inputs to |
| headers | object | HTTP headers (e.g. Authorization) |
| timeout_ms | number | Per-call timeout in ms (default: 30000) |
```bash
# Example: run a new cohort inline
curl -X POST https://scoreguard.polsia.app/api/run \
  -H "Content-Type: application/json" \
  -d '{
    "cohort_name": "onboarding-v3",
    "fixtures": [
      {
        "name": "greeting_test",
        "input": { "message": "Hello, help me get started" },
        "expected": { "value": "help" },
        "metrics": [
          { "type": "contains", "name": "has_help", "value": "help", "weight": 1.0 },
          { "type": "length_range", "name": "reasonable_length", "min": 20, "max": 1000, "weight": 0.5 }
        ],
        "pass_threshold": 0.7
      }
    ],
    "agent_configs": [
      {
        "name": "agent-v1",
        "endpoint_url": "https://your-api.com/respond",
        "headers": { "Authorization": "Bearer sk-..." },
        "timeout_ms": 15000
      }
    ]
  }'
```
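To re-run a saved cohort instead, POST just its `cohort_id`; the `fixtures` and `agent_configs` fields are skipped. A minimal Python sketch using only the standard library (the helper names here are illustrative, not part of any ScoreGuard SDK):

```python
import json
import urllib.request

API_URL = "https://scoreguard.polsia.app/api/run"

def build_rerun_payload(cohort_id):
    # Re-running by ID needs no fixtures or agent configs in the body.
    return json.dumps({"cohort_id": cohort_id}).encode("utf-8")

def rerun_cohort(cohort_id):
    req = urllib.request.Request(
        API_URL,
        data=build_rerun_payload(cohort_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```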
Example response:

```json
{
  "run_id": 42,
  "cohort_id": 7,
  "status": "completed",
  "total_fixtures": 1,
  "passed": 1,
  "failed": 0,
  "error_count": 0,
  "score": 0.8333,
  "baseline_score": 0.91,
  "drift": -0.0767,
  "results": [
    {
      "fixture_name": "greeting_test",
      "agent_name": "agent-v1",
      "status": "pass",
      "score": 0.8333,
      "metric_scores": {
        "has_help": { "pass": true, "score": 1.0, "detail": "text contains: \"help\"" },
        "reasonable_length": { "pass": true, "score": 1.0, "detail": "length 142 (range: 20–1000)" }
      },
      "duration_ms": 823
    }
  ]
}
```
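A client can act on these fields programmatically, for example gating a deploy on drift. A small sketch, using the top-level values from the response above (the 0.1 tolerance is an arbitrary assumption, not a ScoreGuard default):

```python
import json

# Top-level fields from the example run response.
run = json.loads("""{
  "status": "completed", "error_count": 0,
  "score": 0.8333, "baseline_score": 0.91, "drift": -0.0767
}""")

REGRESSION_TOLERANCE = 0.1  # assumption: tune per cohort

def is_regression(run, tolerance=REGRESSION_TOLERANCE):
    # Flag runs that failed to complete, errored, or drifted too far below baseline.
    if run["status"] != "completed" or run["error_count"] > 0:
        return True
    return run["drift"] < -tolerance
```

Note that `drift` is simply `score - baseline_score`, so a negative value means the run scored below the cohort's baseline.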
Available metric evaluator types and their type-specific config fields:

| Type | Config Fields | Description |
|---|---|---|
| contains | value | Response text contains the given string (case-insensitive) |
| exact_match | value | Response text exactly matches the given string |
| json_field | field, value | Response JSON has a field equal to value (dot notation) |
| length_range | min, max | Response text length is within [min, max] |
| status_code | value | HTTP response status code equals value |
| keywords | values[], threshold | Ratio of keywords present ≥ threshold (default 1.0) |
| regex | pattern, flags | Response text matches regex pattern |
| latency_under | value | Agent response time ≤ value ms |
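To illustrate how these evaluators combine into a fixture score, here is a local Python sketch of a few of them. It mirrors the table's semantics but is not ScoreGuard's actual implementation; in particular, the weight-normalised average in `weighted_score` is an assumption about how per-metric scores roll up:

```python
def contains(text, value):
    # Case-insensitive substring check.
    return 1.0 if value.lower() in text.lower() else 0.0

def length_range(text, lo, hi):
    # 1.0 when len(text) falls within [lo, hi].
    return 1.0 if lo <= len(text) <= hi else 0.0

def keywords(text, values, threshold=1.0):
    # Ratio of keywords present must meet the threshold (default: all of them).
    hits = sum(1 for v in values if v.lower() in text.lower())
    return 1.0 if hits / len(values) >= threshold else 0.0

def weighted_score(metric_results):
    # Assumption: fixture score = weighted average of (score, weight) pairs,
    # which is then compared against pass_threshold.
    total = sum(weight for _, weight in metric_results)
    return sum(score * weight for score, weight in metric_results) / total
```

With the example fixture's weights (1.0 and 0.5), a metric scoring 1.0 and another scoring 0.5 would combine to 1.25 / 1.5 ≈ 0.8333.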
The remaining endpoints cover cohort and run management:

- List all cohorts with fixture count, agent count, run count, and latest score.
- Get a cohort with its fixtures, agent configs, and recent runs.
- Delete a cohort and all its data.
- List runs, optionally filtered by cohort. Returns up to 100.
- Get full run detail, including per-fixture, per-agent results grouped by fixture.
- Summary stats: cohort count, completed runs, pass rate, and total evaluations.
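A hypothetical client sketch for the run-listing endpoint. The `/api/runs` route below is an assumption inferred from the `/api/run` prefix in the example above, not a documented path; verify it against the server before relying on it:

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://scoreguard.polsia.app"

def list_runs_url(cohort_id=None, base=BASE_URL):
    # Build the listing URL, optionally filtered by cohort (server caps results at 100).
    query = f"?{urlencode({'cohort_id': cohort_id})}" if cohort_id is not None else ""
    return f"{base}/api/runs{query}"  # hypothetical route

def get_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```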