API Reference

ScoreGuard exposes a simple REST API for running evaluations and querying results. All endpoints accept and return JSON.

Endpoints

POST /api/run

Execute a cohort evaluation. To create a cohort on the fly, provide a cohort_name together with inline fixtures and agent_configs. To re-run a saved cohort, pass its cohort_id instead.

Request Body

| Field | Type | Description |
| --- | --- | --- |
| cohort_id | number | Re-run a saved cohort by ID (skips fixtures/agents fields) |
| cohort_name | string | Name for a new cohort (required if no cohort_id) |
| fixtures | array | Array of fixture objects (required for new cohorts) |
| agent_configs | array | Array of agent config objects (required for new cohorts) |
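
The two request modes described above can be checked client-side before sending anything. The following is a minimal sketch, assuming the rules are exactly as the table states them; the helper name build_run_body is hypothetical and not part of ScoreGuard.

```python
from typing import Optional


def build_run_body(
    cohort_id: Optional[int] = None,
    cohort_name: Optional[str] = None,
    fixtures: Optional[list] = None,
    agent_configs: Optional[list] = None,
) -> dict:
    """Return a JSON-serializable body for POST /api/run (hypothetical helper)."""
    if cohort_id is not None:
        # Re-run of a saved cohort: only the ID is sent.
        return {"cohort_id": cohort_id}

    # New cohort: all three fields are required per the table.
    required = {
        "cohort_name": cohort_name,
        "fixtures": fixtures,
        "agent_configs": agent_configs,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("new cohort is missing: " + ", ".join(missing))
    return required
```

Calling build_run_body(cohort_id=7) yields a body with only the ID, while omitting fixtures or agent_configs for a new cohort raises ValueError locally instead of producing a failed request.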

Fixture Object

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable fixture name |
| input | object | JSON payload sent to the agent endpoint |
| expected | object | Expected values used by metric evaluators |
| metrics | array | Metric evaluator configs (see Metrics section) |
| pass_threshold | number | Weighted score threshold for pass/fail (default: 0.5) |

Agent Config Object

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable agent name |
| endpoint_url | string | URL to POST fixture inputs to |
| headers | object | HTTP headers (e.g. Authorization) |
| timeout_ms | number | Per-call timeout in ms (default: 30000) |

# Example: run a new cohort inline
curl -X POST https://scoreguard.polsia.app/api/run \
  -H "Content-Type: application/json" \
  -d '{
  "cohort_name": "onboarding-v3",
  "fixtures": [
    {
      "name": "greeting_test",
      "input": { "message": "Hello, help me get started" },
      "expected": { "value": "help" },
      "metrics": [
        { "type": "contains", "name": "has_help", "value": "help", "weight": 1.0 },
        { "type": "length_range", "name": "reasonable_length", "min": 20, "max": 1000, "weight": 0.5 }
      ],
      "pass_threshold": 0.7
    }
  ],
  "agent_configs": [
    {
      "name": "agent-v1",
      "endpoint_url": "https://your-api.com/respond",
      "headers": { "Authorization": "Bearer sk-..." },
      "timeout_ms": 15000
    }
  ]
}'
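
The same call can be prepared from Python with only the standard library. This is a sketch, not an official client: the host is taken from the curl example above, make_run_request is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request

# Host copied from the curl example; adjust for your own deployment.
BASE_URL = "https://scoreguard.polsia.app"


def make_run_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /api/run request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + "/api/run",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Re-run a saved cohort; the ID 7 is only an illustration.
req = make_run_request({"cohort_id": 7})

# To send it and decode the JSON response:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and payload before any evaluation is triggered.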

Response

{
  "run_id": 42,
  "cohort_id": 7,
  "status": "completed",
  "total_fixtures": 1,
  "passed": 1,
  "failed": 0,
  "error_count": 0,
  "score": 0.8333,
  "baseline_score": 0.91,
  "drift": -0.0767,
  "results": [
    {
      "fixture_name": "greeting_test",
      "agent_name": "agent-v1",
      "status": "pass",
      "score": 0.8333,
      "metric_scores": {
        "has_help": { "pass": true, "score": 1.0, "detail": "text contains: \"help\"" },
        "reasonable_length": { "pass": true, "score": 1.0, "detail": "length 142 (range: 20–1000)" }
      },
      "duration_ms": 823
    }
  ]
}
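
In the sample response, drift equals score minus baseline_score (0.8333 - 0.91 = -0.0767), and the fixture is marked pass with a score above its pass_threshold of 0.7. A minimal sketch of both checks follows. It assumes drift is always computed this way and that a score equal to the threshold counts as a pass; the source states neither, and both function names are hypothetical.

```python
def compute_drift(score: float, baseline_score: float) -> float:
    """Difference between the run score and the baseline, rounded to 4 places."""
    return round(score - baseline_score, 4)


def fixture_status(score: float, pass_threshold: float = 0.5) -> str:
    """Return "pass" or "fail"; the >= comparison is an assumption."""
    return "pass" if score >= pass_threshold else "fail"


# Values from the sample response and request above.
sample_drift = compute_drift(0.8333, 0.91)
sample_status = fixture_status(0.8333, pass_threshold=0.7)
```

A negative drift means the run scored below its baseline, which is the case in the sample.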

Metric Types

| Type | Config Fields | Description |
| --- | --- | --- |
| contains | value | Response text contains the given string (case-insensitive) |
| exact_match | value | Response text exactly matches the given string |
| json_field | field, value | Response JSON has a field equal to value (dot notation) |
| length_range | min, max | Response text length is within [min, max] |
| status_code | value | HTTP response status code equals value |
| keywords | values[], threshold | Ratio of keywords present ≥ threshold (default 1.0) |
| regex | pattern, flags | Response text matches regex pattern |
| latency_under | value | Agent response time ≤ value ms |
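
To make the semantics of a few metric types concrete, here is an illustrative local re-implementation. It is a sketch based only on the descriptions in the table, not ScoreGuard's actual evaluator code. The function names, the case-insensitive keyword matching, and the handling of an empty keyword list are all assumptions.

```python
def contains(text: str, value: str) -> bool:
    # Case-insensitive substring check, as described in the table.
    return value.lower() in text.lower()


def length_range(text: str, min_len: int, max_len: int) -> bool:
    # Inclusive bounds, following the [min, max] notation.
    return min_len <= len(text) <= max_len


def keywords(text: str, values: list, threshold: float = 1.0) -> bool:
    # Passes when the fraction of keywords found is at least the threshold.
    if not values:
        return True  # assumption: nothing to look for counts as a pass
    lowered = text.lower()
    found = sum(1 for v in values if v.lower() in lowered)
    return found / len(values) >= threshold


def json_field(payload: dict, field: str, value) -> bool:
    # Walk a dot-notation path such as "user.plan" through nested objects.
    current = payload
    for key in field.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return current == value
```

For example, keywords("alpha beta", ["alpha", "gamma"], threshold=0.5) passes because one of two keywords is present, while the default threshold of 1.0 would require both.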

Other Endpoints

GET /api/cohorts

List all cohorts with fixture count, agent count, run count, and latest score.

GET /api/cohorts/:id

Get a cohort with its fixtures, agent configs, and recent runs.

DELETE /api/cohorts/:id

Delete a cohort and all its data.

GET /api/runs?cohort_id=&limit=

List runs, optionally filtered by cohort_id. The limit parameter caps the number of runs returned, up to a maximum of 100.
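
Both query parameters are optional, so the URL is easiest to assemble from whichever ones are set. A small sketch, assuming the host from the curl example in the POST /api/run section; runs_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

# Host copied from the curl example; adjust for your own deployment.
BASE_URL = "https://scoreguard.polsia.app"


def runs_url(cohort_id: Optional[int] = None, limit: Optional[int] = None) -> str:
    """Build the GET /api/runs URL, including only the parameters that are set."""
    params = {}
    if cohort_id is not None:
        params["cohort_id"] = cohort_id
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    return BASE_URL + "/api/runs" + ("?" + query if query else "")
```

Calling runs_url(cohort_id=7, limit=20) produces a URL filtered to one cohort, and runs_url() lists runs with no filter applied.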

GET /api/runs/:id

Get full run detail including per-fixture, per-agent results grouped by fixture.

GET /api/stats

Summary stats: cohort count, completed runs, pass rate, total evaluations.