ScoreGuard exposes a simple REST API for running evaluations and querying results. All endpoints accept and return JSON.
`POST /api/run` executes a cohort evaluation. Either provide a `cohort_name` with inline `fixtures` and `agent_configs` to create a cohort on the fly, or pass a `cohort_id` to re-run a saved cohort.
Request body fields:

| Field | Type | Description |
|---|---|---|
| cohort_id | number | Re-run a saved cohort by ID (skips fixtures/agents fields) |
| cohort_name | string | Name for a new cohort (required if no cohort_id) |
| fixtures | array | Array of fixture objects (required for new cohorts) |
| agent_configs | array | Array of agent config objects (required for new cohorts) |
Each fixture object has the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Human-readable fixture name |
| input | object | JSON payload sent to the agent endpoint |
| expected | object | Expected values used by metric evaluators |
| metrics | array | Metric evaluator configs (see Metrics section) |
| pass_threshold | number | Weighted score threshold for pass/fail (default: 0.5) |
Each agent config object has the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Human-readable agent name |
| endpoint_url | string | URL to POST fixture inputs to |
| headers | object | HTTP headers (e.g. Authorization) |
| timeout_ms | number | Per-call timeout in ms (default: 30000) |
```bash
# Example: run a new cohort inline
curl -X POST https://scoreguard.polsia.app/api/run \
  -H "Content-Type: application/json" \
  -d '{
    "cohort_name": "onboarding-v3",
    "fixtures": [
      {
        "name": "greeting_test",
        "input": { "message": "Hello, help me get started" },
        "expected": { "value": "help" },
        "metrics": [
          { "type": "contains", "name": "has_help", "value": "help", "weight": 1.0 },
          { "type": "length_range", "name": "reasonable_length", "min": 20, "max": 1000, "weight": 0.5 }
        ],
        "pass_threshold": 0.7
      }
    ],
    "agent_configs": [
      {
        "name": "agent-v1",
        "endpoint_url": "https://your-api.com/respond",
        "headers": { "Authorization": "Bearer sk-..." },
        "timeout_ms": 15000
      }
    ]
  }'
```
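To re-run a saved cohort instead, POST just its `cohort_id`; the `fixtures` and `agent_configs` fields are skipped. A minimal Python sketch using only the standard library (the helper names here are illustrative, not part of any ScoreGuard SDK):

```python
import json
import urllib.request

API_URL = "https://scoreguard.polsia.app/api/run"

def build_rerun_payload(cohort_id):
    # Re-running by ID needs no fixtures or agent configs in the body.
    return json.dumps({"cohort_id": cohort_id}).encode("utf-8")

def rerun_cohort(cohort_id):
    req = urllib.request.Request(
        API_URL,
        data=build_rerun_payload(cohort_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```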
Example response:

```json
{
  "run_id": 42,
  "cohort_id": 7,
  "status": "completed",
  "total_fixtures": 1,
  "passed": 1,
  "failed": 0,
  "error_count": 0,
  "score": 0.8333,
  "baseline_score": 0.91,
  "drift": -0.0767,
  "results": [
    {
      "fixture_name": "greeting_test",
      "agent_name": "agent-v1",
      "status": "pass",
      "score": 0.8333,
      "metric_scores": {
        "has_help": { "pass": true, "score": 1.0, "detail": "text contains: \"help\"" },
        "reasonable_length": { "pass": true, "score": 1.0, "detail": "length 142 (range: 20–1000)" }
      },
      "duration_ms": 823
    }
  ]
}
```
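A client can act on these fields programmatically, for example gating a deploy on drift. A small sketch, using the top-level values from the response above (the 0.1 tolerance is an arbitrary assumption, not a ScoreGuard default):

```python
import json

# Top-level fields from the example run response.
run = json.loads("""{
  "status": "completed", "error_count": 0,
  "score": 0.8333, "baseline_score": 0.91, "drift": -0.0767
}""")

REGRESSION_TOLERANCE = 0.1  # assumption: tune per cohort

def is_regression(run, tolerance=REGRESSION_TOLERANCE):
    # Flag runs that failed to complete, errored, or drifted too far below baseline.
    if run["status"] != "completed" or run["error_count"] > 0:
        return True
    return run["drift"] < -tolerance
```

Note that `drift` is simply `score - baseline_score`, so a negative value means the run scored below the cohort's baseline.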
Available metric evaluator types and their type-specific config fields:

| Type | Config Fields | Description |
|---|---|---|
| contains | value | Response text contains the given string (case-insensitive) |
| exact_match | value | Response text exactly matches the given string |
| json_field | field, value | Response JSON has a field equal to value (dot notation) |
| length_range | min, max | Response text length is within [min, max] |
| status_code | value | HTTP response status code equals value |
| keywords | values[], threshold | Ratio of keywords present ≥ threshold (default 1.0) |
| regex | pattern, flags | Response text matches regex pattern |
| latency_under | value | Agent response time ≤ value ms |
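To illustrate how these evaluators combine into a fixture score, here is a local Python sketch of a few of them. It mirrors the table's semantics but is not ScoreGuard's actual implementation; in particular, the weight-normalised average in `weighted_score` is an assumption about how per-metric scores roll up:

```python
def contains(text, value):
    # Case-insensitive substring check.
    return 1.0 if value.lower() in text.lower() else 0.0

def length_range(text, lo, hi):
    # 1.0 when len(text) falls within [lo, hi].
    return 1.0 if lo <= len(text) <= hi else 0.0

def keywords(text, values, threshold=1.0):
    # Ratio of keywords present must meet the threshold (default: all of them).
    hits = sum(1 for v in values if v.lower() in text.lower())
    return 1.0 if hits / len(values) >= threshold else 0.0

def weighted_score(metric_results):
    # Assumption: fixture score = weighted average of (score, weight) pairs,
    # which is then compared against pass_threshold.
    total = sum(weight for _, weight in metric_results)
    return sum(score * weight for score, weight in metric_results) / total
```

With the example fixture's weights (1.0 and 0.5), a metric scoring 1.0 and another scoring 0.5 would combine to 1.25 / 1.5 ≈ 0.8333.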
The remaining endpoints cover cohort and run management:

- List all cohorts with fixture count, agent count, run count, and latest score.
- Get a cohort with its fixtures, agent configs, and recent runs.
- Delete a cohort and all its data.
- List runs, optionally filtered by cohort. Returns up to 100.
- Get full run detail, including per-fixture, per-agent results grouped by fixture.
- Summary stats: cohort count, completed runs, pass rate, and total evaluations.
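A hypothetical client sketch for the run-listing endpoint. The `/api/runs` route below is an assumption inferred from the `/api/run` prefix in the example above, not a documented path; verify it against the server before relying on it:

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://scoreguard.polsia.app"

def list_runs_url(cohort_id=None, base=BASE_URL):
    # Build the listing URL, optionally filtered by cohort (server caps results at 100).
    query = f"?{urlencode({'cohort_id': cohort_id})}" if cohort_id is not None else ""
    return f"{base}/api/runs{query}"  # hypothetical route

def get_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```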