Scorers API
Scorers evaluate the quality of LLM outputs. All scorers implement the Scorer type.
Import
import {
  exactMatch,
  includes,
  regex,
  levenshtein,
  jsonMatch,
  factuality,
  all,
  any,
  weighted,
} from '@deepagents/evals/scorers';
Types
Scorer
type Scorer = (args: ScorerArgs) => Promise<ScorerResult>;
ScorerArgs
interface ScorerArgs {
  input: unknown;     // Original input from dataset
  output: string;     // Model output to score
  expected?: unknown; // Expected value from dataset
}
ScorerResult
interface ScorerResult {
  score: number;                      // 0..1 (0 = worst, 1 = best)
  reason?: string;                    // Human-readable explanation
  metadata?: Record<string, unknown>; // Additional scoring metadata
}
Deterministic Scorers
exactMatch
Strict string equality:
const result = await exactMatch({
  input: 'What is 2+2?',
  output: '4',
  expected: '4',
});
// { score: 1.0 }
Returns:
1.0 if output === String(expected)
0.0 otherwise, with a reason explaining the mismatch
includes
Substring check:
const result = await includes({
  input: 'What is the capital of France?',
  output: 'The capital of France is Paris.',
  expected: 'Paris',
});
// { score: 1.0 }
Returns:
1.0 if output.includes(String(expected))
0.0 otherwise
regex(pattern)
Regular expression test:
const emailScorer = regex(/^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i);
const result = await emailScorer({
  input: 'Extract the email',
  output: 'user@example.com',
});
// { score: 1.0 }
Signature:
function regex(pattern: RegExp): Scorer;
Returns:
1.0 if pattern.test(output)
0.0 otherwise
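Because the score comes from pattern.test(output), one JavaScript detail is worth keeping in mind: a RegExp with the g or y flag keeps a lastIndex between calls, so reusing one pattern across many dataset cases can give alternating results. Whether regex() guards against this is not stated here. The sketch below only demonstrates the language behavior, and testFromStart is a hypothetical helper, not part of @deepagents/evals.

```typescript
// Plain JavaScript behavior: a global (g) or sticky (y) RegExp remembers
// where the previous match ended via `lastIndex`.
const globalPattern = /paris/gi;

const firstCall = globalPattern.test('Paris');  // true; lastIndex is now 5
const secondCall = globalPattern.test('Paris'); // false; search began at index 5, lastIndex resets to 0

// Hypothetical helper (not a library API): reset the cursor before each test
// so the same pattern can be reused safely across cases.
function testFromStart(pattern: RegExp, text: string): boolean {
  pattern.lastIndex = 0;
  return pattern.test(text);
}

// Each call starts from index 0, so every result is true.
const repeated = [1, 2, 3].map(() => testFromStart(globalPattern, 'Paris'));
console.log(firstCall, secondCall, repeated);
```

The simplest way to avoid the issue is to pass patterns without the g or y flag, as the email example above does.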
levenshtein
Normalized edit distance similarity:
const result = await levenshtein({
  input: 'Spell "hello"',
  output: 'helo',
  expected: 'hello',
});
// { score: 0.8, reason: '...', metadata: { ... } }
Returns:
1.0 for exact match
0.0 for completely different strings
Decimal between 0 and 1 for partial similarity
Includes reason and metadata from autoevals
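The 0.8 in the example is consistent with dividing the edit distance by the length of the longer string and subtracting from 1. The sketch below works through that arithmetic under this assumption; the scorer itself delegates to autoevals, whose exact normalization may differ, and editDistance and normalizedSimilarity are illustrative names only.

```typescript
// Classic Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  // row[j] holds the distance between the current prefix of `a` and b.slice(0, j).
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // value from the previous row, previous column
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(
        above + 1,        // deletion
        row[j - 1] + 1,   // insertion
        diagonal + cost,  // substitution (or match)
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

// Assumed normalization: 1 - distance / max(length). Two empty strings count as identical.
function normalizedSimilarity(output: string, expected: string): number {
  const longest = Math.max(output.length, expected.length);
  if (longest === 0) return 1;
  return 1 - editDistance(output, expected) / longest;
}

// 'helo' vs 'hello': one insertion, longest length 5, so roughly 1 - 1/5 = 0.8.
console.log(normalizedSimilarity('helo', 'hello'));
```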
jsonMatch
Deep structural equality for JSON:
const result = await jsonMatch({
  input: 'Generate JSON',
  output: '{"name":"Alice","age":30}',
  expected: { name: 'Alice', age: 30 },
});
// { score: 1.0 }
Returns:
1.0 if JSON structures are deeply equal
0.0 if structures differ or JSON is invalid
Notes:
Object key order doesn’t matter
Array order matters
expected can be a string or an object
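The notes above can be illustrated with a small deep-equality sketch: object keys are compared as a set, arrays position by position, and invalid JSON scores 0. This is an approximation of the described behavior, not the library's implementation; jsonMatchSketch and deepEqual are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (a === null || b === null || typeof a !== 'object' || typeof b !== 'object') {
    return false;
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    // Arrays must have the same length and equal items at each index.
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    return a.every((item, i) => deepEqual(item, b[i]));
  }
  // Objects: same key set, equal values; key order is irrelevant.
  const keysA = Object.keys(a);
  if (keysA.length !== Object.keys(b).length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && deepEqual(a[key], b[key]),
  );
}

function jsonMatchSketch(output: string, expected: unknown): number {
  try {
    const actual = JSON.parse(output) as Json;
    // `expected` may arrive as a JSON string or as an already-parsed value.
    const target = (typeof expected === 'string' ? JSON.parse(expected) : expected) as Json;
    return deepEqual(actual, target) ? 1 : 0;
  } catch {
    return 0; // invalid JSON on either side
  }
}

console.log(jsonMatchSketch('{"age":30,"name":"Alice"}', { name: 'Alice', age: 30 })); // 1
console.log(jsonMatchSketch('[1,2,3]', [3, 2, 1])); // 0
```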
LLM-Based Scorers
factuality(config)
Uses an LLM judge to check whether output is factually consistent with expected:
const factScorer = factuality({ model: 'gpt-4o-mini' });
const result = await factScorer({
  input: 'What is the capital of France?',
  output: 'Paris is the capital and largest city of France.',
  expected: 'Paris',
});
// { score: 1.0, reason: 'Output is factually correct', metadata: { ... } }
Signature:
function factuality(config: { model: string }): Scorer;
Config:
{
  model: string; // OpenAI-compatible model ID (e.g., 'gpt-4o-mini')
}
Returns:
1.0 if output is factually consistent with expected
0.0 if output contradicts expected
Decimal between 0 and 1 for partial correctness
reason field contains LLM’s explanation
metadata includes additional details from autoevals
Requirements:
OPENAI_API_KEY environment variable
OpenAI-compatible API endpoint
Combinators
all(...scorers)
Weakest-link (minimum score):
const strict = all(exactMatch, includes);
const result = await strict({
  input: 'What is 2+2?',
  output: '4',
  expected: '4',
});
// { score: 1.0 } (both scorers passed)
Signature:
function all(...scorers: Scorer[]): Scorer;
Returns:
score: Minimum score of all scorers
reason: Concatenated reasons from all scorers (semicolon-separated)
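A minimal sketch of the weakest-link rule described above, with scores that are not all 1.0 so the minimum is visible. The names weakestLink and allSketch are illustrative. Running the scorers in parallel, using "; " as the separator, and rejecting an empty list are assumptions of this sketch rather than documented behavior.

```typescript
interface ScorerArgs { input: unknown; output: string; expected?: unknown }
interface ScorerResult { score: number; reason?: string; metadata?: Record<string, unknown> }
type Scorer = (args: ScorerArgs) => Promise<ScorerResult>;

// Pure core: fold individual results into one weakest-link result.
function weakestLink(results: ScorerResult[]): ScorerResult {
  if (results.length === 0) {
    // Math.min() of nothing is Infinity, so this sketch refuses an empty list.
    throw new Error('weakestLink needs at least one result');
  }
  const score = Math.min(...results.map((r) => r.score));
  const reasons = results
    .map((r) => r.reason)
    .filter((r): r is string => typeof r === 'string' && r.length > 0);
  return reasons.length > 0 ? { score, reason: reasons.join('; ') } : { score };
}

// Async wrapper with the same shape as all(...scorers).
function allSketch(...scorers: Scorer[]): Scorer {
  return async (args) => weakestLink(await Promise.all(scorers.map((s) => s(args))));
}

const combined = weakestLink([
  { score: 1 },
  { score: 0.4, reason: 'missing keyword' },
  { score: 0.7, reason: 'partial match' },
]);
// combined.score is 0.4; combined.reason is 'missing keyword; partial match'
console.log(combined);
```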
any(...scorers)
Best-of (maximum score):
const lenient = any(exactMatch, includes);
const result = await lenient({
  input: 'What is the capital of France?',
  output: 'The capital is Paris.',
  expected: 'Paris',
});
// { score: 1.0 } (includes passed, even though exactMatch failed)
Signature:
function any(...scorers: Scorer[]): Scorer;
Returns:
score: Maximum score of all scorers
reason: Reason from the highest-scoring scorer
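The best-of rule can be sketched as picking the single highest-scoring result and reporting its reason. bestOf is a hypothetical name; breaking ties in favor of the earliest scorer and dropping metadata are choices made for this sketch, not documented behavior.

```typescript
interface ScorerResult { score: number; reason?: string; metadata?: Record<string, unknown> }

// Pure core: return the score and reason of the highest-scoring result.
function bestOf(results: ScorerResult[]): ScorerResult {
  if (results.length === 0) {
    throw new Error('bestOf needs at least one result');
  }
  let best = results[0];
  for (const candidate of results.slice(1)) {
    // Strict comparison keeps the earliest result on ties.
    if (candidate.score > best.score) best = candidate;
  }
  return best.reason !== undefined
    ? { score: best.score, reason: best.reason }
    : { score: best.score };
}

const picked = bestOf([
  { score: 0, reason: 'exact match failed' },
  { score: 1, reason: 'substring found' },
]);
// picked is { score: 1, reason: 'substring found' }
console.log(picked);
```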
weighted(config)
Weighted average:
const balanced = weighted({
  accuracy: { scorer: exactMatch, weight: 2 },
  grounding: { scorer: factuality({ model: 'gpt-4o-mini' }), weight: 1 },
});
const result = await balanced({
  input: 'What is 2+2?',
  output: '4',
  expected: '4',
});
// { score: 1.0, reason: 'accuracy: 1.00 (w=2), grounding: 1.00 (w=1)' }
Signature:
function weighted(
  config: Record<string, { scorer: Scorer; weight: number }>
): Scorer;
Config:
{
  [name: string]: {
    scorer: Scorer;
    weight: number;
  }
}
Returns:
score: Weighted average sum(score * weight) / sum(weight)
reason: Lists all scorer scores and weights
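To make the formula concrete, the sketch below applies sum(score * weight) / sum(weight) to two already-computed scores that differ, and formats the reason in the same style as the example above. weightedAverage and WeightedEntry are illustrative names, and throwing on a non-positive total weight is a choice made for this sketch.

```typescript
interface WeightedEntry { name: string; score: number; weight: number }

function weightedAverage(entries: WeightedEntry[]): { score: number; reason: string } {
  const totalWeight = entries.reduce((sum, e) => sum + e.weight, 0);
  if (totalWeight <= 0) {
    // Avoid dividing by zero; how the library handles this case is not documented here.
    throw new Error('total weight must be positive');
  }
  const weightedSum = entries.reduce((sum, e) => sum + e.score * e.weight, 0);
  const reason = entries
    .map((e) => `${e.name}: ${e.score.toFixed(2)} (w=${e.weight})`)
    .join(', ');
  return { score: weightedSum / totalWeight, reason };
}

const mixed = weightedAverage([
  { name: 'accuracy', score: 1.0, weight: 3 },
  { name: 'grounding', score: 0.5, weight: 1 },
]);
// (1.0 * 3 + 0.5 * 1) / (3 + 1) = 3.5 / 4 = 0.875
console.log(mixed.score, mixed.reason);
```

Weights are relative: doubling every weight leaves the score unchanged, so only their ratios matter.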
Custom Scorers
Create custom scorers by implementing the Scorer type:
import type { Scorer } from '@deepagents/evals/scorers';
const lengthScorer: Scorer = async ({ output }) => {
  const score = output.length > 10 ? 1.0 : 0.5;
  return {
    score,
    reason: `Output length: ${output.length}`,
  };
};
Requirements:
Return a Promise<ScorerResult>
Score must be between 0 and 1
Optionally include reason and metadata
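When a custom scorer computes its score from arithmetic, it can drift outside the required range. One way to enforce the 0..1 requirement is a small wrapper, sketched below. clamp01 and clamped are hypothetical helpers, not part of @deepagents/evals, and mapping NaN to 0 is a choice made for this sketch. The types are redeclared locally so the snippet stands alone.

```typescript
interface ScorerArgs { input: unknown; output: string; expected?: unknown }
interface ScorerResult { score: number; reason?: string; metadata?: Record<string, unknown> }
type Scorer = (args: ScorerArgs) => Promise<ScorerResult>;

// Force a number into the inclusive range [0, 1]; NaN is treated as the worst score.
function clamp01(value: number): number {
  if (Number.isNaN(value)) return 0;
  return Math.min(1, Math.max(0, value));
}

// Wrap any scorer so its result always satisfies the 0..1 requirement.
function clamped(scorer: Scorer): Scorer {
  return async (args) => {
    const result = await scorer(args);
    return { ...result, score: clamp01(result.score) };
  };
}

// Example: a ratio-based scorer that could exceed 1 for long outputs.
const lengthRatio: Scorer = async ({ output }) => ({
  score: output.length / 20,
  reason: `Output length: ${output.length}`,
});
const safeLengthRatio = clamped(lengthRatio);
```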
Examples
Using Multiple Scorers
import { evaluate, exactMatch, includes } from '@deepagents/evals';
await evaluate({
  // ...
  scorers: {
    exact: exactMatch,
    contains: includes,
  },
});
A case passes only if every scorer returns a score greater than or equal to the threshold.
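The pass rule can be sketched as a check over the per-scorer scores of one case. casePasses is a hypothetical helper and the 0.5 threshold is only an illustrative value; how the threshold is configured in evaluate() is not covered on this page.

```typescript
// A case passes only when every named scorer meets the threshold.
function casePasses(scores: Record<string, number>, threshold: number): boolean {
  return Object.values(scores).every((score) => score >= threshold);
}

// exactMatch failed (0) while includes passed (1): the case fails overall.
const mixedCase = casePasses({ exact: 0, contains: 1 }, 0.5);

// Both scorers meet the threshold: the case passes.
const cleanCase = casePasses({ exact: 1, contains: 1 }, 0.5);

console.log(mixedCase, cleanCase);
```

This is why pairing a strict scorer with a lenient one under separate names is stricter than combining them with any(): each named scorer has to clear the threshold on its own.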
Combining Scorers
import { all, any, weighted, exactMatch, includes, factuality } from '@deepagents/evals/scorers';
// All must pass
const strict = all(exactMatch, includes);
// At least one must pass
const lenient = any(exactMatch, includes);
// Weighted combination
const balanced = weighted({
  accuracy: { scorer: exactMatch, weight: 2 },
  grounding: { scorer: factuality({ model: 'gpt-4o-mini' }), weight: 1 },
});
Custom Scorer
import type { Scorer } from '@deepagents/evals/scorers';
const containsKeyword: Scorer = async ({ output }) => {
  const keywords = ['paris', 'france', 'capital'];
  const matches = keywords.filter((k) => output.toLowerCase().includes(k));
  return {
    score: matches.length / keywords.length,
    reason: `Matched ${matches.length}/${keywords.length} keywords`,
  };
};
Next Steps
Evaluate API: Learn about the evaluate() function
Scorers Guide: Scorer usage guide