The metric used for running evaluations.
aggregationMetrics[] (enum)
Optional. The aggregation metrics to use.
metric_spec (Union type)
metric_spec can be only one of the following:
- The spec for a pre-defined metric (PredefinedMetricSpec).
- Spec for an LLM-based metric (LLMBasedMetricSpec).
- Spec for a custom code execution metric (CustomCodeExecutionSpec).
- Spec for a pointwise metric (PointwiseMetricSpec).
- Spec for a pairwise metric (PairwiseMetricSpec).
- Spec for the exact match metric (ExactMatchSpec).
- Spec for the BLEU metric (BleuSpec).
- Spec for the ROUGE metric (RougeSpec).
JSON representation:
{
  "aggregationMetrics": [
    enum (…)
  ],
  // metric_spec: one of the spec objects listed above.
  …
}
PredefinedMetricSpec
The spec for a pre-defined metric.
metricSpecName (string)
Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
metricSpecParameters (object)
Optional. The parameters needed to run the pre-defined metric.
JSON representation:
{
  "metricSpecName": string,
  "metricSpecParameters": {
    object
  }
}
LLMBasedMetricSpec
Specification for an LLM based metric.
rubrics_source (Union type)
rubrics_source can be only one of the following:
rubricGroupKey (string)
Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubricGroups map of EvaluationInstance.
rubricGenerationSpec (object (RubricGenerationSpec))
Dynamically generate rubrics using this specification.
Dynamically generate rubrics using a predefined spec.
metricPromptTemplate (string)
Required. Template for the prompt sent to the judge model.
systemInstruction (string)
Optional. System instructions for the judge model.
Optional. Configuration for the judge LLM (autorater).
Optional. Additional configuration for the metric.
JSON representation:
{
  // rubrics_source: only one of the following two fields may be set.
  "rubricGroupKey": string,
  "rubricGenerationSpec": {
    object (RubricGenerationSpec)
  },
  "metricPromptTemplate": string,
  "systemInstruction": string,
  …
}
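As a sketch, an LLMBasedMetricSpec could be populated like the dict below. The rubric group key, prompt, and system instruction text are invented placeholders; only one rubrics_source field may be set at a time:

llm_based_metric_spec = {
    # Exactly one rubrics_source field: here a pre-defined rubric group key,
    # which must exist in the EvaluationInstance's rubricGroups map.
    "rubricGroupKey": "helpfulness_rubrics",
    # Placeholder prompt and system text for illustration only.
    "metricPromptTemplate": "Rate the response against each rubric.",
    "systemInstruction": "You are a strict but fair evaluator.",
}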
RubricGenerationSpec
Specification for how rubrics should be generated.
promptTemplate (string)
Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
rubricContentType (enum (RubricContentType))
The type of rubric content to be generated.
rubricTypeOntology[] (string)
Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies include_rubric_type should be true, and the generated rubric types should be chosen from this ontology.
Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
JSON representation:
{
  "promptTemplate": string,
  "rubricContentType": enum (RubricContentType),
  "rubricTypeOntology": [
    string
  ],
  …
}
RubricContentType
Specifies the type of rubric content to generate.
| Enums | |
|---|---|
| RUBRIC_CONTENT_TYPE_UNSPECIFIED | The content type to generate is not specified. |
| PROPERTY | Generate rubrics based on properties. |
| NL_QUESTION_ANSWER | Generate rubrics in an NL question-answer format. |
| PYTHON_CODE_ASSERTION | Generate rubrics in a unit test format. |
CustomCodeExecutionSpec
Specifies a metric that is populated by evaluating user-defined Python code.
evaluationFunction (string)
Required. Python function. The user is expected to define a function with the signature def evaluate(instance: dict[str, Any]) -> float: and to include that signature in the code snippet. instance is the evaluation instance; any fields populated in the instance are available to the function as instance[fieldName].
Example input:
instance = EvaluationInstance(
    response=EvaluationInstance.InstanceData(text="The answer is 4."),
    reference=EvaluationInstance.InstanceData(text="4")
)
Example converted input:
{
    'response': {'text': 'The answer is 4.'},
    'reference': {'text': '4'}
}
Example python function:
from typing import Any

def evaluate(instance: dict[str, Any]) -> float:
    if instance['response']['text'] == instance['reference']['text']:
        return 1.0
    return 0.0
JSON representation:
{
  "evaluationFunction": string
}
PointwiseMetricSpec
Spec for pointwise metric.
customOutputFormatConfig (object (CustomOutputFormatConfig))
Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either the raw output string or a parsed output based on a user-defined schema. If a custom format is chosen, the score and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate (string)
Required. Metric prompt template for pointwise metric.
systemInstruction (string)
Optional. System instructions for pointwise metric.
JSON representation:
{
  "customOutputFormatConfig": {
    object (CustomOutputFormatConfig)
  },
  "metricPromptTemplate": string,
  "systemInstruction": string
}
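For illustration, a PointwiseMetricSpec as a Python dict. The field names are documented above; the prompt and instruction text (and any templating placeholders they might use) are assumptions:

pointwise_metric_spec = {
    # Invented prompt text; the actual templating syntax is not specified here.
    "metricPromptTemplate": "Score the response from 0 to 1 for fluency.",
    "systemInstruction": "Return only a score and a short explanation.",
}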
CustomOutputFormatConfig
Spec for custom output format configuration.
custom_output_format_config (Union type)
custom_output_format_config can be only one of the following:
returnRawOutput (boolean)
Optional. Whether to return raw output.
JSON representation:
{
  // custom_output_format_config (Union type)
  "returnRawOutput": boolean
}
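A short sketch of opting into raw output on a pointwise metric; the prompt text is invented. Note that with a custom format the score and explanation fields of the metric result will be empty:

pointwise_with_raw_output = {
    "metricPromptTemplate": "Describe the strengths and weaknesses of the response.",  # invented text
    "customOutputFormatConfig": {"returnRawOutput": True},
}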
PairwiseMetricSpec
Spec for pairwise metric.
candidateResponseFieldName (string)
Optional. The field name of the candidate response.
baselineResponseFieldName (string)
Optional. The field name of the baseline response.
customOutputFormatConfig (object (CustomOutputFormatConfig))
Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the pairwiseChoice and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate (string)
Required. Metric prompt template for pairwise metric.
systemInstruction (string)
Optional. System instructions for pairwise metric.
JSON representation:
{
  "candidateResponseFieldName": string,
  "baselineResponseFieldName": string,
  "customOutputFormatConfig": {
    object (CustomOutputFormatConfig)
  },
  "metricPromptTemplate": string,
  "systemInstruction": string
}
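As a sketch, a PairwiseMetricSpec as a Python dict. The field names are documented above; the field-name values ("candidate", "baseline") and the prompt text are illustrative assumptions:

pairwise_metric_spec = {
    # Names of the instance fields holding the two responses to compare;
    # "candidate" and "baseline" are placeholders.
    "candidateResponseFieldName": "candidate",
    "baselineResponseFieldName": "baseline",
    "metricPromptTemplate": "Decide which response better follows the prompt.",  # invented text
}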
ExactMatchSpec
This type has no fields.
Spec for exact match metric - returns 1 if prediction and reference match exactly, otherwise 0.
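For intuition only, the described behavior corresponds to a comparison like the one below; this is an illustration, not the service's implementation:

def exact_match(prediction: str, reference: str) -> int:
    # 1 if prediction and reference match exactly, otherwise 0.
    return 1 if prediction == reference else 0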
BleuSpec
Spec for BLEU score metric - calculates the precision of n-grams in the prediction as compared to the reference - returns a score between 0 and 1.
useEffectiveOrder (boolean)
Optional. Whether to use effective order when computing the BLEU score.
JSON representation:
{
  "useEffectiveOrder": boolean
}
RougeSpec
Spec for ROUGE score metric - calculates the recall of n-grams in the prediction as compared to the reference - returns a score between 0 and 1.
rougeType (string)
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
useStemmer (boolean)
Optional. Whether to use a stemmer when computing the ROUGE score.
splitSummaries (boolean)
Optional. Whether to split summaries while using rougeLsum.
JSON representation:
{
  "rougeType": string,
  "useStemmer": boolean,
  "splitSummaries": boolean
}