The metric used for running evaluations.
aggregationMetrics[] (enum)
Optional. The aggregation metrics to use.
metric_spec (Union type)
metric_spec can be only one of the following:
- The spec for a pre-defined metric (PredefinedMetricSpec).
- Spec for an LLM-based metric (LLMBasedMetricSpec).
- Spec for a custom code execution metric (CustomCodeExecutionSpec).
- Spec for a pointwise metric (PointwiseMetricSpec).
- Spec for a pairwise metric (PairwiseMetricSpec).
- Spec for the exact match metric (ExactMatchSpec).
- Spec for the BLEU metric (BleuSpec).
- Spec for the ROUGE metric (RougeSpec).
JSON representation:
{
  "aggregationMetrics": [
    enum (…)
  ],
  // metric_spec: one of the spec objects listed above.
  …
}
PredefinedMetricSpec
The spec for a pre-defined metric.
metricSpecName (string)
Required. The name of a pre-defined metric, such as "instruction_following_v1" or "text_quality_v1".
metricSpecParameters (object)
Optional. The parameters needed to run the pre-defined metric.
JSON representation:
{
  "metricSpecName": string,
  "metricSpecParameters": {
    object
  }
}
LLMBasedMetricSpec
Specification for an LLM based metric.
rubrics_source (Union type)
rubrics_source can be only one of the following:
rubricGroupKey (string)
Use a pre-defined group of rubrics associated with the input. Refers to a key in the rubricGroups map of EvaluationInstance.
rubricGenerationSpec (object (RubricGenerationSpec))
Dynamically generate rubrics using this specification.
Dynamically generate rubrics using a predefined spec.
metricPromptTemplate (string)
Required. Template for the prompt sent to the judge model.
systemInstruction (string)
Optional. System instructions for the judge model.
Optional. Configuration for the judge LLM (autorater).
Optional. Additional configuration for the metric.
JSON representation:
{
  // rubrics_source: only one of the following two fields may be set.
  "rubricGroupKey": string,
  "rubricGenerationSpec": {
    object (RubricGenerationSpec)
  },
  "metricPromptTemplate": string,
  "systemInstruction": string,
  …
}
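As a sketch, an LLMBasedMetricSpec could be populated like the dict below. The rubric group key, prompt, and system instruction text are invented placeholders; only one rubrics_source field may be set at a time:

llm_based_metric_spec = {
    # Exactly one rubrics_source field: here a pre-defined rubric group key,
    # which must exist in the EvaluationInstance's rubricGroups map.
    "rubricGroupKey": "helpfulness_rubrics",
    # Placeholder prompt and system text for illustration only.
    "metricPromptTemplate": "Rate the response against each rubric.",
    "systemInstruction": "You are a strict but fair evaluator.",
}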
RubricGenerationSpec
Specification for how rubrics should be generated.
promptTemplate (string)
Template for the prompt used to generate rubrics. The details should be updated based on the most-recent recipe requirements.
rubricContentType (enum (RubricContentType))
The type of rubric content to be generated.
rubricTypeOntology[] (string)
Optional. An optional, pre-defined list of allowed types for generated rubrics. If this field is provided, it implies include_rubric_type should be true, and the generated rubric types should be chosen from this ontology.
Configuration for the model used in rubric generation. Configs including sampling count and base model can be specified here. Flipping is not supported for rubric generation.
JSON representation:
{
  "promptTemplate": string,
  "rubricContentType": enum (RubricContentType),
  "rubricTypeOntology": [
    string
  ],
  …
}
RubricContentType
Specifies the type of rubric content to generate.
| Enums | |
|---|---|
| RUBRIC_CONTENT_TYPE_UNSPECIFIED | The content type to generate is not specified. |
| PROPERTY | Generate rubrics based on properties. |
| NL_QUESTION_ANSWER | Generate rubrics in an NL question-answer format. |
| PYTHON_CODE_ASSERTION | Generate rubrics in a unit test format. |
CustomCodeExecutionSpec
Specifies a metric that is populated by evaluating user-defined Python code.
evaluationFunction (string)
Required. Python function. The user is expected to define a function with the signature def evaluate(instance: dict[str, Any]) -> float: and to include that signature in the code snippet. instance is the evaluation instance; any fields populated in the instance are available to the function as instance[fieldName].
Example input:
instance = EvaluationInstance(
    response=EvaluationInstance.InstanceData(text="The answer is 4."),
    reference=EvaluationInstance.InstanceData(text="4")
)
Example converted input:
{
    'response': {'text': 'The answer is 4.'},
    'reference': {'text': '4'}
}
Example python function:
from typing import Any

def evaluate(instance: dict[str, Any]) -> float:
    if instance['response']['text'] == instance['reference']['text']:
        return 1.0
    return 0.0
JSON representation:
{
  "evaluationFunction": string
}
PointwiseMetricSpec
Spec for pointwise metric.
customOutputFormatConfig (object (CustomOutputFormatConfig))
Optional. CustomOutputFormatConfig allows customization of metric output. By default, metrics return a score and explanation. When this config is set, the default output is replaced with either the raw output string or a parsed output based on a user-defined schema. If a custom format is chosen, the score and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate (string)
Required. Metric prompt template for pointwise metric.
systemInstruction (string)
Optional. System instructions for pointwise metric.
JSON representation:
{
  "customOutputFormatConfig": {
    object (CustomOutputFormatConfig)
  },
  "metricPromptTemplate": string,
  "systemInstruction": string
}
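For illustration, a PointwiseMetricSpec as a Python dict. The field names are documented above; the prompt and instruction text (and any templating placeholders they might use) are assumptions:

pointwise_metric_spec = {
    # Invented prompt text; the actual templating syntax is not specified here.
    "metricPromptTemplate": "Score the response from 0 to 1 for fluency.",
    "systemInstruction": "Return only a score and a short explanation.",
}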
CustomOutputFormatConfig
Spec for custom output format configuration.
custom_output_format_config (Union type)
custom_output_format_config can be only one of the following:
returnRawOutput (boolean)
Optional. Whether to return raw output.
JSON representation:
{
  // custom_output_format_config (Union type)
  "returnRawOutput": boolean
}
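A short sketch of opting into raw output on a pointwise metric; the prompt text is invented. Note that with a custom format the score and explanation fields of the metric result will be empty:

pointwise_with_raw_output = {
    "metricPromptTemplate": "Describe the strengths and weaknesses of the response.",  # invented text
    "customOutputFormatConfig": {"returnRawOutput": True},
}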
PairwiseMetricSpec
Spec for pairwise metric.
candidateResponseFieldName (string)
Optional. The field name of the candidate response.
baselineResponseFieldName (string)
Optional. The field name of the baseline response.
customOutputFormatConfig (object (CustomOutputFormatConfig))
Optional. CustomOutputFormatConfig allows customization of metric output. When this config is set, the default output is replaced with the raw output string. If a custom format is chosen, the pairwiseChoice and explanation fields in the corresponding metric result will be empty.
metricPromptTemplate (string)
Required. Metric prompt template for pairwise metric.
systemInstruction (string)
Optional. System instructions for pairwise metric.
JSON representation:
{
  "candidateResponseFieldName": string,
  "baselineResponseFieldName": string,
  "customOutputFormatConfig": {
    object (CustomOutputFormatConfig)
  },
  "metricPromptTemplate": string,
  "systemInstruction": string
}
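As a sketch, a PairwiseMetricSpec as a Python dict. The field names are documented above; the field-name values ("candidate", "baseline") and the prompt text are illustrative assumptions:

pairwise_metric_spec = {
    # Names of the instance fields holding the two responses to compare;
    # "candidate" and "baseline" are placeholders.
    "candidateResponseFieldName": "candidate",
    "baselineResponseFieldName": "baseline",
    "metricPromptTemplate": "Decide which response better follows the prompt.",  # invented text
}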
ExactMatchSpec
This type has no fields.
Spec for exact match metric - returns 1 if prediction and reference match exactly, otherwise 0.
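For intuition only, the described behavior corresponds to a comparison like the one below; this is an illustration, not the service's implementation:

def exact_match(prediction: str, reference: str) -> int:
    # 1 if prediction and reference match exactly, otherwise 0.
    return 1 if prediction == reference else 0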
BleuSpec
Spec for BLEU score metric - calculates the precision of n-grams in the prediction as compared to the reference - returns a score between 0 and 1.
useEffectiveOrder (boolean)
Optional. Whether to use effective order when computing the BLEU score.
JSON representation:
{
  "useEffectiveOrder": boolean
}
RougeSpec
Spec for ROUGE score metric - calculates the recall of n-grams in the prediction as compared to the reference - returns a score between 0 and 1.
rougeType (string)
Optional. Supported rouge types are rougen[1-9], rougeL, and rougeLsum.
useStemmer (boolean)
Optional. Whether to use a stemmer when computing the ROUGE score.
splitSummaries (boolean)
Optional. Whether to split summaries while using rougeLsum.
JSON representation:
{
  "rougeType": string,
  "useStemmer": boolean,
  "splitSummaries": boolean
}