xgr.Grammar

class xgrammar.Grammar

This class represents a grammar object in XGrammar that can later be used for grammar-guided generation.

The Grammar object supports context-free grammars (CFGs). EBNF (Extended Backus-Naur Form) is used as the grammar format. There are many EBNF specifications in the literature; XGrammar follows GBNF (GGML BNF), as specified in https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md.

When printed, the grammar will be converted to GBNF format.

Methods:

`__str__`()	Print the BNF grammar to a string, in EBNF format.
`from_ebnf`(ebnf_string, *[, root_rule_name])	Construct a grammar from EBNF string.
`from_json_schema`(schema, *[, ...])	Construct a grammar from JSON schema.
`from_regex`(regex_string, *[, ...])	Create a grammar from a regular expression string.
`from_structural_tag`()	Create a grammar from a structural tag.
`builtin_json_grammar`()	Get the grammar of standard JSON.
`concat`(*grammars)	Create a grammar that matches the concatenation of the grammars in the list.
`union`(*grammars)	Create a grammar that matches any of the grammars in the list.
`serialize_json`()	Serialize the grammar to a JSON string.
`deserialize_json`(json_string)	Deserialize a grammar from a JSON string.

__str__() → str

Print the BNF grammar to a string, in EBNF format.

Returns: grammar_string – The BNF grammar string.
Return type: str

static from_ebnf(ebnf_string: str, *, root_rule_name: str = 'root') → Grammar

Construct a grammar from EBNF string. The EBNF string should follow the format in https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md.

Parameters:

ebnf_string (str) – The grammar string in EBNF format.
root_rule_name (str, default: "root") – The name of the root rule in the grammar.

Raises:

RuntimeError – When parsing the EBNF string fails, with details about the parsing error.

static from_json_schema(schema: Union[str, Type[BaseModel], Dict[str, Any]], *, any_whitespace: bool = True, indent: Optional[int] = None, separators: Optional[Tuple[str, str]] = None, strict_mode: bool = True, max_whitespace_cnt: Optional[int] = None, print_converted_ebnf: bool = False) → Grammar

Construct a grammar from a JSON schema. The schema can be given as a Pydantic model, a JSON schema string, or a JSON schema dict.

It allows any whitespace by default. If the user wants to specify the format of the JSON, set any_whitespace to False and use the indent and separators parameters. The meaning and default values of these parameters follow the convention in json.dumps().
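
Because indent and separators follow the json.dumps() convention, the standard library offers a quick way to preview the layouts these parameters describe. The sketch below uses only `json.dumps()`; the sample object is an assumption for illustration.

```python
import json

sample = {"name": "Ada", "tags": ["a", "b"]}

# indent=None with the default separators (", ", ": "): a single line.
one_line = json.dumps(sample)

# Explicit compact separators: no whitespace between elements.
compact = json.dumps(sample, separators=(",", ":"))

# indent=2: one element per line; the default item separator becomes ",".
indented = json.dumps(sample, indent=2)

print(one_line)
print(compact)
print(indented)
```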

It internally converts the JSON schema to an EBNF grammar.

Parameters:

schema (Union[str, Type[BaseModel], Dict[str, Any]]) – The schema string or Pydantic model or JSON schema dict.
any_whitespace (bool, default: True) – Whether to use any whitespace. If True, the generated grammar will ignore the indent and separators parameters, and allow any whitespace.
indent (Optional[int], default: None) –
The number of spaces for indentation. If None, the output will be in one line.

Note that specifying the indentation forces the LLM to generate strictly formatted JSON strings. However, some models tend to generate JSON strings that are not strictly formatted. In this case, forcing strict formatting may degrade the generation quality. See <https://github.com/sgl-project/sglang/issues/2216#issuecomment-2516192009> for more details.
separators (Optional[Tuple[str, str]], default: None) – Two separators used in the schema: comma and colon. Examples: (",", ":"), (", ", ": "). If None, the default separators will be used: (",", ": ") when the indent is not None, and (", ", ": ") otherwise.
strict_mode (bool, default: True) –
Whether to use strict mode. In strict mode, the generated grammar will not allow properties and items that are not specified in the schema. This is equivalent to setting unevaluatedProperties and unevaluatedItems to false.

This helps the LLM generate accurate output in grammar-guided generation with a JSON schema.
max_whitespace_cnt (Optional[int], default: None) – The maximum number of whitespace characters allowed between elements, such as keys, values, and separators. If None, there is no limit on the number of whitespace characters. If specified, it must be a positive integer.
print_converted_ebnf (bool, default: False) – If True, the converted EBNF string will be printed. For debugging purposes.

Returns:

grammar – The constructed grammar.

Return type:

Grammar

Raises:

RuntimeError – When converting the JSON schema fails, with details about the parsing error.

static from_regex(regex_string: str, *, print_converted_ebnf: bool = False) → Grammar

Create a grammar from a regular expression string.

Parameters:

regex_string (str) – The regular expression pattern to create the grammar from.
print_converted_ebnf (bool, default: False) – This method converts the regex pattern to EBNF first. If True, the converted EBNF string will be printed. For debugging purposes.

Returns:

grammar – The constructed grammar from the regex pattern.

Return type:

Grammar

Raises:

RuntimeError – When parsing the regex pattern fails, with details about the parsing error.

static from_structural_tag(structural_tag: Union[StructuralTag, str, Dict[str, Any]]) → Grammar

static from_structural_tag(tags: List[StructuralTagItem], triggers: List[str]) → Grammar

Create a grammar from a structural tag. See Structural Tag Usage in the XGrammar documentation for details.

This method supports two calling patterns:

Single structural tag parameter: from_structural_tag(structural_tag)
Legacy pattern (deprecated): from_structural_tag(tags, triggers)

Parameters:

structural_tag (Union[StructuralTag, str, Dict[str, Any]]) – The structural tag, given as a StructuralTag object, a JSON string, or a dictionary.
tags (List[StructuralTagItem]) – (Deprecated) The structural tags. Use StructuralTag class instead.
triggers (List[str]) – (Deprecated) The triggers. Use StructuralTag class instead.

Returns:

grammar – The constructed grammar from the structural tag.

Return type:

Grammar

Raises:

InvalidJSONError – When the structural tag is not a valid JSON string.
InvalidStructuralTagError – When the structural tag is not valid.
TypeError – When the arguments are invalid.

Notes

The legacy pattern from_structural_tag(tags, triggers) is deprecated. Use the StructuralTag class to construct structural tags instead.

For the deprecated pattern: the structural tag dispatches to different grammars based on the tags and triggers. It initially allows any output until a trigger is encountered, then dispatches to the corresponding tag; when the end tag is encountered, the grammar again allows any output until the next trigger is encountered. See the Advanced Topics of the Structural Tag in the XGrammar documentation for its semantics.

static builtin_json_grammar() → Grammar

Get the grammar of standard JSON. This is compatible with the official JSON grammar specification in https://www.json.org/json-en.html.

Returns: grammar – The JSON grammar.
Return type: Grammar

static concat(*grammars: Grammar) → Grammar

Create a grammar that matches the concatenation of the given grammars. This is equivalent to using the + operator to concatenate the grammars.

Parameters: grammars (List[Grammar]) – The grammars to create the concatenation of.
Returns: grammar – The concatenation of the grammars.
Return type: Grammar

static union(*grammars: Grammar) → Grammar

Create a grammar that matches any of the given grammars. This is equivalent to using the | operator to combine the grammars.

Parameters: grammars (List[Grammar]) – The grammars to create the union of.
Returns: grammar – The union of the grammars.
Return type: Grammar

serialize_json() → str

Serialize the grammar to a JSON string.

Returns: json_string – The JSON string.
Return type: str

static deserialize_json(json_string: str) → Grammar

Deserialize a grammar from a JSON string.

Parameters:

json_string (str) – The JSON string.

Returns:

grammar – The deserialized grammar.

Return type:

Grammar

Raises:

InvalidJSONError – When the JSON string is invalid.
DeserializeFormatError – When the JSON string does not follow the serialization format of the grammar.
DeserializeVersionError – When the __VERSION__ field in the JSON string is not the same as the current version.