xgr.testing

Testing utilities.

The APIs in this module are used for testing and debugging and are prone to change. Don’t use them in production.

Functions:

`_json_schema_to_ebnf`(schema, *[, ...])	Convert JSON schema string to BNF grammar string.
`_regex_to_ebnf`(regex[, with_rule_name])	Convert a regex string to BNF grammar string.
`_ebnf_to_grammar_no_normalization`(ebnf_string)	Convert a BNF grammar string to a Grammar object without normalization.
`_get_matcher_from_grammar`(grammar, **kwargs)	Create a GrammarMatcher from a grammar.
`_is_grammar_accept_string`(grammar, input_str, *)	Check if a grammar accepts a string.
`_get_masked_tokens_from_bitmask`(bitmask, ...)	Get the ids of the rejected tokens from the bitmask.
`_is_single_token_bitmask`(bitmask, vocab_size)	Check if the bitmask is a single token bitmask.
`_bool_mask_to_bitmask`(bool_mask)	Get the bitmask from bool mask.
`_get_matcher_from_grammar_and_tokenizer_info`(grammar)	Create a GrammarMatcher from a grammar and tokenizer info.
`_get_allow_empty_rule_ids`(compiled_grammar)
`_generate_range_regex`([start, end])
`_generate_float_regex`([start, end])

Classes:

GrammarFunctor()

A utility class for transforming grammars.

xgrammar.testing._json_schema_to_ebnf(schema: Union[str, Type[BaseModel], Dict[str, Any]], *, any_whitespace: bool = True, indent: Optional[int] = None, separators: Optional[Tuple[str, str]] = None, strict_mode: bool = True) → str

Convert JSON schema string to BNF grammar string. For test purposes.

Parameters:

schema (Union[str, Type[BaseModel], Dict[str, Any]]) – The schema string or Pydantic model or JSON schema dict.
indent (Optional[int], default: None) – The number of spaces for indentation. If None, the output will be in one line.
separators (Optional[Tuple[str, str]], default: None) – Two separators used in the schema: comma and colon. Examples: (",", ":"), (", ", ": "). If None, the default separators will be used: (",", ": ") when indent is not None, and (", ", ": ") otherwise.
strict_mode (bool, default: True) – Whether to use strict mode. In strict mode, the generated grammar does not allow properties and items that are not specified in the schema. This is equivalent to setting unevaluatedProperties and unevaluatedItems to false.

This helps the LLM generate accurate output in grammar-guided generation with a JSON schema.

Returns:

bnf_string – The BNF grammar string.

Return type:

str
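The indent and separators defaults described above match those of Python's json.dumps, so json.dumps is a convenient way to preview the text layouts involved. The following is a minimal sketch under that assumption: it does not call xgrammar, and the sample data is made up for illustration.

```python
import json

# Hypothetical sample value, used only to show the layouts.
data = {"name": "xgr", "tags": ["a", "b"]}

# indent=None: one line, default separators (", ", ": ").
one_line = json.dumps(data)

# indent=2: multi-line, default separators (",", ": "),
# so no line ends with a trailing space.
indented = json.dumps(data, indent=2)

# Explicit separators (",", ":") give the most compact form.
compact = json.dumps(data, separators=(",", ":"))

print(one_line)
print(indented)
print(compact)
```

Comparing the three printed forms shows why the default comma separator changes with indent: a ", " separator before a line break would leave trailing whitespace.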

xgrammar.testing._regex_to_ebnf(regex: str, with_rule_name: bool = True) → str

Convert a regex string to BNF grammar string. For test purposes. The regex grammar follows the syntax in JavaScript (ECMA 262). Check https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions for a tutorial. Currently the following features are not supported:

1. Backreference (\1)
2. Non-capturing groups, named capture groups, and assertions ((?…))
3. Unicode character class escape, also called Unicode property escape (\p{…})
4. Word boundary (\b)
5. Quantifier with range {x,y}. As a workaround, repeat the element explicitly.

This method is primarily intended for testing and debugging purposes.

Parameters:

regex (str) – The regex string to be converted.

Returns:

bnf_string – The BNF grammar string converted from the input regex.

Return type:

str
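The workaround for the unsupported {x,y} quantifier (repeating the element) can be sketched as a small rewriting helper. The name expand_range_quantifier is hypothetical and not part of xgrammar. The helper assumes the element is a single atom such as a character class or an escaped character. Python's re module is used here only to check that the rewritten pattern matches the same strings; it is a different regex engine from the one xgrammar targets.

```python
import re


def expand_range_quantifier(element: str, low: int, high: int) -> str:
    """Rewrite element{low,high} as explicit repetition without {x,y}.

    `element` must be a single regex atom, e.g. "[0-9]" or "a".
    """
    if low < 0 or high < low:
        raise ValueError("require 0 <= low <= high")
    required = element * low                   # `low` mandatory copies
    optional = (element + "?") * (high - low)  # remaining copies are optional
    return required + optional


# Rewrite [0-9]{2,4} into a form that avoids the range quantifier.
pattern = expand_range_quantifier("[0-9]", 2, 4)
print(pattern)

# Sanity check against the original quantifier using Python's engine.
for text in ["1", "12", "123", "1234", "12345"]:
    same = bool(re.fullmatch(pattern, text)) == bool(re.fullmatch("[0-9]{2,4}", text))
    print(text, same)
```

The rewritten pattern could then be passed to _regex_to_ebnf in place of the original.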

xgrammar.testing._ebnf_to_grammar_no_normalization(ebnf_string: str, root_rule_name: str = 'root') → Grammar

Convert a BNF grammar string to a Grammar object without normalization. For test purposes. The resulting grammar cannot be compiled or used in a GrammarMatcher.

Parameters:

ebnf_string (str) – The BNF grammar string to be converted.

Returns:

grammar – The unnormalized Grammar object converted from the input BNF grammar string.

Return type:

Grammar

xgrammar.testing._get_matcher_from_grammar(grammar: Union[Grammar, str], **kwargs) → GrammarMatcher

Create a GrammarMatcher from a grammar. The tokenizer info will be set to an empty TokenizerInfo. The resulting matcher can only accept strings, not tokens.

Parameters:

grammar (Union[Grammar, str]) – The grammar to create the matcher from. Can be either a Grammar object or a string containing EBNF grammar.

Returns:

matcher – The created grammar matcher.

Return type:

GrammarMatcher

xgrammar.testing._is_grammar_accept_string(grammar: Union[Grammar, str], input_str: str, *, debug_print: bool = False, print_time: bool = False, require_termination: bool = True) → bool

Check if a grammar accepts a string. For test purposes.

Parameters:

grammar (Union[Grammar, str]) – The grammar to check. Can be either a Grammar object or a BNF grammar string.
input_str (str) – The input string to check.
debug_print (bool, default: False) – Whether to print debug information during matching.
print_time (bool, default: False) – Whether to print timing information.

Returns:

True if the grammar accepts the string, False otherwise.

Return type:

bool
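A minimal usage sketch for this helper, based only on the signature documented above. The grammar and candidate strings are made up for illustration. The import is guarded so the snippet still runs where xgrammar is not installed, in which case it reports None instead of results.

```python
# Guarded import: xgrammar is treated as an optional dependency in this sketch.
try:
    from xgrammar.testing import _is_grammar_accept_string
except ImportError:  # xgrammar missing, or this private helper has moved
    _is_grammar_accept_string = None

# A tiny grammar whose root rule matches exactly "yes" or "no".
GRAMMAR = 'root ::= "yes" | "no"'
CANDIDATES = ["yes", "no", "maybe"]


def check_candidates():
    """Return {candidate: accepted}, or None when xgrammar is unavailable."""
    if _is_grammar_accept_string is None:
        return None
    return {text: _is_grammar_accept_string(GRAMMAR, text) for text in CANDIDATES}


results = check_candidates()
print(results)
```

Because the module is documented as unstable, checks like this belong in test code rather than in a production path.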

xgrammar.testing._get_masked_tokens_from_bitmask(bitmask: torch.Tensor, vocab_size: int, index: int = 0) → List[int]

Get the ids of the rejected tokens from the bitmask. Mainly for debug purposes.

Parameters:

bitmask (torch.Tensor) – The rejected token bitmask. Should be generated by allocate_token_bitmask and filled by fill_next_token_bitmask. Should be on CPU.
vocab_size (int) – The size of the vocabulary.
index (int, default: 0) – The batch index of the bitmask. For batch inference, bitmask[index] will be used. Otherwise it is ignored.

Returns:

rejected_token_ids – A list of rejected token ids.

Return type:

List[int]

xgrammar.testing._is_single_token_bitmask(bitmask: torch.Tensor, vocab_size: int, index: int = 0) → Tuple[bool, int]

Check if the bitmask is a single token bitmask.

Parameters:

bitmask (torch.Tensor) – The bitmask to check. Should be on CPU.
vocab_size (int) – The size of the vocabulary.
index (int, default: 0) – The index of the bitmask.

Returns:

is_single_token (bool) – True if the bitmask is a single token bitmask, False otherwise.
token_id (int) – The id of the token if the bitmask is a single token bitmask, -1 otherwise.
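What the two bitmask inspectors above compute can be sketched in pure Python. This is an illustration, not the library's implementation. It assumes that token i is stored in word i // 32 at bit i % 32, that a set bit means the token is allowed, and that bits past vocab_size are padding. Plain non-negative integers stand in for one row of the tensor, and the function names are hypothetical.

```python
from typing import List, Tuple

BITS_PER_WORD = 32  # assumed block size of the packed bitmask


def is_allowed(words: List[int], token_id: int) -> bool:
    """Read the bit for token_id from the packed words (1 = allowed)."""
    word = words[token_id // BITS_PER_WORD]
    return bool((word >> (token_id % BITS_PER_WORD)) & 1)


def rejected_token_ids(words: List[int], vocab_size: int) -> List[int]:
    """Ids whose bit is 0; padding bits past vocab_size are ignored."""
    return [t for t in range(vocab_size) if not is_allowed(words, t)]


def single_token(words: List[int], vocab_size: int) -> Tuple[bool, int]:
    """(True, id) when exactly one token is allowed, else (False, -1)."""
    allowed = [t for t in range(vocab_size) if is_allowed(words, t)]
    if len(allowed) == 1:
        return True, allowed[0]
    return False, -1


# A 40-token vocabulary needs two 32-bit words.
two_allowed = [1 << 3, 1 << 3]  # tokens 3 and 35 allowed
only_35 = [0, 1 << 3]           # only token 35 allowed

print(len(rejected_token_ids(two_allowed, 40)))
print(single_token(two_allowed, 40))
print(single_token(only_35, 40))
```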

xgrammar.testing._bool_mask_to_bitmask(bool_mask: torch.Tensor) → torch.Tensor

Get the bitmask from bool mask. If the length of the bool mask is not a multiple of the 32-bit block size, the bitmask is padded with extra 1 bits.

Parameters:

bool_mask (torch.Tensor) – The rejected token bool mask. For each element value, True means the token is allowed, while False means the token is rejected.

Returns:

bitmask – The rejected token bitmask.

Return type:

torch.Tensor
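The packing with 1 padding can be illustrated in pure Python. This is a sketch rather than the library's code. It assumes that element i of the bool mask maps to bit i % 32 of word i // 32, and it uses plain lists and unsigned integers in place of tensors. The name pack_bool_mask is hypothetical.

```python
from typing import List

BITS_PER_WORD = 32  # block size stated in the description above


def pack_bool_mask(bool_mask: List[bool]) -> List[int]:
    """Pack booleans into 32-bit words; missing trailing bits are set to 1."""
    num_words = (len(bool_mask) + BITS_PER_WORD - 1) // BITS_PER_WORD
    words = []
    for word_index in range(num_words):
        word = 0
        for bit in range(BITS_PER_WORD):
            position = word_index * BITS_PER_WORD + bit
            # Positions past the end of the mask are padding and count as allowed.
            allowed = bool_mask[position] if position < len(bool_mask) else True
            if allowed:
                word |= 1 << bit
        words.append(word)
    return words


# Three tokens: allowed, rejected, allowed. Bits 3..31 are 1 padding.
packed = pack_bool_mask([True, False, True])
print([hex(word) for word in packed])
```

The sketch keeps words unsigned. The same bit pattern stored in a signed 32-bit integer would read as a negative number.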

xgrammar.testing._get_matcher_from_grammar_and_tokenizer_info(grammar: Union[Grammar, str], tokenizer_info: Optional[TokenizerInfo] = None, **kwargs) → GrammarMatcher

Create a GrammarMatcher from a grammar and tokenizer info.

Parameters:

grammar (Union[Grammar, str]) – The grammar to create the matcher from. Can be either a Grammar object or a string containing EBNF grammar.
tokenizer_info (Optional[TokenizerInfo], default: None) – Information about the tokenizer to use with this grammar. If None, an empty TokenizerInfo will be created.
**kwargs – Additional keyword arguments to pass to the GrammarMatcher constructor.

Returns:

matcher – The created grammar matcher.

Return type:

GrammarMatcher

xgrammar.testing._get_allow_empty_rule_ids(compiled_grammar: CompiledGrammar) → List[int]

xgrammar.testing._generate_range_regex(start: Optional[int] = None, end: Optional[int] = None) → str

xgrammar.testing._generate_float_regex(start: Optional[float] = None, end: Optional[float] = None) → str

class xgrammar.testing.GrammarFunctor

Bases: object

A utility class for transforming grammars. These methods are called during grammar parsing. For test purposes.

Methods:

`structure_normalizer`(grammar)	Normalize the structure of the grammar.
`rule_inliner`(grammar)	Inline some rule references in the grammar.
`byte_string_fuser`(grammar)	Fuse the byte string elements in the grammar.
`dead_code_eliminator`(grammar)	Eliminate unreferenced rules in the grammar.
`lookahead_assertion_analyzer`(grammar)	Analyze and add lookahead assertions in the grammar.

static structure_normalizer(grammar: Grammar) → Grammar

Normalize the structure of the grammar.

static rule_inliner(grammar: Grammar) → Grammar

Inline some rule references in the grammar.

static byte_string_fuser(grammar: Grammar) → Grammar

Fuse the byte string elements in the grammar.

static dead_code_eliminator(grammar: Grammar) → Grammar

Eliminate unreferenced rules in the grammar.

static lookahead_assertion_analyzer(grammar: Grammar) → Grammar

Analyze and add lookahead assertions in the grammar.
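To make one of these passes concrete, the idea behind eliminating unreferenced rules can be sketched as a reachability walk from the root rule. This toy version is an assumption about the general technique, not xgrammar's implementation. The grammar is represented as a hypothetical dict mapping each rule name to the rule names it references, rather than as a Grammar object.

```python
from typing import Dict, List, Set


def eliminate_dead_rules(
    rules: Dict[str, List[str]], root: str = "root"
) -> Dict[str, List[str]]:
    """Keep only the rules reachable from `root`.

    `rules` maps a rule name to the names of the rules its body references.
    """
    reachable: Set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue  # already visited; also guards against recursive rules
        reachable.add(name)
        stack.extend(rules[name])
    # Preserve the original rule order for the surviving rules.
    return {name: refs for name, refs in rules.items() if name in reachable}


# Toy grammar: "unused" is never referenced from "root".
toy = {
    "root": ["item"],
    "item": ["digit", "item"],  # recursive rule
    "digit": [],
    "unused": ["digit"],
}
pruned = eliminate_dead_rules(toy)
print(sorted(pruned))
```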