xgrammar.GrammarMatcher
- class xgrammar.GrammarMatcher(compiled_grammar: CompiledGrammar, *, override_stop_tokens: Optional[Union[List[int], int]] = None, terminate_without_stop_token: bool = False, max_rollback_tokens: int = 0)
Bases: XGRObject
Match the output of the LLM against the specified grammar and generate the mask for the next token. This is the core class of grammar-guided generation.
This class maintains a stateful matcher that accepts tokens and strings and matches them against the specified grammar. The matcher can produce a bitmask for the next token prediction, so that the output of the LLM follows the grammar. Its state can be reset, or rolled back by a number of tokens. It also provides utilities for jump-forward decoding.
Once the whole grammar has been matched, the matcher accepts a stop token, and the token mask at that point allows only stop tokens. After accepting the stop token, the matcher terminates: it can no longer accept tokens or generate token masks, which means generation is finished.
Under the hood, it uses a pushdown automaton with backtracking to match the grammar, with optimizations specific to LLM token mask generation.
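The lifecycle described above can be sketched as a decoding loop. This is a minimal sketch, not the library's own loop: `run_constrained_loop` and `propose_token` are hypothetical names introduced here, and `matcher` is assumed to be any object exposing the `accept_token` and `is_terminated` methods documented on this page. Mask generation is left out to keep the control flow visible.

```python
from typing import Callable, List


def run_constrained_loop(
    matcher,
    propose_token: Callable[[], int],
    max_steps: int = 256,
) -> List[int]:
    """Feed proposed tokens to the matcher until it terminates.

    `matcher` is assumed to expose accept_token() and is_terminated().
    `propose_token` is a hypothetical callable standing in for LLM sampling.
    """
    accepted: List[int] = []
    for _ in range(max_steps):
        if matcher.is_terminated():
            # The matcher has finished; no further tokens can be accepted.
            break
        token_id = propose_token()
        if not matcher.accept_token(token_id):
            # accept_token returns False instead of raising, so check it.
            raise ValueError(f"token {token_id} was rejected by the grammar")
        accepted.append(token_id)
    return accepted
```

In a real decoding loop, `propose_token` would sample from logits that have already been masked with the bitmask for the current step, so a rejection should be rare.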
- Parameters:
compiled_grammar (CompiledGrammar) – The initialization context for the grammar matcher.
override_stop_tokens (Optional[Union[List[int], int]], default: None) – If not None, these stop tokens replace the ones defined in the grammar.
terminate_without_stop_token (bool, default: False) – Whether to terminate the matcher without accepting a stop token. If True, the matcher terminates as soon as the whole grammar has been matched.
max_rollback_tokens (int, default: 0) – The maximum number of rollback tokens allowed. The rollback operation is useful for jump-forward decoding and speculative decoding.
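To illustrate why a rollback budget matters for speculative decoding, here is a minimal sketch. The helper names `accept_draft` and `discard_unverified` are hypothetical, and `matcher` is assumed to be any object exposing `accept_token`, `rollback` and `max_rollback_tokens` as listed on this page. Draft tokens are accepted optimistically, and the ones that verification later rejects are rolled back.

```python
from typing import Sequence


def accept_draft(matcher, draft_tokens: Sequence[int]) -> int:
    """Accept draft tokens until one is rejected; return how many were accepted."""
    count = 0
    for token_id in draft_tokens:
        if not matcher.accept_token(token_id):
            break
        count += 1
    return count


def discard_unverified(matcher, num_accepted: int, num_verified: int) -> None:
    """Roll back tokens the matcher accepted but verification did not keep."""
    excess = num_accepted - num_verified
    if excess > matcher.max_rollback_tokens:
        raise ValueError(
            f"need to roll back {excess} tokens, "
            f"but max_rollback_tokens is {matcher.max_rollback_tokens}"
        )
    if excess > 0:
        matcher.rollback(excess)
```

With the default `max_rollback_tokens=0`, the second helper would raise for any non-zero `excess`, so a matcher used this way should be constructed with a budget at least as large as the draft length.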
- __init__(compiled_grammar: CompiledGrammar, *, override_stop_tokens: Optional[Union[List[int], int]] = None, terminate_without_stop_token: bool = False, max_rollback_tokens: int = 0) → None
Methods
__init__(compiled_grammar, *[, ...])
accept_string(input_str, *[, debug_print]) – Accept a string and update the state of the matcher.
accept_token(token_id, *[, debug_print]) – Accept one token and update the state of the matcher.
fill_next_token_bitmask(bitmask[, index, ...]) – Fill the bitmask for the next token prediction.
find_jump_forward_string() – Find the jump-forward string for jump-forward decoding.
is_terminated() – Check if the matcher has terminated.
reset() – Reset the matcher to the initial state.
rollback([num_tokens]) – Roll back the matcher to a previous state by several tokens.
Attributes
max_rollback_tokens – Get the maximum number of rollback tokens allowed.
stop_token_ids – The ids of the stop tokens used in the matcher.
- accept_string(input_str: Union[str, bytes], *, debug_print: bool = False) → bool
Accept a string and update the state of the matcher. The whole string counts as a single step for rollback. This method complements accept_token; tokens themselves should always be passed to accept_token.
- Parameters:
input_str (Union[str, bytes]) – The string to be accepted.
debug_print (bool, default: False) – Whether to print debugging information.
- Returns:
accepted – Whether the string is accepted.
- Return type:
bool
- Raises:
RuntimeError – If the recursion depth is exceeded.
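Because the whole string counts as one rollback step, a string can be probed and then undone with a single `rollback(1)`. The following is a minimal sketch under two assumptions that this section does not state: the helper name `would_accept` is hypothetical, and a rejected string is assumed to leave the matcher state unchanged. `matcher` is any object exposing `accept_string`, `rollback` and `max_rollback_tokens`.

```python
def would_accept(matcher, text: str) -> bool:
    """Check whether `text` fits the grammar here, then restore the state.

    Assumes a rejected string leaves the matcher state unchanged.
    """
    if matcher.max_rollback_tokens < 1:
        raise ValueError("probing needs max_rollback_tokens >= 1")
    if not matcher.accept_string(text):
        return False
    # The whole string is one rollback step, however long it is.
    matcher.rollback(1)
    return True
```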
- accept_token(token_id: int, *, debug_print: bool = False) → bool
Accept one token and update the state of the matcher.
The matcher rejects the token and returns False in the following cases:
- The token does not match the grammar.
- The matcher has already terminated after accepting the stop token, and a new token is passed in.
- The token id is out of range.
- The token is a special token.
The caller should check the return value and handle the case where the token is not accepted.
- Parameters:
token_id (int) – The id of the token to accept.
debug_print (bool, default: False) – Whether to print debugging information.
- Returns:
accepted – Whether the token is accepted.
- Return type:
bool
- Raises:
RuntimeError – If the recursion depth is exceeded.
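One way to handle a False return value is to fall back to the next candidate. This is a minimal sketch: `accept_first_valid` is a hypothetical helper, `matcher` is any object exposing `accept_token`, and it is assumed (not stated in this section) that a rejected token leaves the matcher state unchanged.

```python
from typing import Iterable, Optional


def accept_first_valid(matcher, candidates: Iterable[int]) -> Optional[int]:
    """Try candidates in order and return the first token the matcher accepts.

    Returns None if every candidate is rejected.
    """
    for token_id in candidates:
        if matcher.accept_token(token_id):
            return token_id
    return None
```

Masking the logits with the next-token bitmask before sampling usually avoids rejections in the first place, so a fallback like this is mainly useful as a safety net.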
- fill_next_token_bitmask(bitmask: Tensor, index: int = 0, *, debug_print: bool = False) → bool
Fill the bitmask for the next token prediction. The input bitmask can be generated by allocate_token_bitmask, and must be on CPU. bitmask[index] will be filled with the next token bitmask.
This method does not change the matcher state.
- Parameters:
bitmask (torch.Tensor) – The bitmask for the next token prediction.
index (int, default: 0) – The batch id of the bitmask.
debug_print (bool, default: False) – Whether to print information about generated bitmask. Helpful for debugging.
- Returns:
need_apply – Whether the bitmask needs to be applied, that is, whether it is not all-true. This enables an optimization: if False, the bitmask is already all-true, so applying it can be skipped.
- Return type:
bool
- Raises:
RuntimeError – If the recursion depth is exceeded.
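The `need_apply` return value can be used to skip masking when every token is allowed. A minimal sketch, assuming `matcher` exposes `fill_next_token_bitmask` as documented above; `fill_and_maybe_apply` and `apply_mask` are hypothetical names, with `apply_mask` standing in for whatever routine masks the logits with `bitmask[index]`.

```python
from typing import Callable


def fill_and_maybe_apply(
    matcher,
    bitmask,
    apply_mask: Callable[[], None],
    index: int = 0,
) -> bool:
    """Fill bitmask[index] and apply it only when it restricts something."""
    need_apply = matcher.fill_next_token_bitmask(bitmask, index)
    if need_apply:
        # Some tokens are disallowed, so the logits must be masked.
        apply_mask()
    # When need_apply is False the mask is all-true and can be skipped.
    return need_apply
```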
- find_jump_forward_string() → str
Find the jump-forward string for jump-forward decoding. This is the longest string that is guaranteed to conform to the grammar from the current matcher state, so it can be appended to the LLM output without running LLM decoding.
This method does not change the matcher state.
- Returns:
jump_forward_string – The jump-forward string.
- Return type:
str
- Raises:
RuntimeError – If the recursion depth is exceeded.
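Since this method does not change the matcher state, the caller has to feed the jump-forward string back in. The sketch below does so with `accept_string`; `jump_forward` is a hypothetical helper, and feeding the string this way is one possible choice rather than a prescribed workflow (retokenizing the text and calling `accept_token` is another).

```python
def jump_forward(matcher) -> str:
    """Advance the matcher by its jump-forward string and return that string.

    Returns an empty string when there is nothing to jump over.
    """
    text = matcher.find_jump_forward_string()
    if text and not matcher.accept_string(text):
        # The string is expected to conform to the grammar by construction.
        raise RuntimeError("matcher rejected its own jump-forward string")
    return text
```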
- is_terminated() → bool
Check if the matcher has terminated. If terminate_without_stop_token is False, the matcher will terminate if it has accepted the stop token. Otherwise, the matcher will terminate after matching the whole grammar.
- Returns:
terminated – Whether the matcher has terminated.
- Return type:
bool
- property max_rollback_tokens: int
Get the maximum number of rollback tokens allowed.
- Returns:
max_rollback_tokens – The maximum number of rollback tokens.
- Return type:
int