Description
Given a single Sudoku puzzle, write a prompt that instructs an LLM to solve it. Most puzzles follow standard Sudoku rules (each row, column, and subgrid must contain every required digit exactly once), and puzzles vary in size (4×4, 6×6, or 9×9). Some puzzles also include additional variant constraints (e.g., diagonals, knight/king moves, thermo, killer cages). The LLM should read the rules, solve the puzzle, and return only the final grid in the required format, without using any tools or code.
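The standard rules can be made concrete with a small checker. This is a minimal Python sketch and not part of the competition. It assumes a solved grid is a list of rows of integers. It also assumes 6×6 grids use 2×3 boxes (two rows by three columns), which is the usual convention but is not stated here. The name is_valid_solution is hypothetical. Variant constraints such as diagonals, thermos, or killer cages are deliberately not checked.

```python
from typing import List

# Assumed box shapes (rows x cols) per grid size; 6x6 conventionally uses 2x3.
BOX_SHAPES = {4: (2, 2), 6: (2, 3), 9: (3, 3)}


def is_valid_solution(grid: List[List[int]]) -> bool:
    """Check standard rules only: each row, column, and box holds 1..n exactly once."""
    n = len(grid)
    if n not in BOX_SHAPES or any(len(row) != n for row in grid):
        return False
    digits = set(range(1, n + 1))
    box_rows, box_cols = BOX_SHAPES[n]

    # A line of n cells whose set equals {1..n} must contain each digit once.
    rows_ok = all(set(row) == digits for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == digits for c in range(n))
    boxes_ok = all(
        {grid[r][c]
         for r in range(br, br + box_rows)
         for c in range(bc, bc + box_cols)} == digits
        for br in range(0, n, box_rows)
        for bc in range(0, n, box_cols)
    )
    return rows_ok and cols_ok and boxes_ok


example = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(is_valid_solution(example))  # True
```

A checker like this only confirms a grid is a legal standard Sudoku. For variant puzzles, each extra constraint would need its own check.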
Evaluation
Submissions are evaluated on Sakana AI's curated Sudoku-Bench dataset of 100 puzzles, a benchmark for creative, human-like problem solving. Each puzzle is supplied to the LLM in the following format:
The expected output is a flattened string with no spaces or newlines, wrapped in <answer> and </answer> tags. For example, the 4×4 solution
must be returned as
Each output is scored as the proportion of correct digits, capped at 80%, plus an additional 20% for a fully correct solution. The final score is the mean of all 100 puzzle scores.
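A minimal Python sketch of the metric follows, under an assumed reading. "Proportion of correct digits capped at 80%" is taken to mean the fraction of correctly placed digits scaled by 0.8, and the 20% bonus applies only to an exact match. If the organizers instead mean min(fraction, 0.8), only the first term changes. Both function names are hypothetical, and answers are assumed to already be extracted as flattened digit strings.

```python
from typing import List


def puzzle_score(predicted: str, solution: str) -> float:
    """Assumed per-puzzle metric: 0.8 * fraction of correct digits,
    plus 0.2 if the whole flattened grid matches exactly."""
    if not solution:
        raise ValueError("solution must be non-empty")
    # Compare position by position; positions missing from a short answer count as wrong.
    correct = sum(p == s for p, s in zip(predicted, solution))
    fraction = correct / len(solution)
    bonus = 0.2 if predicted == solution else 0.0
    return 0.8 * fraction + bonus


def final_score(predictions: List[str], solutions: List[str]) -> float:
    """Mean of per-puzzle scores (100 puzzles in the actual evaluation)."""
    if not solutions or len(predictions) != len(solutions):
        raise ValueError("need one prediction per puzzle")
    scores = [puzzle_score(p, s) for p, s in zip(predictions, solutions)]
    return sum(scores) / len(scores)


solution = "1234341221434321"
print(puzzle_score(solution, solution))            # 1.0
print(puzzle_score("1234341221434322", solution))  # 0.75 (15/16 of 0.8, no bonus)
```

Under this reading, a near-miss still earns most of the credit, so a prompt that always emits a complete, well-formed grid is worth more than one that sometimes emits nothing.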
Example
Submission Requirements
- Standard rules apply.
- The final output should be just the flattened solution inside <answer> and </answer> tags.
- Maximum input length of 16384 characters.
- Maximum output of 32768 tokens.
- No tool-calling.
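The input limit and output format can be sanity-checked locally before submitting. This hypothetical Python sketch makes several assumptions. It flattens grids in row-major order (row by row, left to right) and takes the last <answer>...</answer> pair in a reply. It accepts only size×size digits from 1 to size. The official grader's exact parsing rules are not described here, so treat these choices as assumptions. The 32768-token output limit is not checked because it depends on the model's tokenizer.

```python
import re
from typing import List, Optional

MAX_INPUT_CHARS = 16384  # input limit from the submission requirements

# Assumption: the grader reads digits between <answer> and </answer>;
# this sketch takes the last such pair in the reply.
ANSWER_RE = re.compile(r"<answer>\s*([0-9]+)\s*</answer>")


def format_answer(grid: List[List[int]]) -> str:
    """Flatten a solved grid row by row, left to right, and wrap it in answer tags."""
    return "<answer>" + "".join(str(d) for row in grid for d in row) + "</answer>"


def check_input_length(prompt: str) -> None:
    """Raise if the prompt exceeds the stated character limit."""
    if len(prompt) > MAX_INPUT_CHARS:
        raise ValueError(f"prompt has {len(prompt)} characters; limit is {MAX_INPUT_CHARS}")


def extract_answer(response: str, size: int) -> Optional[str]:
    """Return the last tagged answer if it has size*size digits in 1..size, else None."""
    matches = ANSWER_RE.findall(response)
    if not matches:
        return None
    digits = matches[-1]
    allowed = set("123456789"[:size])
    if len(digits) != size * size or not set(digits) <= allowed:
        return None
    return digits


grid = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3], [4, 3, 2, 1]]
reply = "Reasoning...\n" + format_answer(grid)
print(format_answer(grid))       # <answer>1234341221434321</answer>
print(extract_answer(reply, 4))  # 1234341221434321
```

Taking the last tag is one defensive choice. It tolerates a model that drafts a partial answer before its final one, but a prompt that asks for exactly one tag avoids the ambiguity altogether.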