Sudoku | Peval Competition

Description

Given a single Sudoku puzzle, write a prompt that instructs an LLM to solve it. Most puzzles follow standard Sudoku rules (each row, column, and subgrid must contain all required digits) and may vary in size (4×4, 6×6, or 9×9). Some puzzles may also include additional variant constraints (e.g., diagonals, knight/king moves, thermo, killer cages). The LLM should read the rules, solve the puzzle, and return only the final grid in the required format without using any tools or code.

Evaluation

Submissions are evaluated on Sakana AI's Sudoku-Bench curated dataset of 100 puzzles, a benchmark for creative, human-like problem solving. Each problem is supplied to the LLM in the following format:

<submission-prompt>
{{USER_PROMPT}}
</submission-prompt>

<question>
## Initial Board

```
{{BOARD_TO_SOLVE}}
```

The size is {{ROWS}}x{{COLUMNS}}.

## Rules

{{RULES}}

## Visual Elements

```json
{{VISUAL_ELEMENTS}}
```
</question>
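
To see how the placeholders fit together, the following minimal sketch fills the template for one puzzle. The helper name `build_query` and its parameter names are hypothetical, and the exact whitespace used by the organizers is an assumption; only the placeholder layout is taken from the template above.

```python
import json

FENCE = "`" * 3  # built indirectly so this example can sit inside a code fence


def build_query(user_prompt, board, rows, cols, rules, visual_elements):
    """Hypothetical helper: fill the question template for one puzzle."""
    parts = [
        "<submission-prompt>", user_prompt, "</submission-prompt>", "",
        "<question>", "## Initial Board", "",
        FENCE, board, FENCE, "",
        f"The size is {rows}x{cols}.", "",
        "## Rules", "", rules, "",
        "## Visual Elements", "",
        FENCE + "json", json.dumps(visual_elements), FENCE,
        "</question>",
    ]
    return "\n".join(parts)


# An empty 4x4 board with no visual elements, as a placeholder puzzle.
query = build_query("Solve the puzzle.", "." * 16, 4, 4,
                    "Normal sudoku rules apply.", [])
print(query)
```

Rendering a few puzzles this way is a quick check on how much of the model's context the prompt and the question occupy together.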

The expected output is the solved grid flattened row by row into a single string with no spaces or newlines, wrapped in <answer> tags. For example, the 4×4 solution

2314
1423
4231
3142

must be returned as

<answer>2314142342313142</answer>
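
The conversion between a grid and the answer string can be sketched as follows. This assumes the grid is read row by row, and both helper names are hypothetical. How the organizers extract the answer from the model output is not stated here, so taking the last `<answer>` tag is an assumption.

```python
import re


def flatten(grid_rows):
    """Join rows (strings or lists of digits) into the <answer> format."""
    digits = "".join("".join(str(d) for d in row) for row in grid_rows)
    return f"<answer>{digits}</answer>"


def extract_answer(text):
    """Return the digits inside the last <answer> tag, or None if absent."""
    matches = re.findall(r"<answer>\s*(\d+)\s*</answer>", text)
    return matches[-1] if matches else None


answer = flatten(["2314", "1423", "4231", "3142"])
print(answer)
print(extract_answer("Some reasoning... " + answer))
```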

Each output earns up to 0.8 in proportion to the number of digits that match the solution position by position, plus a further 0.2 if the whole grid is correct. The final score is the mean of all 100 per-puzzle scores:

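def evaluate(test_cases, outputs):
    """Return the mean score over all puzzles; each score lies in [0, 1]."""
    total = 0
    for test_case, output in zip(test_cases, outputs):
        # Count positions where the output digit matches the solution;
        # positions past the end of a short output count as wrong.
        digits_correct = 0
        for i, digit in enumerate(test_case):
            digits_correct += i < len(output) and digit == output[i]
        # 0.2 for an exact match, plus up to 0.8 for the fraction of correct digits.
        total += 0.2 * (test_case == output) + 0.8 * digits_correct / len(test_case)
    return total / len(test_cases)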
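
As a worked example of the formula, consider a 16-digit solution. An output with 12 matching digits scores 0.8 × 12/16 = 0.6, an output truncated to its first 4 correct digits scores 0.8 × 4/16 = 0.2, and only an exact match reaches 1.0. The helper `score_one` below is a hypothetical per-puzzle restatement of the scoring rule, written for illustration.

```python
def score_one(solution, output):
    """Per-puzzle score: 0.8 * fraction of matching digits + 0.2 if exact."""
    correct = sum(i < len(output) and d == output[i]
                  for i, d in enumerate(solution))
    return 0.2 * (solution == output) + 0.8 * correct / len(solution)


solution = "2314142342313142"
exact = score_one(solution, solution)                # every digit right
partial = score_one(solution, "2314142342314213")    # last row wrong: 12 of 16
truncated = score_one(solution, "2314")              # only the first row given

print(round(exact, 3), round(partial, 3), round(truncated, 3))
```

Because the bonus applies only to an exact match, a grid with a single wrong digit (15 of 16) still scores 0.75, so the final digit is worth 0.25 on a 4×4 puzzle.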

Example

input.txt

## Initial Board

```
................
```

The size is 4x4.

## Rules

Normal sudoku rules do NOT apply. Fill the grid with digits 1-9, such that no digit repeats in any row, column or box. The set of digits in each row or column is unique, eg. if a row contains the digits 1234, no other row or column may contain exactly those digits. The digits in every row, column and box sum to x, where x has to be determined by the solver. Digits separated by an X sum to 10. Digits separated by a V sum to 5. Not all Xs and Vs are necessarily given.

## Visual Elements

```json
[{"type": "overlays", "coords": ["r1c1", "r1c2"], "loc": "vertical edge", "shape": "circle", "color_name": "white", "color_hex": "#FFFFFF", "border_color_name": "white", "border_color_hex": "#FFFFFF", "size": "medium", "text": "V", "width": 0.35, "height": 0.35}, {"type": "overlays", "coords": ["r2c3", "r3c3"], "loc": "horizontal edge", "shape": "circle", "color_name": "white", "color_hex": "#FFFFFF", "border_color_name": "white", "border_color_hex": "#FFFFFF", "size": "medium", "text": "V", "width": 0.35, "height": 0.35}, {"type": "overlays", "coords": ["r2c2", "r2c3"], "loc": "vertical edge", "shape": "circle", "color_name": "white", "color_hex": "#FFFFFF", "border_color_name": "white", "border_color_hex": "#FFFFFF", "size": "medium", "text": "X", "width": 0.35, "height": 0.35}, {"type": "overlays", "coords": ["r1c4", "r2c4"], "loc": "horizontal edge", "shape": "circle", "color_name": "white", "color_hex": "#FFFFFF", "border_color_name": "white", "border_color_hex": "#FFFFFF", "size": "medium", "text": "X", "width": 0.35, "height": 0.35}, {"type": "overlays", "coords": ["r3c2", "r4c2"], "loc": "horizontal edge", "shape": "circle", "color_name": "white", "color_hex": "#FFFFFF", "border_color_name": "white", "border_color_hex": "#FFFFFF", "size": "medium", "text": "X", "width": 0.35, "height": 0.35}]
```
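
For prompt authors inspecting the data offline, the sketch below shows one way to read the visual elements of this example: each overlay names two adjacent cells in `coords` and carries its marker in `text`. The helper names are hypothetical, and the mapping of V to 5 and X to 10 comes from this puzzle's rules text, so it should not be assumed for other puzzles.

```python
import json

# Two entries abridged from the example above (styling fields omitted).
visual_elements = json.loads("""[
  {"type": "overlays", "coords": ["r1c1", "r1c2"], "loc": "vertical edge", "text": "V"},
  {"type": "overlays", "coords": ["r2c2", "r2c3"], "loc": "vertical edge", "text": "X"}
]""")

PAIR_SUMS = {"V": 5, "X": 10}  # per this puzzle's rules; an assumption elsewhere


def parse_cell(name):
    """Convert a cell name such as 'r2c3' to zero-based (row, col) = (1, 2)."""
    row, col = name[1:].split("c")
    return int(row) - 1, int(col) - 1


def pair_constraints(elements):
    """Return (cell_a, cell_b, required_sum) for each X/V overlay on two cells."""
    constraints = []
    for el in elements:
        coords = el.get("coords", [])
        if el.get("type") == "overlays" and el.get("text") in PAIR_SUMS and len(coords) == 2:
            a, b = (parse_cell(c) for c in coords)
            constraints.append((a, b, PAIR_SUMS[el["text"]]))
    return constraints


constraints = pair_constraints(visual_elements)
print(constraints)
```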

Submission Requirements

Standard rules apply.
The final output should be just the flattened solution inside <answer> tags.
Maximum input length of 32768 characters.
Maximum output of 32768 tokens.
No tool-calling.