Description
Write a prompt that gets an LLM to correctly multiply two random integers, each between 1 and 13 digits long. The LLM must carry out the multiplication directly, without using any tools or code.
Evaluation
Each submission will be tested against 100 randomly generated test cases, split into five groups of 20, with the integers growing by 2 digits from each group to the next.
For each of the 100 test cases, the grader prompts the selected LLM with the submission's prompt, followed by two newline characters, followed by the test case.
Each response is then scored for exact match, i.e. the multiplied value must match the expected answer exactly. The final score is the percentage of test cases answered correctly.
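The evaluation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the grader's actual code: the function names are hypothetical, the test case is treated as an opaque string because its exact format is not specified here, and trimming surrounding whitespace before comparison is an assumption.

```python
def build_request(submission_prompt: str, test_case: str) -> str:
    # Submission prompt, two newline characters, then the test case.
    return submission_prompt + "\n\n" + test_case


def is_exact_match(answer: str, a: int, b: int) -> bool:
    # Exact string comparison against the true product.
    # Assumption: surrounding whitespace is ignored.
    return answer.strip() == str(a * b)


def final_score(outcomes: list) -> float:
    # Percentage of test cases answered correctly.
    return 100.0 * sum(outcomes) / len(outcomes)
```

Under this sketch, a submission that gets 87 of the 100 test cases right scores 87.0, and an answer such as "4,08" or "408.0" counts as wrong even if the arithmetic behind it was correct.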
Submission Requirements
- Standard rules apply.
- The final output should be just the result, with no commas or units, inside <answer> tags.
- Maximum input length of 1024 characters.
- Maximum output of 4096 tokens.
- No tool-calling.
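Before submitting, it can help to check a prompt and a sample response against these requirements locally. The sketch below is a hypothetical helper, not part of the official grader: it assumes the answer is wrapped as <answer>...</answer> with a closing tag, that the last such pair is the one that counts, and that a valid answer consists of ASCII digits only.

```python
import re
from typing import Optional

MAX_PROMPT_CHARS = 1024  # input length limit from the requirements


def prompt_within_limit(prompt: str) -> bool:
    # True when the submission prompt fits the character limit.
    return len(prompt) <= MAX_PROMPT_CHARS


def extract_answer(response: str) -> Optional[str]:
    # Collect everything wrapped in <answer>...</answer> (closing tag assumed).
    matches = re.findall(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    if not matches:
        return None
    # Assumption: the last tagged answer is the final one.
    candidate = matches[-1].strip()
    # Reject commas, units, signs, or any other non-digit content.
    return candidate if re.fullmatch(r"[0-9]+", candidate) else None
```

For example, `extract_answer("12 * 34 = <answer>408</answer>")` returns `"408"`, while a response containing `<answer>1,234</answer>` or no tags at all returns `None`.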