Improving Prompt Consistency with Structured Generations
Hugging Face's recent experiments reveal that slight changes in prompt format can drastically impact model performance, highlighting the need for consistency in evaluation. Their collaboration with Dottxt aims to improve prompt reliability through structured generation techniques.
Will Kurt, Remi Louf, and Clémentine Fourrier on Improving Prompt Consistency with Structured Generations:
It’s worth mentioning that the regex controlling the structure is similar to, but not identical to, the regex used to parse out the answer. We’ve learned there’s an interesting bit of nuance in defining the structure since, like the prompt, it can impact performance. For example, notice the {200,700} in the regex: it means the model has 200 to 700 characters to “reason” before answering. Changing these values can impact performance and lead to something we refer to as “thought control”, an area we’re hoping to write more about soon.
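To make the idea concrete, here is a minimal sketch of regex-constrained generation using the Outlines library's generate.regex interface (as in Outlines 0.x). The model checkpoint and the pattern are illustrative assumptions, not the exact regex from the post; the {200,700} quantifier simply plays the same role as the reasoning bound described in the quote.

```python
# Minimal sketch of regex-constrained generation with Outlines.
# Assumptions: the outlines.models.transformers / outlines.generate.regex
# interface (Outlines 0.x), an illustrative model checkpoint, and a
# hypothetical GSM8K-style pattern, not the exact regex from the post.
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Force 200 to 700 characters of free-form "reasoning" on one line,
# followed by a fixed answer line that an evaluation regex can parse.
pattern = r"Reasoning: [^\n]{200,700}\nThe answer is \d{1,10}\."

generator = outlines.generate.regex(model, pattern)

prompt = (
    "Question: A farmer has 12 cows and buys 7 more. "
    "How many cows does the farmer have? Think step by step, then answer."
)
print(generator(prompt))

# Widening or narrowing the {200,700} bound changes how much the model
# "reasons" before committing to an answer: the "thought control" knob
# mentioned in the quote above.
```

Because the output always matches the pattern, the answer-parsing step in the evaluation harness no longer depends on the model happening to follow the prompt's formatting instructions.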
A nice roundup of how AI models are evaluated and their performance tested, with some surprising results when using structured output: it turns out structured output makes most models perform consistently better.