Using Zod and zodResponseFormat with OpenAI
Over the last couple of weeks, I've been experimenting with Zod and zodResponseFormat for making OpenAI calls that request "structured outputs".
It is documented in a rudimentary way here but I still had a bunch of open questions.
Here’s a brief but comprehensive summary of the Zod and zodResponseFormat
implementation for OpenAI’s Structured Outputs feature.
All of this is in a Node.js environment, coded in ES module format.
Note: As of their release on Sep 12 2024, JSON response formats (whether the old kind or Structured Outputs) aren't supported by the latest o1 or o1-mini models. (Notwithstanding that access to those models is limited anyway; at the time of writing, they're only available to tier 5 API customers.)
Declarations
Below are the declarations you need.
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";
const openai = new OpenAI();
This means you always have to install both openai and zod (npm install zod openai) for any project in which you plan to use Zod this way.
An alternative approach is to run zodResponseFormat once, save the JSON schema it produces, and reuse that directly. The risk with this approach is that OpenAI will update the helper, as well as the range of Zod features it supports, and your saved schema will fall out of date.
I tried to create my own JSON schemas and use those, adhering to the rules OpenAI gives. But I never got them to work for anything but the most basic schema.
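For reference, if you do go down the hand-written route, the response_format has to follow OpenAI's strict-mode rules: the root must be an object, every property must be listed under required, and additionalProperties must be false. Here's a minimal sketch of the shape (the field names are just illustrative):

const rawResponseFormat = {
  type: "json_schema",
  json_schema: {
    name: "my_schema",            // a name is required, just as with zodResponseFormat
    strict: true,                 // opts in to Structured Outputs' strict mode
    schema: {
      type: "object",
      properties: {
        title: { type: "string", description: "A short title" },
        count: { type: "integer" }
      },
      required: ["title", "count"],     // every property must appear here
      additionalProperties: false       // mandatory in strict mode
    }
  }
};

// Passed directly as response_format in the API call, no Zod involved.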
Creating Zod Schema
Below are the rules for creating Zod schema for OpenAI calls.
- Use z.object() to define the structure
- Use z.string(), z.number(), z.boolean(), z.array(), and z.enum() for types
- Use z.array() with the element type specified inside: z.array(z.string())
- Use z.enum() like this: z.enum(["thing 1", "thing 2", "thing 3"]) or z.enum([1, 2, 3])
- Use .describe() at the end of a field to describe the kind of values that can go in it
All fields must be set as “required” by default — and the OpenAI helper sets this, so there’s nothing for you to do.
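To make the last two points concrete, here's a small, made-up schema and a rough sketch of what zodResponseFormat turns it into. Note how .describe() surfaces as a description in the JSON schema the model sees, and how every field is automatically listed as required:

const weatherSchema = z.object({
  city: z.string().describe("The name of the city"),
  unit: z.enum(["celsius", "fahrenheit"]).describe("The temperature unit to use")
});

// zodResponseFormat(weatherSchema, "weather") produces (roughly) this response_format:
// {
//   type: "json_schema",
//   json_schema: {
//     name: "weather",
//     strict: true,
//     schema: {
//       type: "object",
//       properties: {
//         city: { type: "string", description: "The name of the city" },
//         unit: { type: "string", enum: ["celsius", "fahrenheit"], description: "The temperature unit to use" }
//       },
//       required: ["city", "unit"],      // the helper marks every field as required
//       additionalProperties: false
//     }
//   }
// }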
To make an optional value, use z.union() with z.null() as one of the members (.optional() isn't supported; every key has to be present in the output, so making the field nullable is the documented workaround). Note that z.union() takes an array of schemas. E.g.
...
optionalField: z.union([z.string(), z.null()]),
...
You can also nest objects without declaring them explicitly above. But I found that this was problematic. I found it much more reliable to create objects and nest them by name, as shown in the sample code below.
When explicitly writing schema, you can use anyOf. But at present, there's no way to use anyOf (that I know of… if you know, help!) using Zod.
Warning — Don't use an LLM to create Zod schema (without editing the results). It will get very creative with things like .optional(), z.any(), z.record() (which really should be supported!) or other unsupported bits of schema. It's likely also to be very complicated. If you do use an LLM, then copy-paste this guide into it as a strict set of guidelines. (And it'll still probably be cheeky and break the rules…)
Calling OpenAI
There are a couple of small differences (subject to change) in how you call the OpenAI API endpoint. This means that if you have a function for calling LLMs (as I do), then you will have to modify it slightly to be able to use zodResponseFormat.
- You can use two models — gpt-4o-mini (which currently can handle Structured Outputs) or gpt-4o-2024-08-06. At some point, GPT-4o will redirect to this.
- You have to call openai.beta.chat.completions.parse (instead of the usual openai.chat.completions.create)
- You have to get message.parsed (instead of message.content)
The response is always a JSON object.
Sample Code
Below is some sample code for using Zod and zodResponseFormat. Make sure you have the declarations above, and of course, you have to initialise the OpenAI API.
const thingSchema = z.object({
  type: z.enum(["alien", "spaceship"]),
  name: z.string().describe("The name of the thing")
});
const MySchema = z.object({
field1: z.string(),
field2: z.number(),
nestedField: z.object({
subField: z.boolean()
}),
enumField: z.enum(["sausage", "elephant", "telephone"]),
arrayField: z.array(z.string()),
arrayOfThings: z.array(thingSchema)
});
const completion = await openai.beta.chat.completions.parse({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "Instruction for the model" },
{ role: "user", content: "User input" }
],
response_format: zodResponseFormat(MySchema, "schemaName")
});
const result = completion.choices[0].message.parsed;
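One thing the sample glosses over: the model can refuse to answer, in which case (as far as I can tell) message.parsed is null and the SDK exposes a refusal string on the message instead. A quick sketch of handling that, building on the result variable above:

const message = completion.choices[0].message;

if (message.refusal) {
  // The model declined to produce structured output (e.g. for safety reasons)
  console.error("Model refused:", message.refusal);
} else {
  // result is already a plain JS object matching MySchema; no JSON.parse needed
  console.log(result.field1, result.enumField);
  result.arrayOfThings.forEach((thing) => console.log(`${thing.type}: ${thing.name}`));
}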
Various Implementation Rules
Below is a mix of rules from OpenAI plus some things I learned through experimentation.
- There is a maximum of 5 levels of nesting (which is easy to get to!).
- Objects can have up to 100 properties (though I haven't gotten near this).
- Use zodResponseFormat(Schema, "schemaName") in the API call. You have to give the schema a name — but I don't know why. It can be anything.
- When calling other functions (e.g. if you have a function that makes your LLM calls), you should pass the object created by zodResponseFormat — not the Zod schema. See the sketch below.
Some of these rules may change as OpenAI updates its API.
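To illustrate that last rule about wrapper functions, here's a rough sketch of what I mean. The function name and signature are my own invention; the point is that the caller builds the response_format with zodResponseFormat and the wrapper just passes the finished object through:

// Hypothetical wrapper: messages plus an optional response_format object
// that was already built with zodResponseFormat(...)
async function callLLM(messages, responseFormat = null) {
  const params = { model: "gpt-4o-mini", messages };
  if (responseFormat) {
    // Structured output: use the parse helper and return the parsed object
    params.response_format = responseFormat;
    const completion = await openai.beta.chat.completions.parse(params);
    return completion.choices[0].message.parsed;
  }
  // Plain text: fall back to the normal endpoint
  const completion = await openai.chat.completions.create(params);
  return completion.choices[0].message.content;
}

// Usage: build the format at the call site, then hand the whole object over
const answer = await callLLM(
  [{ role: "user", content: "Describe a spaceship" }],
  zodResponseFormat(MySchema, "schemaName")
);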
Final thoughts
OpenAI's zodResponseFormat helper seems to be a work in progress. It's not easy to use, requires too many imports, and is quite limited compared to how much Zod schema can do.
The main limitations, to my mind, are:
- The need to use Zod. It's yet another package and standard, even if it's a good package. Unfortunately, creating JSON schema by hand is annoying, so it's necessary. Google's release of "JSON mode" with Gemini is easier to parse and use.
- The limited levels of nesting. Five levels?? C'mon.
There are little bits that make me think… why was this released in this state? For example, why make it slightly complicated to use optional values? Why is there no clear description of how to use anyOf with Zod?
Anyway, it’s a good start, and within its constraints, it works very reliably. But I’d be aware that it will likely change again very soon.
Excellent post, so far this is the best way I’ve seen when it comes to handling JSON responses when prompting OpenAI.
Definitely. It’s fiddly but worth it.
I would like to see the JSON schema you tried but couldn't get working. The docs do say you can use raw JSON schema, but they recommend either Zod or Pydantic.
Would also appreciate seeing how you tried to use anyOf.
Also I don’t quite understand what .describe() is for. Looking at the zod docs I found “Use .describe() to add a description property to the resulting schema.” Would this be kind of like comments inside the schema to help the developer understand the schema or am I misunderstanding?
About zodResponseFormat(Schema, "schemaName"): this schemaName doesn't seem to be used anywhere else; it's only needed when calling zodResponseFormat(). Is that correct?
I decided to open up the node module where zod is and ask ChatGPT about some of the code there. Here is a link https://chatgpt.com/share/672cbeae-375c-8013-9442-b4ed6e5570be
It is in fact converting the Zod schema into a JSON schema, so we should be able to write directly in JSON schema format. I think the key to why they recommend Zod is that it has validators. With JSON schema you'll have to figure that part out yourself, and they have a looong list of validators.
Now how accurate the ChatGPT answers are, I can't say. Apparently the name is there for tracking errors. And zodResponseFormat has an additional argument, "props", which isn't mentioned in the OpenAI docs for some reason. According to ChatGPT it can be used for adding metadata/descriptions or whatever.
As far as why they released it in its current limited state, I think it's for people to try it out, and I believe there are some developers, such as myself, who could really benefit from it as it is currently. Though I'm still not a professional, it's just for personal projects.
Hey there, I didn't keep a record of the broken schema I used — I just realised an easier path forward was to use Zod. Because OpenAI updates their npm package constantly, I thought the schema-creation code might also be regularly updated (and it has definitely been updated in the time I've been using it), so I do think Zod is the best path forward for now.