Body
Model name
Chat history as an array of message objects (each with a role and content)
Optional list of function tools the model may call during the chat
Format to return the response in. Accepts "json" or a JSON schema object for structured output
Runtime options that control text generation (for example, temperature)
When true, returns the model's thinking as separate output in addition to the content. Accepts a boolean (true/false) or, for supported models, an effort-level string ("high", "medium", "low")
Model keep-alive duration (for example 5m or 0 to unload immediately)
Whether to return log probabilities of the output tokens
Number of most likely tokens to return at each token position when logprobs are enabled
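The request-body fields above can be sketched as a JSON payload. This is a minimal illustration, not the definitive schema: the field names (`model`, `messages`, `tools`, `format`, `options`, `think`, `keep_alive`, `logprobs`, `top_logprobs`) and the model name are assumptions based on a typical chat API, so verify them against the actual endpoint before use.

```python
import json

# Sketch of a chat request body built from the fields described above.
# Field names and the model name are assumptions; check the real schema.
body = {
    "model": "llama3.2",                 # model name (assumed example)
    "messages": [                        # chat history: role + content pairs
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "format": "json",                    # or a JSON schema object
    "options": {"temperature": 0.7},     # runtime generation options
    "think": True,                       # request separate thinking output
    "keep_alive": "5m",                  # keep the model loaded for 5 minutes
    "logprobs": True,                    # return token log probabilities
    "top_logprobs": 3,                   # top alternatives per token position
}

payload = json.dumps(body)
print(payload)
```

Note that `top_logprobs` only has an effect when `logprobs` is enabled, per the field descriptions above.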
Response
Chat response
Model name used to generate this message
Timestamp of response creation (ISO 8601)
Indicates whether the chat response has finished
Reason the response finished
Total time spent generating in nanoseconds
Time spent loading the model in nanoseconds
Number of tokens in the prompt
Time spent evaluating the prompt in nanoseconds
Number of tokens generated in the response
Time spent generating tokens in nanoseconds
Log probability information for the generated tokens when logprobs are enabled
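Since the duration fields above are reported in nanoseconds alongside token counts, a common use is deriving generation throughput. The sketch below uses made-up response values (the field names mirror the descriptions above but are assumptions about the actual response keys).

```python
# Hypothetical response dict; values are illustrative, and the key names
# are assumptions based on the field descriptions above.
response = {
    "model": "llama3.2",
    "done": True,
    "done_reason": "stop",
    "total_duration": 5_000_000_000,      # total generation time in ns
    "load_duration": 1_000_000_000,       # model load time in ns
    "prompt_eval_count": 26,              # tokens in the prompt
    "prompt_eval_duration": 500_000_000,  # prompt evaluation time in ns
    "eval_count": 100,                    # tokens generated
    "eval_duration": 2_000_000_000,       # token generation time in ns
}

NS_PER_S = 1_000_000_000

# Tokens per second = generated tokens / generation time in seconds.
tokens_per_second = response["eval_count"] / (response["eval_duration"] / NS_PER_S)
print(tokens_per_second)  # 100 tokens / 2 s -> 50.0
```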