Thinking-capable models expose a `thinking` field that separates their reasoning trace from the final answer.
Use this capability to audit model steps, animate the model's thinking in a UI, or hide the trace entirely when you only need the final response.
## Supported models
- Qwen 3
- GPT-OSS (use `think` levels `low`, `medium`, or `high`; the trace cannot be fully disabled)
- DeepSeek-v3.1
- DeepSeek R1
- Browse the latest additions under thinking models
## Enable thinking in API calls
Set the `think` field on chat or generate requests. Most models accept booleans (`true`/`false`);
GPT-OSS instead expects one of `low`, `medium`, or `high` to tune the trace length.
The `message.thinking` field (chat endpoint) or `thinking` field (generate endpoint) contains the reasoning trace, while `message.content` / `response` holds the final answer.
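As a minimal sketch of the request and response handling (the model name, prompt, and local endpoint are illustrative placeholders):

```python
import json

# Request body for POST http://localhost:11434/api/chat (non-streaming).
# "deepseek-r1" and the prompt are placeholders; any thinking-capable model works.
payload = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "Where should I visit in Lisbon?"}],
    "think": True,   # GPT-OSS expects "low", "medium", or "high" here instead
    "stream": False,
}
request_body = json.dumps(payload)

def split_response(resp: dict) -> tuple:
    """Pull the reasoning trace and final answer out of a chat response dict."""
    msg = resp.get("message", {})
    return msg.get("thinking", ""), msg.get("content", "")
```

Send `request_body` to the server with any HTTP client; `split_response` then separates `message.thinking` from `message.content` in the returned JSON.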
GPT-OSS requires `think` to be set to `"low"`, `"medium"`, or `"high"`; passing `true`/`false` is ignored for that model.

## Stream the reasoning trace
Thinking streams interleave reasoning tokens before answer tokens. Detect the first `thinking` chunk to render a “thinking” section, then switch to the final reply once `message.content` arrives.
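A sketch of that routing logic: the chunk shape mirrors the streaming chat response, and the demo dicts below stand in for chunks a real client would read from the streamed HTTP response.

```python
def route_stream(chunks):
    """Accumulate streamed chunks into (thinking_text, answer_text).

    Reasoning tokens arrive under message.thinking, answer tokens under
    message.content; the first thinking chunk is where a UI would open
    its "thinking" section.
    """
    thinking_parts, answer_parts = [], []
    seen_thinking = False
    for chunk in chunks:
        msg = chunk.get("message", {})
        if msg.get("thinking"):
            if not seen_thinking:
                seen_thinking = True  # first thinking chunk: open the "thinking" UI section
            thinking_parts.append(msg["thinking"])
        if msg.get("content"):
            answer_parts.append(msg["content"])  # answer tokens follow the trace
    return "".join(thinking_parts), "".join(answer_parts)

# Stand-in chunks; a real client would iterate the server's streamed response.
demo = [
    {"message": {"thinking": "Consider the question"}},
    {"message": {"thinking": "..."}},
    {"message": {"content": "Lisbon has"}},
    {"message": {"content": " great sights."}},
]
trace, answer = route_stream(demo)
```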
## CLI quick reference
- Enable thinking for a single run: `ollama run deepseek-r1 --think "Where should I visit in Lisbon?"`
- Disable thinking: `ollama run deepseek-r1 --think=false "Summarize this article"`
- Hide the trace while still using a thinking model: `ollama run deepseek-r1 --hidethinking "Is 9.9 bigger or 9.11?"`
- Inside interactive sessions, toggle with `/set think` or `/set nothink`.
- GPT-OSS only accepts levels: `ollama run gpt-oss --think=low "Draft a headline"` (replace `low` with `medium` or `high` as needed).
Thinking is enabled by default in the CLI and API for supported models.

