
Ollama Python Library

The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.

Prerequisites

  • Ollama should be installed and running
  • Pull a model to use with the library: ollama pull <model> e.g. ollama pull gemma3
    • See Ollama.com for more information on the models available.

Install

pip install ollama

Usage

from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
# or access fields directly from the response object
print(response.message.content)

See _types.py for more information on the response types.

Streaming responses

Response streaming can be enabled by setting stream=True.

from ollama import chat

stream = chat(
  model='gemma3',
  messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
  stream=True,
)

for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)

Cloud Models

Run larger models by offloading to Ollama’s cloud while keeping your local workflow.

  • Supported models: deepseek-v3.1:671b-cloud, gpt-oss:20b-cloud, gpt-oss:120b-cloud, kimi-k2:1t-cloud, qwen3-coder:480b-cloud, kimi-k2-thinking
    • See Ollama Models - Cloud for more information.

Run via local Ollama

  1. Sign in (one-time):

ollama signin

  2. Pull a cloud model:

ollama pull gpt-oss:120b-cloud

  3. Make a request:

from ollama import Client

client = Client()

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b-cloud', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)

Cloud API (ollama.com)

Access cloud models directly by pointing the client at https://ollama.com.

  1. Create an API key from ollama.com, then set:

export OLLAMA_API_KEY=your_api_key

  2. (Optional) List models available via the API:

curl https://ollama.com/api/tags

  3. Generate a response via the cloud API:

import os

from ollama import Client

client = Client(
  host='https://ollama.com',
  headers={'Authorization': 'Bearer ' + os.environ.get('OLLAMA_API_KEY')},
)

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
  print(part.message.content, end='', flush=True)

Custom client

A custom client can be created by instantiating Client or AsyncClient from ollama.

All extra keyword arguments are passed into the httpx.Client.

from ollama import Client

client = Client(
  host='http://localhost:11434',
  headers={'x-some-header': 'some-value'}
)
response = client.chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
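Since the extra keyword arguments go straight to httpx.Client, standard httpx options such as timeout can be set the same way. A minimal sketch (the 30-second value is only an example, assuming a local server with gemma3 pulled):

from ollama import Client

# timeout is an httpx.Client option; it is forwarded unchanged by ollama's Client
client = Client(host='http://localhost:11434', timeout=30.0)
response = client.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
print(response.message.content)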

Async client

The AsyncClient class is used to make asynchronous requests. It can be configured with the same fields as the Client class.

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  response = await AsyncClient().chat(model='gemma3', messages=[message])

asyncio.run(chat())

Setting stream=True modifies functions to return a Python asynchronous generator:

import asyncio
from ollama import AsyncClient

async def chat():
  message = {'role': 'user', 'content': 'Why is the sky blue?'}
  async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
    print(part['message']['content'], end='', flush=True)

asyncio.run(chat())

API

The Ollama Python library's API is designed around the Ollama REST API.

Chat

ollama.chat(model='gemma3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

Generate

ollama.generate(model='gemma3', prompt='Why is the sky blue?')
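The generated text can be read off the returned object; a short sketch, assuming the field is named response as in the REST API:

import ollama

result = ollama.generate(model='gemma3', prompt='Why is the sky blue?')
print(result['response'])  # dict-style access
print(result.response)     # or attribute access, as with chat responses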

List

ollama.list()
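To see which models are installed locally, the listing can be iterated; a sketch, assuming the response exposes a models field mirroring the REST /api/tags output:

import ollama

for model in ollama.list().models:
  print(model.model)  # e.g. 'gemma3:latest'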

Show

ollama.show('gemma3')

Create

ollama.create(model='example', from_='gemma3', system="You are Mario from Super Mario Bros.")

Copy

ollama.copy('gemma3', 'user/gemma3')

Delete

ollama.delete('gemma3')

Pull

ollama.pull('gemma3')
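Like chat, pull accepts stream=True to report download progress instead of blocking; a sketch, assuming the streamed objects carry a status field as in the REST API:

import ollama

for progress in ollama.pull('gemma3', stream=True):
  print(progress.status)  # e.g. 'pulling manifest', 'success'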

Push

ollama.push('user/gemma3')

Embed

ollama.embed(model='gemma3', input='The sky is blue because of Rayleigh scattering')

Embed (batch)

ollama.embed(model='gemma3', input=['The sky is blue because of Rayleigh scattering', 'Grass is green because of chlorophyll'])
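Each input string yields one vector; a sketch of reading them back, assuming the response exposes an embeddings field as in the REST /api/embed output:

import ollama

response = ollama.embed(model='gemma3', input=['The sky is blue because of Rayleigh scattering', 'Grass is green because of chlorophyll'])
for vector in response.embeddings:
  print(len(vector))  # dimensionality of each embedding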

Ps

ollama.ps()
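ps reports which models are currently loaded in memory; a sketch, assuming a models field mirroring the REST /api/ps output:

import ollama

for running in ollama.ps().models:
  print(running.model)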

Errors

Errors are raised if requests return an error status or if an error is detected while streaming.

import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    ollama.pull(model)