## Installation and Setup

```bash
pip install openai

# Set API key (prefer environment variable over hardcoding)
export OPENAI_API_KEY="sk-..."
```

```python
from openai import OpenAI

# Client reads OPENAI_API_KEY from the environment automatically
client = OpenAI()

# Or pass it explicitly (not recommended for production)
client = OpenAI(api_key="sk-...")
```
## Basic Chat Completions

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Python decorators in 2 sentences."}
    ],
    temperature=0.7,  # 0 = deterministic, 2 = very random
    max_tokens=200,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")
```
## Multi-turn Conversations

```python
conversation = [
    {"role": "system", "content": "You are a Python tutor."}
]

def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation,
    )
    assistant_reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_reply})
    return assistant_reply

print(chat("What is a list comprehension?"))
print(chat("Can you show me an example with filtering?"))
# Context is preserved across turns
```
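The `conversation` list grows without bound, and every past message is re-sent (and billed) on each call. A common pattern is to trim it to the most recent turns while always keeping the system message; `trim_conversation` below is a hypothetical helper, not part of the SDK:

```python
def trim_conversation(conversation: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the system message(s) plus the most recent messages."""
    system = [m for m in conversation if m["role"] == "system"]
    rest = [m for m in conversation if m["role"] != "system"]
    return system + rest[-max_messages:]

# Example: a long chat trimmed to the system prompt + last 4 messages
history = [{"role": "system", "content": "You are a Python tutor."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_conversation(history, max_messages=4)
print(len(trimmed))  # 5: system message + last 4 messages
```

Call it right before each `client.chat.completions.create(...)` so the payload stays bounded even in long-running sessions.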
## Streaming Responses

Streaming shows tokens as they're generated, which significantly improves perceived latency for long responses:

```python
def stream_chat(prompt: str):
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
    print()  # newline after stream ends

stream_chat("Write a haiku about Python programming.")
```

For async code (FastAPI, etc.), use `AsyncOpenAI` with `stream=True`:

```python
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def stream_chat_async(prompt: str):
    stream = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            yield delta.content
```
## Function Calling (Tool Use)

Function calling lets the model decide when to invoke your Python functions, parse structured arguments, and incorporate results:

```python
import json

# Describe your functions with JSON Schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Berlin'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Your actual function
def get_weather(city: str, unit: str = "celsius") -> dict:
    # In reality, call a weather API
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}

def run_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    # First call -- model may request a tool
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    message = response.choices[0].message

    # If the model called a tool, execute it and continue
    if message.tool_calls:
        messages.append(message)
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)

            # Call the actual function
            if func_name == "get_weather":
                result = get_weather(**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

        # Second call -- model uses the tool result to form the final response
        final = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
        )
        return final.choices[0].message.content

    return message.content

print(run_with_tools("What's the weather in Tokyo right now?"))
```
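The example above handles a single round of tool calls, but the model may chain tools: the result of one call can trigger a request for another. Production code therefore often loops until the model answers without requesting tools. A sketch of that loop; `TOOL_REGISTRY`, `run_tool_loop`, and the `max_rounds` guard are illustrative names (the client is passed in rather than created globally, so the dispatch logic stays testable):

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub; in reality, call a weather API
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}

# Map tool names to callables so dispatch stays data-driven
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool_loop(client, messages: list, tools: list, max_rounds: int = 5) -> str:
    """Keep calling the model until it answers without requesting tools."""
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # model produced a final answer

        messages.append(message)
        for tool_call in message.tool_calls:
            func = TOOL_REGISTRY.get(tool_call.function.name)
            args = json.loads(tool_call.function.arguments)
            result = func(**args) if func else {"error": "unknown tool"}
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("Model kept requesting tools past max_rounds")
```

The `max_rounds` cap matters: without it, a confused model can loop on tool calls indefinitely, burning tokens on every round.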
## Structured Output with Pydantic

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ArticleSummary(BaseModel):
    title: str
    key_points: list[str]
    sentiment: str  # positive, neutral, negative
    word_count_estimate: int

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # structured outputs require this model or later
    messages=[
        {"role": "system", "content": "Extract structured info from articles."},
        {"role": "user", "content": "Python 3.13 was released with a free-threaded mode that removes the GIL under a special build flag, enabling true multi-core parallelism for CPU-bound Python code."}
    ],
    response_format=ArticleSummary,
)

summary = response.choices[0].message.parsed
print(summary.title)       # e.g. "Python 3.13 Introduces Free-Threaded Mode"
print(summary.key_points)  # e.g. ['Removes GIL under a build flag', ...]
print(summary.sentiment)   # e.g. "positive"
```
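The `sentiment` field above relies on a comment to constrain its values; with structured outputs you can enforce the constraint in the schema itself by typing the field as `typing.Literal` (or an `Enum`), so the model can only return one of the listed strings and out-of-range values fail validation client-side too. A minimal sketch, assuming pydantic v2:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class ArticleSummary(BaseModel):
    title: str
    key_points: list[str]
    sentiment: Literal["positive", "neutral", "negative"]  # enforced by the schema
    word_count_estimate: int

# Valid data parses normally
ok = ArticleSummary(
    title="Python 3.13 Free-Threading",
    key_points=["GIL removed under a build flag"],
    sentiment="positive",
    word_count_estimate=40,
)
print(ok.sentiment)  # positive

# Out-of-range values are rejected before any API round trip
try:
    ArticleSummary(title="x", key_points=[], sentiment="excited", word_count_estimate=1)
except ValidationError as e:
    print("rejected:", type(e).__name__)
```

The same model class can be passed as `response_format` exactly as before; the `Literal` just tightens the generated JSON Schema.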
## Embeddings

```python
import numpy as np

def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

# Batch embeddings (more efficient)
def get_embeddings(texts: list[str]) -> list[list[float]]:
    response = client.embeddings.create(
        input=texts,
        model="text-embedding-3-small"
    )
    return [item.embedding for item in response.data]

# Cosine similarity
def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Build a simple semantic search
documents = ["Python is a programming language", "Dogs are mammals", "FastAPI is fast"]
doc_embeddings = get_embeddings(documents)

query = "What coding language should I learn?"
query_embedding = get_embedding(query)

similarities = [(doc, cosine_similarity(query_embedding, emb))
                for doc, emb in zip(documents, doc_embeddings)]
best = max(similarities, key=lambda x: x[1])
print(f"Most relevant: {best[0]} (score: {best[1]:.3f})")
```
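The pairwise loop above is fine for a handful of documents; for larger corpora, stack the embeddings into a matrix and score every document with a single matrix-vector product. A sketch using NumPy (the vectors here are toy 2-D examples, not real API output, and `rank_documents` is a hypothetical helper):

```python
import numpy as np

def rank_documents(query_emb: list[float], doc_embs: list[list[float]]) -> list[int]:
    """Return document indices sorted by cosine similarity, best first."""
    q = np.asarray(query_emb, dtype=float)
    M = np.asarray(doc_embs, dtype=float)
    # Normalize rows and the query, then one matmul yields all cosines at once
    M_norm = M / np.linalg.norm(M, axis=1, keepdims=True)
    q_norm = q / np.linalg.norm(q)
    scores = M_norm @ q_norm
    return np.argsort(-scores).tolist()

# Toy vectors: doc 0 points the same way as the query, doc 1 is orthogonal
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(rank_documents([2.0, 0.1], docs))  # doc 0 ranks first
```

For corpora beyond a few thousand documents, a vector store or an approximate-nearest-neighbor index is the usual next step; the normalized-matrix trick above is the same math they build on.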
## Vision (Image Analysis)

```python
import base64
import mimetypes
from pathlib import Path

def analyze_image_file(image_path: str, question: str) -> str:
    """Analyze a local image file."""
    image_data = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    mime = mimetypes.guess_type(image_path)[0] or "image/png"

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{mime};base64,{image_data}",
                        "detail": "high"  # "low", "high", or "auto"
                    }
                }
            ]
        }]
    )
    return response.choices[0].message.content

result = analyze_image_file("screenshot.png", "What UI issues do you see in this design?")
```
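For images already hosted somewhere, you can pass an `https` URL directly instead of building a base64 data URI. A sketch; `build_vision_message` and `analyze_image_url` are hypothetical helpers (the client is passed in, matching the global client configured earlier):

```python
def build_vision_message(image_url: str, question: str, detail: str = "auto") -> dict:
    """Assemble a user message mixing text and an image reference."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }

def analyze_image_url(client, image_url: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[build_vision_message(image_url, question)],
    )
    return response.choices[0].message.content

msg = build_vision_message("https://example.com/chart.png", "Summarize this chart.")
print(msg["content"][1]["image_url"]["url"])
```

Hosted URLs avoid inflating the request body with base64 (which grows the payload by about a third), but the URL must be publicly reachable by OpenAI's servers.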
## Rate Limits and Retries

```python
import random
import time
from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI(
    max_retries=3,  # automatic retry on transient errors (built into the SDK)
    timeout=30.0,   # request timeout in seconds
)

# Manual exponential backoff for custom logic
def robust_completion(messages, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            wait = (2 ** attempt) + random.uniform(0, 1)  # exponential backoff + jitter
            print(f"Rate limited. Waiting {wait:.1f}s...")
            time.sleep(wait)
        except APIStatusError as e:
            if e.status_code in (500, 502, 503) and attempt < max_attempts - 1:
                time.sleep(2 ** attempt)
            else:
                raise
```
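A common refinement of the backoff above is "full jitter": instead of adding a small random offset to a fixed exponential wait, each delay is drawn uniformly between zero and the exponential cap, so many clients retrying at once don't synchronize into thundering-herd bursts. A sketch of the schedule alone, as a pure function with no API calls (`backoff_delay` is an illustrative name):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

# The upper bound doubles each attempt until it hits the cap
for attempt in range(6):
    ceiling = min(30.0, 2 ** attempt)
    print(f"attempt {attempt}: wait up to {ceiling:.0f}s, drew {backoff_delay(attempt):.2f}s")
```

Swap this into `robust_completion` in place of the `wait = ...` line if you expect many concurrent workers hitting the same rate limit.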
## Cost Optimization Tips

- Use gpt-4o-mini for simple tasks -- roughly 15x cheaper than gpt-4o, and suitable for classification, extraction, and summarization
- Limit max_tokens -- set it to what you actually need; if unset, the model can generate up to its maximum
- Cache repeated prompts -- OpenAI's prompt caching gives a 50% discount on repeated prompt prefixes over 1,024 tokens
- Use the Batch API -- 50% discount for non-real-time workloads via `client.batches.create()`
- Embeddings: text-embedding-3-small -- 5x cheaper than ada-002, with better quality
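The Batch API takes a JSONL file where each line is one self-contained request. A hedged sketch of building such a file and submitting it; `build_batch_line` is an illustrative helper, and the field names follow the Batch API request format (check the current docs before relying on them):

```python
import json

def build_batch_line(custom_id: str, model: str, messages: list[dict]) -> str:
    """One JSONL line in the Batch API request format."""
    return json.dumps({
        "custom_id": custom_id,          # your ID for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {"model": model, "messages": messages},
    })

lines = [
    build_batch_line(f"req-{i}", "gpt-4o-mini",
                     [{"role": "user", "content": f"Summarize document {i}"}])
    for i in range(3)
]
jsonl = "\n".join(lines)

# Submitting (requires a configured client and the file written to disk):
# batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
# batch = client.batches.create(
#     input_file_id=batch_file.id,
#     endpoint="/v1/chat/completions",
#     completion_window="24h",
# )
```

Results arrive as another JSONL file keyed by `custom_id`, so stable IDs are what let you join outputs back to inputs.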
## Frequently Asked Questions

### Which model should I use: gpt-4o or gpt-4o-mini?

Start with gpt-4o-mini for most tasks (classification, summarization, extraction, simple Q&A). Upgrade to gpt-4o only when you need stronger reasoning, complex instruction following, or better multilingual accuracy. The cost difference is roughly 15x.

### How do I handle context window limits?

GPT-4o has a 128k-token context window. For conversations exceeding this, implement a sliding window (keep the last N messages) or summarize old context into a system message. Use tiktoken to count tokens before sending.
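The summarization strategy can be sketched as follows: older turns are compressed into a single system note by a cheap model, and only the recent turns travel verbatim. `compact_history` is a hypothetical helper (the client is passed in as a dependency), and the `keep_last` / 100-word thresholds are arbitrary choices to tune:

```python
def compact_history(client, messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Summarize older turns into one system note; keep recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # cheap model is fine for compression
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation in under 100 words:\n{transcript}",
        }],
    ).choices[0].message.content
    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```

Compared with a plain sliding window, this trades one extra (cheap) API call for retaining the gist of dropped turns.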
### Is the OpenAI Python SDK thread-safe?

Yes. The OpenAI client is safe to share across threads. For async code, use AsyncOpenAI instead of OpenAI.

### How do I test without spending API credits?

Use OpenAI's evals framework for automated evaluation. For unit tests, mock the client with unittest.mock, or stub HTTP traffic with a library such as respx (the SDK is built on httpx). For integration tests, use a cheaper model like gpt-4o-mini.
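Mocking the client with unittest.mock looks like this in practice. The `summarize` function below is a hypothetical piece of application code under test; the fake only has to mirror the attribute path your code actually touches (`.chat.completions.create(...).choices[0].message.content`):

```python
from unittest.mock import MagicMock

# Hypothetical application code that takes the client as a dependency
def summarize(client, text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return response.choices[0].message.content

# Build a fake client whose create() returns a canned response
fake_client = MagicMock()
fake_client.chat.completions.create.return_value.choices = [
    MagicMock(message=MagicMock(content="A short summary."))
]

assert summarize(fake_client, "a very long article") == "A short summary."
# You can also assert on how the API was called
fake_client.chat.completions.create.assert_called_once()
```

Passing the client in as a parameter (rather than importing a global) is what makes this kind of substitution trivial.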