Intermediate · 45 min · Module 6 of 6

Working with APIs

REST APIs, authentication, OpenAI API, Anthropic API, Hugging Face API.

APIs are how software systems talk to each other. When you use an AI-powered app, it's almost certainly calling an API behind the scenes — sending your prompt to a model hosted in the cloud and receiving the response back. Understanding APIs unlocks the ability to build your own AI-powered applications, automate workflows, and integrate AI capabilities into any system you control.

What Is an API?

An API (Application Programming Interface) is a contract between two pieces of software. It defines how to request a service and what you'll get back. Think of it like a restaurant menu — you don't need to know how the kitchen works. You just need to know what you can order (the API endpoints), how to place your order (the request format), and what you'll receive (the response format).

When you type a message to ChatGPT on the website, the browser is making an API call to OpenAI's servers. The API receives your message, runs it through the model, and sends the generated text back. Every AI product you use — from Midjourney to GitHub Copilot to voice assistants — works this way under the hood.

REST APIs and HTTP Methods

Most modern APIs follow the REST (Representational State Transfer) pattern, which uses standard HTTP methods to perform operations. These are the same protocols your web browser uses — APIs just speak the same language programmatically.

  • GET: retrieve data (read-only, no side effects). Examples: get a list of your models, check API status.
  • POST: send data to create or trigger something. Examples: send a prompt to generate a completion, upload a file.
  • PUT / PATCH: update an existing resource. Example: update a fine-tuning job's metadata.
  • DELETE: remove a resource. Examples: delete a fine-tuned model, remove a file.

For AI APIs, you will use POST most of the time — sending a prompt and receiving generated text, images, or embeddings in return.
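
To see how the method, headers, and JSON body fit together, you can build a request with the requests library without sending it. The URL and key below are placeholders, not real values:

```python
import requests

# Build (but don't send) a POST request to inspect its pieces.
req = requests.Request(
    "POST",
    "https://api.example.com/v1/chat",           # hypothetical endpoint
    headers={"Authorization": "Bearer sk-..."},  # placeholder key
    json={"prompt": "Hello"},
)
prepared = req.prepare()

print(prepared.method)                   # POST
print(prepared.headers["Content-Type"])  # application/json (set automatically)
print(prepared.body)                     # the JSON-encoded request body
```

Calling `requests.post(...)` does exactly this preparation for you before sending the request over the network.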

Authentication: API Keys and OAuth

APIs need to know who is making the request — both for billing and for security. The two most common authentication methods are API keys and OAuth.

API Keys

An API key is a long random string that identifies your account. You include it in every request, typically as an HTTP header. Most AI APIs (OpenAI, Anthropic, Hugging Face) use API keys because they are simple and developer-friendly.

Using API keys in Python:

import os

# NEVER hardcode API keys in your source code
# Store them in environment variables instead
api_key = os.environ.get("OPENAI_API_KEY")

# The key is sent as an Authorization header
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# Or set it via a .env file with python-dotenv
from dotenv import load_dotenv
load_dotenv()  # Loads variables from .env file
api_key = os.environ.get("ANTHROPIC_API_KEY")

API Key Security
Treat API keys like passwords. Never commit them to Git, never put them in client-side code (JavaScript in the browser), and never share them publicly. If a key is leaked, anyone can use your account and run up charges. Use environment variables or a secrets manager, and rotate keys if you suspect a leak.
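
A small startup check makes the "never hardcode" rule concrete. This is a sketch; the environment variable name is whatever your provider expects:

```python
import os

def require_env(name: str) -> str:
    """Fetch a required secret from the environment, failing fast if it's missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it or put it in a .env file; "
            "never hardcode it in source."
        )
    return value
```

Failing at startup with a clear message beats a confusing 401 error deep inside your application.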

OAuth 2.0

OAuth is a more complex authentication system used when an application needs to access a user's data on their behalf — for example, a third-party app accessing your Google Drive. OAuth involves tokens, scopes (permissions), and redirect flows. You won't typically encounter OAuth with AI APIs directly, but it's important when integrating AI into applications that connect to services like Google, Microsoft, or Slack.

Rate Limiting

Every API limits how many requests you can make in a given time window. This prevents abuse and ensures fair access for all users. Rate limits are typically expressed as requests per minute (RPM) and tokens per minute (TPM) for AI APIs.

Handling rate limits gracefully:

import time
import requests

def call_api_with_retry(url, headers, data, max_retries=3):
    """Make an API call with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:  # Rate limited
            wait_time = 2 ** attempt  # 1s, 2s, 4s...
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        else:
            response.raise_for_status()

    raise Exception("Max retries exceeded")

JSON: The Language of APIs

Almost all modern APIs communicate using JSON (JavaScript Object Notation). JSON is a lightweight, human-readable data format that uses key-value pairs, arrays, and nested objects — structures that map directly to Python dictionaries and lists.
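
The dict-to-JSON mapping is a lossless round trip, which you can verify directly with the standard json module:

```python
import json

# A request body as a Python dict
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hi"}],
    "temperature": 0.7,
}

# Serialize: dict -> JSON string (what actually travels over HTTP)
body = json.dumps(payload)

# Deserialize: JSON string -> dict (what response.json() does for you)
parsed = json.loads(body)

print(type(body).__name__)  # str
print(parsed == payload)    # True: nothing was lost
```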

JSON request and response example:

# Sending a JSON request body (Python dict → JSON automatically)
import os
import requests

url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "Content-Type": "application/json",
}

data = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain APIs in one sentence."},
    ],
    "temperature": 0.7,
    "max_tokens": 100,
}

response = requests.post(url, headers=headers, json=data)

# Parsing the JSON response (JSON → Python dict automatically)
result = response.json()

# Navigate the nested structure
message = result["choices"][0]["message"]["content"]
tokens_used = result["usage"]["total_tokens"]
print(f"Response: {message}")
print(f"Tokens used: {tokens_used}")

Making API Calls with Python

The requests library is the standard tool for making HTTP requests in Python. It handles headers, JSON encoding/decoding, error handling, and more with a clean, simple API.

Making API calls with the requests library:

import requests
import os

# --- GET request: retrieve information ---
response = requests.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
)
print(f"Status: {response.status_code}")  # 200 = success
models = response.json()["data"]
print(f"Available models: {len(models)}")

# --- POST request: send data and get a response ---
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

if response.status_code == 200:
    print(response.json()["choices"][0]["message"]["content"])
else:
    print(f"Error {response.status_code}: {response.text}")

OpenAI API

OpenAI provides API access to the GPT model family (such as GPT-4o) for text generation, as well as DALL-E for image generation and Whisper for speech-to-text. The chat completions endpoint is the core API for text generation.

Chat Completions

OpenAI chat completions with the official SDK:

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from environment

# Basic chat completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding tutor."},
        {"role": "user", "content": "Explain list comprehensions in Python."},
    ],
    temperature=0.7,
    max_tokens=500,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Function Calling (Tool Use)

Function calling allows GPT models to output structured JSON that maps to functions you define. Instead of just generating text, the model can decide to "call" one of your functions with specific arguments — enabling AI to interact with databases, APIs, calculators, and any other tools you provide.

OpenAI function calling:

from openai import OpenAI
import json

client = OpenAI()

# Define tools the model can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. Austin, TX",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Austin?"}],
    tools=tools,
    tool_choice="auto",
)

# Check if the model wants to call a function
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(f"Model wants to call: {call.function.name}")
    print(f"With arguments: {args}")
    # You would then execute get_weather(args) and send the
    # result back to the model for a final response
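
To complete the loop, you execute the function yourself and send the result back as a tool message. Here is a sketch with a stubbed get_weather and a hypothetical tool-call id; in real code, both the id and the assistant message come from the response above:

```python
import json

def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    # Stub standing in for a real weather lookup
    return {"location": location, "temp": 75, "unit": unit}

# In real code these come from message.tool_calls[0]
tool_call_id = "call_abc123"  # hypothetical id
args = {"location": "Austin, TX"}

result = get_weather(**args)

# The follow-up request replays the conversation plus the tool result;
# the assistant's tool-call message (from the previous response) goes in between.
followup_messages = [
    {"role": "user", "content": "What's the weather in Austin?"},
    # ...the assistant message containing the tool call would be appended here...
    {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    },
]
```

Passing followup_messages back into client.chat.completions.create lets the model turn the raw data into a final natural-language answer.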

Anthropic API

Anthropic's API provides access to the Claude model family. The Messages API is the primary endpoint for generating text with Claude. Claude is known for strong reasoning, safety-oriented behavior, long context windows, and excellent instruction following.

Messages API

Anthropic Messages API with the official SDK:

import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from env

# Basic message
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful coding tutor.",
    messages=[
        {"role": "user", "content": "Explain decorators in Python."},
    ],
)

print(message.content[0].text)
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

Tool Use with Claude

Claude supports tool use (the equivalent of OpenAI's function calling) through the same Messages API. You define tools with JSON Schema parameters, and Claude can decide when and how to use them.

Claude tool use:

import anthropic

client = anthropic.Anthropic()

# Define available tools
tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL",
                },
            },
            "required": ["ticker"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What is Apple's stock price?"},
    ],
)

# Claude may respond with a tool_use content block
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")  # {"ticker": "AAPL"}
        # Execute the tool and send results back to Claude
    elif block.type == "text":
        print(f"Text: {block.text}")

OpenAI vs. Anthropic API Differences
The APIs are similar in structure but differ in details. OpenAI uses "functions" / "tools" and returns tool calls in the message object. Anthropic uses "tools" with "input_schema" and returns tool_use content blocks. Both support streaming, system prompts, and multi-turn conversations. The official Python SDKs handle the low-level HTTP details for you.
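
The structural difference is easiest to see on raw response dicts. A small normalizer (an illustrative helper, not part of either SDK) can smooth it over:

```python
def extract_text(provider: str, resp: dict) -> str:
    """Pull generated text out of a raw response dict from either API."""
    if provider == "openai":
        # OpenAI: choices[0].message.content
        return resp["choices"][0]["message"]["content"]
    if provider == "anthropic":
        # Anthropic: a list of content blocks; join the text ones
        return "".join(
            block["text"] for block in resp["content"] if block["type"] == "text"
        )
    raise ValueError(f"Unknown provider: {provider}")

openai_resp = {"choices": [{"message": {"content": "Hello!"}}]}
anthropic_resp = {"content": [{"type": "text", "text": "Hello!"}]}

print(extract_text("openai", openai_resp))        # Hello!
print(extract_text("anthropic", anthropic_resp))  # Hello!
```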

Hugging Face API

Hugging Face hosts thousands of open-source models and provides an Inference API that lets you run them without managing infrastructure. This is particularly useful for specialized models — sentiment analysis, translation, summarization, image generation — that may not be available through OpenAI or Anthropic.

Hugging Face Inference API:

import os

from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

# Text generation with an open model
response = client.text_generation(
    "The future of AI is",
    model="meta-llama/Llama-3.3-70B-Instruct",
    max_new_tokens=200,
)
print(response)

# Sentiment analysis with a specialized model
result = client.text_classification(
    "I love this product! It works perfectly.",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
print(result)  # [{"label": "POSITIVE", "score": 0.9998}]

# Image generation
image = client.text_to_image(
    "A serene mountain landscape at sunset, oil painting style",
    model="stabilityai/stable-diffusion-xl-base-1.0",
)
image.save("landscape.png")

Building an AI-Powered Application

Let's put everything together and build a simple but functional AI-powered application: a command-line research assistant that takes a topic, generates a research summary, and extracts key points into structured JSON.

A complete AI-powered research assistant:

import anthropic
import json

client = anthropic.Anthropic()

def research_topic(topic: str) -> dict:
    """Generate a structured research summary on any topic."""

    # Step 1: Generate a detailed research summary
    summary_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=(
            "You are a research assistant. Provide accurate, "
            "well-organized summaries with specific facts and "
            "data points when available."
        ),
        messages=[
            {
                "role": "user",
                "content": f"Write a comprehensive research summary "
                           f"about: {topic}",
            },
        ],
    )
    summary = summary_response.content[0].text

    # Step 2: Extract structured data from the summary
    extraction_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Extract key information from this research
summary and return it as JSON with these keys:
- "title": a concise title
- "key_points": array of 3-5 main findings
- "applications": array of practical applications
- "challenges": array of current challenges or limitations
- "confidence": "high", "medium", or "low" based on how
  well-established the information is

Summary:
{summary}

Return only valid JSON, no other text.""",
            },
        ],
    )

    structured = json.loads(extraction_response.content[0].text)
    structured["full_summary"] = summary

    return structured


# Usage
if __name__ == "__main__":
    result = research_topic("quantum computing applications in 2026")

    print(f"Title: {result['title']}")
    print(f"Confidence: {result['confidence']}")
    print("\nKey Points:")
    for point in result["key_points"]:
        print(f"  - {point}")
    print("\nApplications:")
    for app in result["applications"]:
        print(f"  - {app}")
    print("\nChallenges:")
    for challenge in result["challenges"]:
        print(f"  - {challenge}")

Prompt Chaining in Code
Notice how the application makes two API calls — first to generate content, then to extract structure from it. This is prompt chaining implemented programmatically. Each call has a focused task, producing better results than trying to do everything in a single prompt. This pattern is the foundation of most production AI applications.
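
The pattern generalizes to any number of steps. A minimal chain runner, with the model call injected so you can plug in either SDK (the helper and templates are illustrative):

```python
def run_chain(templates, call_model, initial: str) -> str:
    """Feed each step's output into the next prompt template via {input}."""
    result = initial
    for template in templates:
        result = call_model(template.format(input=result))
    return result

# Demo with a stub in place of a real API call
fake_model = lambda prompt: f"[model saw: {prompt}]"
out = run_chain(
    ["Summarize: {input}", "Extract key points from: {input}"],
    fake_model,
    "quantum computing",
)
print(out)
```

In production, call_model would wrap client.messages.create (or the OpenAI equivalent) and return the response text.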

Error Handling and Best Practices

Production applications need to handle failures gracefully. APIs can fail for many reasons — network issues, rate limits, invalid inputs, server errors, or model outages.

Robust API error handling:

import anthropic
import time

client = anthropic.Anthropic()

def safe_api_call(
    messages,
    model="claude-sonnet-4-20250514",
    max_retries=3,
):
    """Make a resilient API call with retry logic."""
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages,
            )
            return response.content[0].text

        except anthropic.RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)

        except anthropic.APIConnectionError:
            wait = 2 ** attempt
            print(f"Connection error. Retrying in {wait}s...")
            time.sleep(wait)

        except anthropic.BadRequestError as e:
            # Don't retry client errors — fix the request
            print(f"Bad request: {e}")
            raise

        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                wait = 2 ** attempt
                print(f"Server error. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise

    raise Exception(f"Failed after {max_retries} retries")

API Best Practices

Security

  • Store API keys in environment variables, never in code
  • Use .env files locally and secrets managers in production
  • Add .env to your .gitignore file
  • Rotate keys periodically and immediately if compromised

Reliability

  • Always implement retry logic with exponential backoff
  • Set reasonable timeouts on all API requests
  • Log API errors and response times for monitoring
  • Validate API responses before using them in your application

Cost Management

  • Set spending limits in your API provider's dashboard
  • Use cheaper models for simple tasks, reserve powerful models for complex ones
  • Cache responses when the same input might recur
  • Monitor token usage and set max_tokens to avoid runaway costs
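
Caching repeat requests takes only a few lines. A sketch that wraps any call function and keys the cache on a hash of the prompt:

```python
import hashlib

def make_cached_caller(call_fn):
    """Wrap call_fn so identical prompts are only sent once."""
    cache = {}

    def cached(prompt: str):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in cache:
            cache[key] = call_fn(prompt)  # only pay for the first call
        return cache[key]

    return cached

# Demo with a stub that counts how many "API calls" actually happen
calls = {"n": 0}
def fake_api(prompt):
    calls["n"] += 1
    return f"answer to {prompt}"

cached_call = make_cached_caller(fake_api)
cached_call("What is an API?")
cached_call("What is an API?")  # served from cache
print(calls["n"])  # 1
```

Note this only helps for exact repeats; caching at temperature 0 also keeps responses deterministic.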

Performance

  • Use streaming for long responses to improve perceived latency
  • Batch requests when processing multiple items
  • Use async/await (asyncio) for concurrent API calls in Python
  • Keep prompts concise — fewer input tokens means faster and cheaper responses
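
For the async bullet, the shape looks like this. fake_model_call is a stub standing in for a real awaitable SDK call (both OpenAI and Anthropic ship async clients):

```python
import asyncio

async def fake_model_call(prompt: str) -> str:
    # Stand-in for a real awaitable API call
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"

async def run_batch(prompts):
    # All calls run concurrently: total time is roughly one call, not len(prompts)
    return await asyncio.gather(*(fake_model_call(p) for p in prompts))

results = asyncio.run(run_batch(["a", "b", "c"]))
print(results)
```

Mind your rate limits when doing this: firing hundreds of concurrent requests is a quick way to hit 429 errors.
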

Costs Can Add Up Quickly
AI API pricing is based on tokens (roughly 4 characters per token in English). A single GPT-4o call might cost fractions of a cent, but an application processing thousands of requests per day can accumulate significant costs. Always set up billing alerts and spending limits before deploying any API-based application to production.
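
A back-of-envelope estimator based on the ~4 characters/token rule above. The prices are parameters, not real numbers; check your provider's pricing page:

```python
def rough_tokens(text: str) -> int:
    # Rule of thumb: ~4 characters per token for English text
    return max(1, len(text) // 4)

def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,   # $ per million input tokens
    output_price_per_m: float,  # $ per million output tokens
) -> float:
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )

# 10,000 requests/day at ~500 input + 200 output tokens each,
# with hypothetical prices of $2.50 / $10.00 per million tokens:
daily = 10_000 * estimate_cost(500, 200, 2.50, 10.00)
print(f"${daily:.2f}/day")  # $32.50/day
```

Running this estimate before deploying tells you whether a cheaper model or shorter prompts are worth the effort.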

Key Takeaways

  1. An API is a contract between software systems — you send a structured request and receive a structured response. REST APIs using HTTP methods (GET, POST) are the standard.
  2. API keys are the most common authentication method for AI APIs. Never hardcode them — use environment variables and add .env to your .gitignore.
  3. Rate limiting protects APIs from overuse. Always implement retry logic with exponential backoff to handle 429 (rate limit) responses gracefully.
  4. The OpenAI and Anthropic APIs follow similar patterns: send messages with a model name, system prompt, and conversation history; receive generated text. Both support tool use for structured interactions.
  5. Hugging Face provides API access to thousands of specialized open-source models for tasks like sentiment analysis, translation, and image generation.
  6. Production AI applications require robust error handling, cost monitoring, response validation, and security practices around API key management.
  7. Prompt chaining in code — making sequential API calls where each builds on the previous result — is the fundamental pattern behind most AI-powered applications.
