LLM API Fundamentals — Chatbot Engineering with GPT APIs, Prompting & Simple UIs
Objective
The objective of this project is to develop foundational skills in working with Large Language Model APIs by building a chatbot application. Learners will understand API authentication, basic prompt construction, and request/response handling, and will build a functional command-line chat interface that demonstrates core conversational AI concepts.
Learning Outcomes
- Understand LLM API Fundamentals: Learn how LLM APIs work, including authentication, endpoints, request formatting, and response parsing. Understand tokens, models, and basic parameters.
- Master Basic Prompt Engineering: Construct effective prompts that elicit desired responses. Learn the importance of clear instructions, context setting, and basic prompt patterns.
- Implement Simple Conversation Flow: Build a chatbot that maintains context across multiple turns. Understand the role of conversation history in generating coherent responses.
- Handle API Responses Safely: Parse API responses, handle basic errors, and implement simple retry logic for common failure scenarios.
- Create User Interfaces: Build both command-line and simple web interfaces for interacting with LLMs, understanding the basics of user input handling and response display.
By achieving these learning outcomes, participants will have a solid foundation for building more complex LLM-powered applications.
Prerequisites
- Basic Python programming skills
- Understanding of APIs and HTTP requests
- Familiarity with JSON data format
- Basic command-line usage
- Environment variable management
Skills & Tools
Skills You’ll Develop
- API Integration: Making HTTP requests, handling responses
- Prompt Design: Writing clear, effective prompts
- Error Handling: Basic exception handling and retries
- State Management: Maintaining conversation context
- User Interface Design: CLI and simple web interfaces
- Environment Configuration: API key management
Tools You’ll Master
- OpenAI Python Library: Official API client
- python-dotenv: Environment variable management
- Requests: HTTP library (understanding underlying API calls)
- Streamlit: Simple web UI framework
- JSON: Data parsing and formatting
Steps and Tasks
Part 1: Understanding LLM APIs
1. API Basics and Setup
Learn how LLM APIs work and set up your development environment.
- API concepts: Understanding endpoints, authentication, and rate limits
- What are tokens and why they matter
- Different models and their capabilities
- Cost considerations and token limits (link learners to live pricing rather than hard-coding)
- Environment setup: Secure API key management
- Never hardcode API keys
- Using environment variables safely
Basic API setup and first call (Responses API)
```python
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from a .env file
load_dotenv()

# Initialize OpenAI client (reads OPENAI_API_KEY from the environment)
client = OpenAI()

def test_api_connection():
    """Test a minimal call using the Responses API."""
    try:
        resp = client.responses.create(
            model="gpt-4o-mini",
            input="Say hello in one short sentence."
        )
        print("API connected successfully!")
        print("Response:", resp.output_text.strip())
        return True
    except Exception as e:
        print(f"API connection failed: {e}")
        return False

# Understanding a response object (shape may evolve; inspect safely)
def explore_response_object():
    resp = client.responses.create(
        model="gpt-4o-mini",
        input="Hi! Summarize yourself in five words."
    )
    print("Top-level fields:", [k for k in resp.__dict__.keys() if not k.startswith("_")])
    print("Output text:", resp.output_text.strip())
```
Streaming responses (great for chat UIs)
```python
import sys
from openai import OpenAI

client = OpenAI()

def stream_reply(prompt: str, model: str = "gpt-4o-mini"):
    """Stream tokens as they arrive. Useful for responsive CLIs/web UIs."""
    try:
        with client.responses.stream(
            model=model,
            input=prompt
        ) as stream:
            for event in stream:
                # Text deltas arrive incrementally
                if event.type == "response.output_text.delta":
                    sys.stdout.write(event.delta)
                    sys.stdout.flush()
                # Handle other event types as needed (tool calls, errors, etc.)
            final = stream.get_final_response()
        print("\n\n[Done]")
        return final.output_text.strip()
    except Exception as e:
        print(f"\n[Stream error] {e}")
        return ""
```
2. Understanding Tokens
Learn about tokens—the fundamental unit of LLM processing.
- Token counting: How text translates to tokens
- Model-specific tokenization and limits
- Impact on cost and rate limits
- Managing token limits: Fit prompts within context; plan response space
- Input vs output tokens
Token counting and management (robust fallback)
```python
import tiktoken

def tokenizer_for(model: str):
    """Robust tokenizer selection: fall back safely if a new model isn't mapped yet."""
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding("o200k_base")

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    encoding = tokenizer_for(model)
    return len(encoding.encode(text))

def estimate_conversation_tokens(messages: list[dict], model: str = "gpt-4o-mini") -> int:
    """
    Rough estimate: count only user/assistant/system contents.
    (The API adds some per-message formatting overhead tokens.)
    """
    total = 0
    for m in messages:
        total += count_tokens(m.get("content", ""), model=model)
    return total

# Examples
print(f"'Hello' uses {count_tokens('Hello')} token(s) (gpt-4o-mini approx)")
prompt_tokens = count_tokens("Write a short story about a cat.")
print("Prompt tokens:", prompt_tokens)
```
3. Basic Error Handling
Handle common API errors gracefully.
- Common errors: Rate limits, timeouts, invalid requests
- Understanding error messages
- Appropriate retry strategies (exponential backoff, jitter)
- User-friendly messages: Don’t expose stack traces
- Graceful degradation and fallbacks
Error handling with retries
```python
import time
from typing import Dict, List

from openai import OpenAI

client = OpenAI()

def safe_api_call(messages: List[Dict], max_retries: int = 3, temperature: float = 0.7) -> str:
    """
    Make an API call with basic retry logic.
    Uses the Responses API with a messages array (system + user, etc.).
    """
    for attempt in range(max_retries):
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=messages,
                temperature=temperature,
            )
            return resp.output_text.strip()
        except Exception as e:
            # In production, branch on known exception types (timeouts, rate limits) and log details.
            wait_time = min(2 ** attempt, 8)
            print(f"[Warn] API error: {e}. Retrying in {wait_time}s...")
            time.sleep(wait_time)
    return "I'm temporarily unavailable. Please try again shortly."
```
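The bullets above mention jitter, which the basic retry loop doesn't use. Randomizing the delay prevents many clients from retrying in lockstep after an outage. A minimal sketch of the delay calculation alone (the function name and defaults are illustrative, not part of any SDK):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 8.0) -> float:
    """Full-jitter exponential backoff: a random delay between 0 and
    min(cap, base * 2**attempt). Spreads retries out over time."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Delays grow on average with each attempt but never exceed the cap
for attempt in range(5):
    print(f"attempt {attempt}: sleep {backoff_delay(attempt):.2f}s")
```

To use it, replace the fixed `wait_time` in `safe_api_call` with `time.sleep(backoff_delay(attempt))`.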
Part 2: Basic Prompt Engineering
1. Prompt Structure and Components
Learn how to construct effective prompts.
- Prompt components:
  - Instructions: What you want the AI to do
  - Context: Background information
  - Examples: Desired format or style
  - Constraints: Length, tone, scope
- Clear communication: Unambiguous phrasing
- Specific vs vague instructions
- Impact of prompt clarity on responses
Basic prompt patterns (Responses API)
```python
def create_simple_prompt(user_input, style="helpful"):
    """Create a basic prompt with instructions."""
    styles = {
        "helpful": "You are a helpful assistant.",
        "concise": "You are a concise assistant. Keep responses brief.",
        "friendly": "You are a friendly, conversational assistant.",
        "professional": "You are a professional assistant. Use formal language."
    }
    return f"{styles.get(style, styles['helpful'])} {user_input}"

def create_structured_prompt(task, context="", constraints=""):
    """Create a structured prompt string."""
    parts = []
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Task: {task}")
    if constraints:
        parts.append(f"Constraints: {constraints}")
    return "\n".join(parts)

# Examples
basic = create_simple_prompt("What is Python?", style="concise")
print("Basic prompt:", basic)

structured = create_structured_prompt(
    task="Explain what Python is",
    context="Speaking to a beginner programmer",
    constraints="Use simple language; avoid jargon"
)
print("\nStructured prompt:", structured)
```
2. System Messages vs User Messages
Understand different message roles in conversations.
- System messages: Set behavior and context (persona, constraints)
- User messages: The actual queries or tasks
- Conversation continuity (include prior turns as needed)
Using different message roles with Responses API
```python
from openai import OpenAI

client = OpenAI()

def create_conversation(system_prompt, user_query):
    """Create a properly formatted conversation for the Responses API."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query}
    ]

def chat_with_personality(query, personality="neutral"):
    personas = {
        "neutral": "You are a helpful AI assistant.",
        "teacher": "You are a patient teacher who explains concepts clearly.",
        "poet": "You are a creative poet who speaks in gentle, lyrical lines.",
        "child": "Explain things simply, as if to a 5-year-old."
    }
    messages = create_conversation(personas[personality], query)
    resp = client.responses.create(model="gpt-4o-mini", input=messages, temperature=0.7)
    return resp.output_text.strip()

# Test different personalities
query = "What is the sun?"
for persona in ["neutral", "teacher", "poet", "child"]:
    print(f"\n{persona.upper()} response:")
    print(chat_with_personality(query, persona))
```
3. Temperature and Parameters
Control response creativity and consistency.
- Temperature: Creativity vs consistency (lower = focused, higher = diverse)
- Other parameters: max_output_tokens, top_p, frequency/presence penalties
- When to use each parameter
- Common parameter combinations
Parameter experimentation
```python
from openai import OpenAI

client = OpenAI()

def explore_temperature(prompt, temps=(0.0, 0.5, 1.0)):
    """See how temperature affects responses."""
    print(f"Prompt: {prompt}\n")
    for temp in temps:
        print(f"Temperature {temp}:")
        resp = client.responses.create(
            model="gpt-4o-mini",
            input=prompt,
            temperature=temp,
            max_output_tokens=100
        )
        print(resp.output_text.strip())
        print("-" * 40)

# Try creative vs factual tasks
explore_temperature("Write a creative tagline for a coffee shop")
explore_temperature("What is 2+2?")
```
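The "common parameter combinations" bullet can be made concrete with a small preset table. These values are illustrative starting points, not official recommendations; tune them for your own tasks:

```python
# Illustrative parameter presets (tune for your use case)
PRESETS = {
    "factual":  {"temperature": 0.0, "top_p": 1.0, "max_output_tokens": 150},
    "balanced": {"temperature": 0.7, "top_p": 1.0, "max_output_tokens": 300},
    "creative": {"temperature": 1.0, "top_p": 0.95, "max_output_tokens": 500},
}

def params_for(task_kind: str) -> dict:
    """Look up a preset, defaulting to 'balanced'."""
    return PRESETS.get(task_kind, PRESETS["balanced"])

print(params_for("factual"))
```

A dict like this can be splatted directly into a call, e.g. `client.responses.create(model=..., input=..., **params_for("factual"))`.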
Part 3: Building a Simple Chatbot
1. Maintaining Conversation Context
Build a chatbot that remembers previous messages.
- Conversation history: Why context matters
- Following up on previous topics
- Maintaining coherent dialogue
- Memory management: Handling growing conversations
- Token limits force decisions (trim or summarize history)
- Keeping relevant context only
Basic conversation management (Responses API)
```python
from openai import OpenAI

client = OpenAI()

class SimpleChatbot:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=10):
        self.system_prompt = system_prompt
        self.messages = [{"role": "system", "content": system_prompt}]
        self.max_messages = max_messages  # keep the last N user/assistant messages

    def add_message(self, role, content):
        """Add a message to conversation history."""
        self.messages.append({"role": role, "content": content})
        # Trim old messages but keep the system prompt intact
        if len(self.messages) > self.max_messages + 1:
            self.messages = [self.messages[0]] + self.messages[-self.max_messages:]

    def get_response(self, user_input, temperature=0.7):
        """Get a response from the API."""
        self.add_message("user", user_input)
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=self.messages,
                temperature=temperature,
                # Set max_output_tokens here if you need a strict cap
            )
            ai_text = resp.output_text.strip()
            self.add_message("assistant", ai_text)
            return ai_text
        except Exception as e:
            return f"Error: {str(e)}"

    def reset(self):
        """Reset the conversation."""
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def get_conversation_length(self):
        """Current conversation length in messages (excluding the system prompt)."""
        return len(self.messages) - 1

# Using the chatbot
bot = SimpleChatbot("You are a friendly assistant who loves to help!")
print(bot.get_response("Hello! What's your name?"))
print(bot.get_response("Can you help me learn Python?"))
print(bot.get_response("What should I learn first?"))
```
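SimpleChatbot trims by message count, but the earlier bullets note that token limits are what actually force the decision. A sketch of trimming by token budget instead; to keep the snippet self-contained it uses a rough characters-per-token approximation (swap in the tiktoken-based count_tokens from Part 1 for real counts):

```python
def approx_tokens(text: str) -> int:
    """Very rough stand-in for a real tokenizer (about 4 characters per token)."""
    return max(1, len(text) // 4)

def trim_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system prompt plus the most recent messages that fit the budget."""
    system, rest = messages[0], messages[1:]
    kept, used = [], approx_tokens(system["content"])
    for msg in reversed(rest):  # walk newest-first
        cost = approx_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

history = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user", "content": f"question number {i} " * 10} for i in range(20)
]
trimmed = trim_to_budget(history, budget=200)
print(len(history), "->", len(trimmed))
```

Dropping whole messages from the front preserves recent context; summarizing the dropped turns into one short system note is a common refinement.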
2. Command-Line Interface
Create an interactive CLI chatbot.
- User experience: Make CLI interactions smooth
- Clear prompts and formatting
- Handling special commands
- Session management: Start, stop, reset conversations
- Saving conversation history (optional)
- Graceful exit handling
CLI chatbot implementation with streaming
```python
import sys
from openai import OpenAI

client = OpenAI()

class CLIChatbot:
    def __init__(self):
        self.bot = SimpleChatbot()
        self.commands = {
            '/help': self.show_help,
            '/reset': self.reset_conversation,
            '/history': self.show_history,
            '/quit': self.quit_chat,
            '/exit': self.quit_chat
        }

    def show_help(self):
        print("\nAvailable commands:")
        print("/help    - Show this help message")
        print("/reset   - Start a new conversation")
        print("/history - Show conversation history")
        print("/quit or /exit - Exit the chat\n")

    def reset_conversation(self):
        self.bot.reset()
        print("\n🔄 Conversation reset. Starting fresh!\n")

    def show_history(self):
        print("\n📜 Conversation History:")
        for msg in self.bot.messages[1:]:  # Skip the system prompt
            role = "You" if msg['role'] == 'user' else "AI"
            print(f"{role}: {msg['content']}")
        print()

    def quit_chat(self):
        print("\n👋 Goodbye! Thanks for chatting!")
        sys.exit(0)

    def run(self):
        print("🤖 AI Chatbot")
        print("Type '/help' for commands or start chatting!\n")
        while True:
            try:
                user_input = input("You: ").strip()
                if not user_input:
                    continue
                if user_input.startswith('/'):
                    self.commands.get(
                        user_input.lower(),
                        lambda: print("Unknown command. Type '/help'.")
                    )()
                    continue

                self.bot.add_message("user", user_input)
                print("AI: ", end="", flush=True)

                # Stream the response for better UX
                with client.responses.stream(
                    model="gpt-4o-mini",
                    input=self.bot.messages
                ) as stream:
                    streamed_text = []
                    for event in stream:
                        if event.type == "response.output_text.delta":
                            sys.stdout.write(event.delta)
                            sys.stdout.flush()
                            streamed_text.append(event.delta)
                    final = stream.get_final_response()

                # Store the assistant reply
                ai_reply_full = "".join(streamed_text).strip() or final.output_text.strip()
                self.bot.add_message("assistant", ai_reply_full)
                print("\n")
            except KeyboardInterrupt:
                self.quit_chat()
            except Exception as e:
                print(f"\n❌ Error: {e}")
                print("Type '/help' for commands or try again.\n")

# Run the chatbot
if __name__ == "__main__":
    cli_bot = CLIChatbot()
    cli_bot.run()
```
3. Simple Web Interface
Build a basic web interface using Streamlit.
- Web UI basics: Input, output, and state
- Text input and chat display
- Session state for conversation history
- Deployment considerations: Making it shareable
- Environment variables in web apps
- Basic styling and UX
Streamlit chatbot interface (Responses API)
```python
import streamlit as st
from openai import OpenAI

client = OpenAI()

# Minimal chatbot class (reuse the one from above if you prefer)
class SimpleChatbot:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=10):
        self.system_prompt = system_prompt
        self.messages = [{"role": "system", "content": system_prompt}]
        self.max_messages = max_messages

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages + 1:
            self.messages = [self.messages[0]] + self.messages[-self.max_messages:]

    def get_response(self, user_input, temperature=0.7):
        self.add_message("user", user_input)
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=self.messages,
                temperature=temperature
            )
            ai_text = resp.output_text.strip()
            self.add_message("assistant", ai_text)
            return ai_text
        except Exception as e:
            return f"Error: {str(e)}"

    def reset(self):
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def get_conversation_length(self):
        return len(self.messages) - 1

# Page config (must be the first Streamlit call in the script)
st.set_page_config(
    page_title="AI Chatbot",
    page_icon="🤖",
    layout="centered"
)

# Initialize session state
if 'chatbot' not in st.session_state:
    st.session_state.chatbot = SimpleChatbot()
if 'messages' not in st.session_state:
    st.session_state.messages = []

# Title and description
st.title("🤖 AI Chatbot")
st.markdown("Chat with an AI assistant powered by **GPT-4o mini**.")

# Sidebar with options
with st.sidebar:
    st.header("Settings")
    temperature = st.slider(
        "Temperature (Creativity)",
        min_value=0.0, max_value=1.0, value=0.7, step=0.1
    )
    if st.button("🔄 Reset Conversation"):
        st.session_state.chatbot.reset()
        st.session_state.messages = []
        st.rerun()
    st.metric("Messages", st.session_state.chatbot.get_conversation_length())

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Chat input
user_input = st.chat_input("Type your message...")
if user_input:
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.spinner("Thinking..."):
        # Pass the sidebar temperature through so the slider actually has an effect
        response = st.session_state.chatbot.get_response(user_input, temperature=temperature)
    st.session_state.messages.append({"role": "assistant", "content": response})
    st.rerun()

# Footer
st.markdown("---")
st.caption("Built with Streamlit and OpenAI GPT-4o mini")
```
Part 4: Best Practices and Optimization
1. Cost Management
Understand and control API costs.
- Cost calculation: Input and output tokens are billed at different rates (quoted per 1K or 1M tokens depending on the provider); link learners to live pricing rather than hard-coding numbers
- Different model pricing tiers
- Optimization strategies: Reduce tokens without sacrificing quality
- Prompt optimization
- Caching repeated queries
Cost tracking and optimization (supply live rates externally)
```python
class CostTracker:
    def __init__(self, input_rate_per_1k: float | None = None, output_rate_per_1k: float | None = None):
        """Provide current per-1K-token rates taken from the live pricing page."""
        self.input_rate = input_rate_per_1k
        self.output_rate = output_rate_per_1k
        self.usage = {
            "total_input_tokens": 0,
            "total_output_tokens": 0,
            "total_cost": 0.0
        }

    def track_usage(self, input_tokens: int, output_tokens: int):
        self.usage["total_input_tokens"] += input_tokens
        self.usage["total_output_tokens"] += output_tokens
        if self.input_rate is not None and self.output_rate is not None:
            cost = (input_tokens / 1000) * self.input_rate + (output_tokens / 1000) * self.output_rate
            self.usage["total_cost"] += cost
            return {"tokens_used": input_tokens + output_tokens, "cost": round(cost, 6)}
        return {"tokens_used": input_tokens + output_tokens, "cost": "Set live rates to compute cost."}

    def summary(self):
        have_rates = self.input_rate is not None and self.output_rate is not None
        return {
            "Total Input Tokens": self.usage["total_input_tokens"],
            "Total Output Tokens": self.usage["total_output_tokens"],
            "Total Cost": (f"${self.usage['total_cost']:.6f}" if have_rates else "Provide live rates"),
        }

def optimize_prompts_tips():
    """Tips for reducing token usage."""
    return {
        "Be Concise": "Trim unnecessary words and repeated context.",
        "Use Roles": "Set behavior once in a system message; avoid restating it.",
        "Control Length": "Use explicit length constraints and max_output_tokens.",
        "Chunking": "Split long inputs; summarize incrementally if needed.",
        "Cache": "Cache frequent prompts/answers where appropriate."
    }
```
2. Response Caching
Implement simple caching for common queries.
- When to cache: Factual, repeated, or slow-changing information
- Cache management: Expiration strategies
- Cache size limits
Basic response caching
```python
import hashlib
import json
from datetime import datetime, timedelta

class SimpleCache:
    def __init__(self, cache_file="chat_cache.json", expire_hours=24):
        self.cache_file = cache_file
        self.expire_hours = expire_hours
        self.cache = self.load_cache()

    def load_cache(self):
        try:
            with open(self.cache_file, 'r') as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def save_cache(self):
        with open(self.cache_file, 'w') as f:
            json.dump(self.cache, f)

    def key(self, prompt: str) -> str:
        return hashlib.md5(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        k = self.key(prompt)
        if k in self.cache:
            entry = self.cache[k]
            ts = datetime.fromisoformat(entry['timestamp'])
            if datetime.now() - ts < timedelta(hours=self.expire_hours):
                return entry['response']
        return None

    def set(self, prompt: str, response: str):
        k = self.key(prompt)
        self.cache[k] = {'response': response, 'timestamp': datetime.now().isoformat()}
        self.save_cache()

    def clear_expired(self):
        now = datetime.now()
        expired = [
            k for k, v in self.cache.items()
            if now - datetime.fromisoformat(v['timestamp']) > timedelta(hours=self.expire_hours)
        ]
        for k in expired:
            del self.cache[k]
        self.save_cache()

# Using the cache with the chatbot
class CachedChatbot(SimpleChatbot):
    def __init__(self):
        super().__init__()
        self.cache = SimpleCache()

    def get_response(self, user_input, temperature=0.7):
        cached = self.cache.get(user_input)
        if cached:
            print("[Using cached response]")
            return cached
        reply = super().get_response(user_input, temperature=temperature)
        self.cache.set(user_input, reply)
        return reply
```
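The bullets above mention cache size limits, which SimpleCache doesn't enforce. One option is oldest-first eviction by stored timestamp; a standalone sketch (the function name is illustrative, and in practice you would call it from `set` before saving):

```python
from datetime import datetime

def evict_oldest(cache: dict, max_entries: int) -> dict:
    """Keep only the max_entries most recent entries, judged by timestamp."""
    if len(cache) <= max_entries:
        return cache
    ordered = sorted(cache.items(),
                     key=lambda kv: datetime.fromisoformat(kv[1]['timestamp']))
    return dict(ordered[len(cache) - max_entries:])

entries = {
    f"k{i}": {"response": f"r{i}", "timestamp": f"2024-01-0{i + 1}T00:00:00"}
    for i in range(5)
}
print(sorted(evict_oldest(entries, 2).keys()))  # keeps the two newest: ['k3', 'k4']
```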
Comparative Analysis: Design Patterns
Prompt Engineering Strategies
| Strategy | Use Case | Pros | Cons |
|---|---|---|---|
| Direct Prompting | Simple queries | Fast, straightforward | May lack nuance |
| System Message | Behavior setting | Consistent personality | Uses tokens |
| Few-shot Examples | Specific formats | Clear expectations | More tokens |
| Structured Prompts | Complex tasks | Organized, clear | Requires planning |
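The table's "Few-shot Examples" row is the one strategy not demonstrated elsewhere in this module. A minimal prompt builder that shows input/output pairs before the real query (the example pairs are made up for illustration):

```python
def few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a prompt that shows worked input/output pairs before the real query."""
    lines = [instruction, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Convert each phrase to title case.",
    [("hello world", "Hello World"), ("the quick fox", "The Quick Fox")],
    "machine learning basics"
)
print(prompt)
```

Ending the prompt at `Output:` nudges the model to continue the established pattern, which is why few-shot prompting gives such clear format expectations (at the cost of extra tokens, as the table notes).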
Error Handling Approaches
- Simple Retry: Basic reliability for transient errors
- Exponential Backoff: Prevents overwhelming the API
- Circuit Breaker (Advanced): Prevents cascade failures
- Fallback Responses: Maintains user experience
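The circuit breaker listed above is the only approach not sketched earlier in the module. The idea: after repeated failures, stop calling the API entirely and fail fast until a cooldown passes, so a struggling backend isn't hammered. A minimal sketch; the thresholds and class name are illustrative:

```python
import time

class CircuitBreaker:
    """Open the circuit after repeated failures; refuse calls until a cooldown passes."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened, or None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
            self.failures = 0  # a success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise

breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)

def flaky():
    raise ConnectionError("simulated API failure")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
# The next call fails fast without hitting the API at all
try:
    breaker.call(flaky)
except RuntimeError as e:
    print(e)
```

Wrap `client.responses.create` in `breaker.call` and pair it with a fallback response to combine the last two approaches in the list.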
Assessment Readiness Indicators
By completing this module, you should be able to:
- Set up and authenticate with LLM APIs securely
- Construct effective prompts for various use cases
- Build interactive chatbots with context management
- Handle API errors gracefully
- Create both CLI and web interfaces for LLM applications
- Implement basic cost optimization strategies