🟢 〔LLM APIs - LLM & RAG〕 Chatbot Engineering with GPT APIs, Prompting & Simple UIs

Objective

The objective of this project is to develop foundational skills in working with Large Language Model APIs by building a chatbot application. Learners will cover API authentication, basic prompt construction, and request/response handling, and will build a functional command-line chat interface that demonstrates core conversational AI concepts.



Learning Outcomes

  • Understand LLM API Fundamentals: Learn how LLM APIs work, including authentication, endpoints, request formatting, and response parsing. Understand tokens, models, and basic parameters.
  • Master Basic Prompt Engineering: Construct effective prompts that elicit desired responses. Learn the importance of clear instructions, context setting, and basic prompt patterns.
  • Implement Simple Conversation Flow: Build a chatbot that maintains context across multiple turns. Understand the role of conversation history in generating coherent responses.
  • Handle API Responses Safely: Parse API responses, handle basic errors, and implement simple retry logic for common failure scenarios.
  • Create User Interfaces: Build both command-line and simple web interfaces for interacting with LLMs, understanding the basics of user input handling and response display.

By achieving these learning outcomes, participants will have a solid foundation for building more complex LLM-powered applications.


Prerequisites

  • Basic Python programming skills
  • Understanding of APIs and HTTP requests
  • Familiarity with JSON data format
  • Basic command-line usage
  • Environment variable management

Skills & Tools

Skills You’ll Develop

  • API Integration: Making HTTP requests, handling responses
  • Prompt Design: Writing clear, effective prompts
  • Error Handling: Basic exception handling and retries
  • State Management: Maintaining conversation context
  • User Interface Design: CLI and simple web interfaces
  • Environment Configuration: API key management

Tools You’ll Master

  • OpenAI Python Library: Official API client
  • python-dotenv: Environment variable management
  • Requests: HTTP library (understanding underlying API calls)
  • Streamlit: Simple web UI framework
  • JSON: Data parsing and formatting



Steps and Tasks

Part 1: Understanding LLM APIs

1. API Basics and Setup

Learn how LLM APIs work and set up your development environment.

  • API concepts: Understanding endpoints, authentication, and rate limits
  • What are tokens and why they matter
  • Different models and their capabilities
  • Cost considerations and token limits (link learners to live pricing rather than hard-coding)
  • Environment setup: Secure API key management
  • Never hardcode API keys
  • Using environment variables safely
Basic API setup and first call (Responses API)
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

# Initialize OpenAI client (reads OPENAI_API_KEY from env)
client = OpenAI()

def test_api_connection():
    """Test a minimal call using the Responses API."""
    try:
        resp = client.responses.create(
            model="gpt-4o-mini",
            input="Say hello in one short sentence."
        )
        print("API connected successfully!")
        print("Response:", resp.output_text.strip())
        return True
    except Exception as e:
        print(f"API connection failed: {e}")
        return False

# Understanding a response object (shape may evolve; inspect safely)
def explore_response_object():
    resp = client.responses.create(
        model="gpt-4o-mini",
        input="Hi! Summarize yourself in five words."
    )
    # The SDK returns pydantic models; model_dump() is the stable way to inspect them
    print("Top-level fields:", list(resp.model_dump().keys()))
    print("Output text:", resp.output_text.strip())
Streaming responses (great for chat UIs)
from openai import OpenAI
import sys

client = OpenAI()

def stream_reply(prompt: str, model: str = "gpt-4o-mini"):
    """
    Stream tokens as they arrive. Useful for responsive CLIs/web UIs.
    """
    try:
        with client.responses.stream(
            model=model,
            input=prompt
        ) as stream:
            for event in stream:
                # Text deltas arrive incrementally
                if event.type == "response.output_text.delta":
                    sys.stdout.write(event.delta)
                    sys.stdout.flush()
                # You can handle other event types as needed (tool calls, errors, etc.)
            final = stream.get_final_response()
            print("\n\n[Done]")
            return final.output_text.strip()
    except Exception as e:
        print(f"\n[Stream error] {e}")
        return ""

2. Understanding Tokens

Learn about tokens—the fundamental unit of LLM processing.

  • Token counting: How text translates to tokens
  • Model-specific tokenization and limits
  • Impact on cost and rate limits
  • Managing token limits: Fit prompts within context; plan response space
  • Input vs output tokens
Token counting and management (robust fallback)
import tiktoken

def tokenizer_for(model: str):
    """
    Robust tokenizer selection: use a safe fallback if a new model isn't mapped yet.
    """
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding("o200k_base")

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    encoding = tokenizer_for(model)
    return len(encoding.encode(text))

def estimate_conversation_tokens(messages: list[dict], model: str = "gpt-4o-mini") -> int:
    """
    Rough estimate: count only user/assistant/system contents.
    (SDK may add some overhead formatting tokens internally.)
    """
    total = 0
    for m in messages:
        total += count_tokens(m.get("content", ""), model=model)
    return total

# Examples
print(f"'Hello' uses {count_tokens('Hello')} tokens (gpt-4o-mini approx)")
prompt_tokens = count_tokens("Write a short story about a cat.")
print("Prompt tokens:", prompt_tokens)
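The bullets above mention fitting prompts within the context window while planning response space, but the snippets only count tokens. A minimal sketch of the trimming step — the function name fit_messages_to_budget and the whitespace-split counter in the demo are illustrative; plug in count_tokens from above for real estimates:

```python
def fit_messages_to_budget(messages, max_context_tokens, reserve_for_output, count_fn):
    """Drop the oldest non-system turns until the conversation fits the
    context window, leaving room for the model's response."""
    budget = max_context_tokens - reserve_for_output
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_fn(m.get("content", "")) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # the oldest turn goes first
    return system + rest
```

Call it just before each API request, with count_fn=count_tokens and reserve_for_output sized to your expected reply length.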

3. Basic Error Handling

Handle common API errors gracefully.

  • Common errors: Rate limits, timeouts, invalid requests
  • Understanding error messages
  • Appropriate retry strategies (exponential backoff, jitter)
  • User-friendly messages: Don’t expose stack traces
  • Graceful degradation and fallbacks
Error handling with retries
import time
from typing import List, Dict
from openai import OpenAI

client = OpenAI()

def safe_api_call(messages: List[Dict], max_retries: int = 3, temperature: float = 0.7) -> str:
    """
    Make an API call with basic retry logic.
    Uses Responses API with a messages array (system + user, etc).
    """
    for attempt in range(max_retries):
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=messages,
                temperature=temperature,
            )
            return resp.output_text.strip()

        except Exception as e:
            # In production, branch on known exception types (timeouts, rate limits) and log details.
            if attempt < max_retries - 1:
                wait_time = min(2 ** attempt, 8)
                print(f"[Warn] API error: {e}. Retrying in {wait_time}s...")
                time.sleep(wait_time)

    return "I'm temporarily unavailable. Please try again shortly."





Part 2: Basic Prompt Engineering

1. Prompt Structure and Components

Learn how to construct effective prompts.

  • Prompt components:

    • Instructions: What you want the AI to do
    • Context: Background information
    • Examples: Desired format or style
    • Constraints: Length, tone, scope
  • Clear communication: Unambiguous phrasing

  • Specific vs vague instructions

  • Impact of prompt clarity on responses

Basic prompt patterns (Responses API)
from openai import OpenAI
client = OpenAI()

def create_simple_prompt(user_input, style="helpful"):
    """Create a basic prompt with instructions."""
    styles = {
        "helpful": "You are a helpful assistant.",
        "concise": "You are a concise assistant. Keep responses brief.",
        "friendly": "You are a friendly, conversational assistant.",
        "professional": "You are a professional assistant. Use formal language."
    }
    return f"{styles.get(style, styles['helpful'])} {user_input}"

def create_structured_prompt(task, context="", constraints=""):
    """Create a structured prompt string."""
    parts = []
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Task: {task}")
    if constraints:
        parts.append(f"Constraints: {constraints}")
    return "\n".join(parts)

# Examples
basic = create_simple_prompt("What is Python?", style="concise")
print("Basic prompt:", basic)

structured = create_structured_prompt(
    task="Explain what Python is",
    context="Speaking to a beginner programmer",
    constraints="Use simple language; avoid jargon"
)
print("\nStructured prompt:", structured)

2. System Messages vs User Messages

Understand different message roles in conversations.

  • System messages: Set behavior and context (persona, constraints)
  • User messages: The actual queries or tasks
  • Conversation continuity (include prior turns as needed)
Using different message roles with Responses API
from openai import OpenAI
client = OpenAI()

def create_conversation(system_prompt, user_query):
    """Create a properly formatted conversation for Responses API."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query}
    ]
    return messages

def chat_with_personality(query, personality="neutral"):
    personas = {
        "neutral": "You are a helpful AI assistant.",
        "teacher": "You are a patient teacher who explains concepts clearly.",
        "poet": "You are a creative poet who speaks in gentle, lyrical lines.",
        "child": "Explain things simply, as if to a 5-year-old."
    }
    messages = create_conversation(personas[personality], query)
    resp = client.responses.create(model="gpt-4o-mini", input=messages, temperature=0.7)
    return resp.output_text.strip()

# Test different personalities
query = "What is the sun?"
for persona in ["neutral", "teacher", "poet", "child"]:
    print(f"\n{persona.upper()} response:")
    print(chat_with_personality(query, persona))

3. Temperature and Parameters

Control response creativity and consistency.

  • Temperature: Creativity vs consistency (lower = focused, higher = diverse)
  • Other parameters: max_output_tokens, top_p, frequency/presence penalties
  • When to use each parameter
  • Common parameter combinations
Parameter experimentation
from openai import OpenAI
client = OpenAI()

def explore_temperature(prompt, temps=[0, 0.5, 1.0]):
    """See how temperature affects responses."""
    print(f"Prompt: {prompt}\n")
    for temp in temps:
        print(f"Temperature {temp}:")
        resp = client.responses.create(
            model="gpt-4o-mini",
            input=prompt,
            temperature=temp,
            max_output_tokens=100
        )
        print(resp.output_text.strip())
        print("-" * 40)

# Try creative vs factual tasks
explore_temperature("Write a creative tagline for a coffee shop")
explore_temperature("What is 2+2?")





Part 3: Building a Simple Chatbot

1. Maintaining Conversation Context

Build a chatbot that remembers previous messages.

  • Conversation history: Why context matters
  • Following up on previous topics
  • Maintaining coherent dialogue
  • Memory management: Handling growing conversations
  • Token limits force decisions (trim or summarize history)
  • Keeping relevant context only
Basic conversation management (Responses API)
from openai import OpenAI
client = OpenAI()

class SimpleChatbot:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=10):
        self.system_prompt = system_prompt
        self.messages = [{"role": "system", "content": system_prompt}]
        self.max_messages = max_messages  # keep last N user/assistant turns

    def add_message(self, role, content):
        """Add a message to conversation history."""
        self.messages.append({"role": role, "content": content})
        # Trim old messages but keep the system prompt intact
        if len(self.messages) > self.max_messages + 1:
            self.messages = [self.messages[0]] + self.messages[-self.max_messages:]

    def get_response(self, user_input, temperature=0.7):
        """Get response from the API."""
        # Add user message
        self.add_message("user", user_input)
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=self.messages,
                temperature=temperature,
                # You can set max_output_tokens if you need strict caps
            )
            ai_text = resp.output_text.strip()
            self.add_message("assistant", ai_text)
            return ai_text
        except Exception as e:
            return f"Error: {str(e)}"

    def reset(self):
        """Reset conversation."""
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def get_conversation_length(self):
        """Get current conversation length in messages (excluding system)."""
        return len(self.messages) - 1

# Using the chatbot
bot = SimpleChatbot("You are a friendly assistant who loves to help!")
print(bot.get_response("Hello! What's your name?"))
print(bot.get_response("Can you help me learn Python?"))
print(bot.get_response("What should I learn first?"))
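The memory-management bullets above offer two options — trim or summarize — but SimpleChatbot only trims. A hedged sketch of the summarize option; summarize_fn is injected so the helper stands alone, and in a real bot it would be a short API call (e.g. a "summarize this transcript" request via client.responses.create):

```python
def compress_history(messages, summarize_fn, keep_last=4):
    """Replace older turns with a one-message summary, keeping the
    system prompt and the most recent turns verbatim."""
    system, body = messages[0], messages[1:]
    if len(body) <= keep_last:
        return messages  # nothing old enough to compress

    old, recent = body[:-keep_last], body[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarize_fn(transcript)
    note = {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    return [system, note] + recent
```

Compared with trimming, this keeps the gist of dropped turns at the cost of one extra API call per compression.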

2. Command-Line Interface

Create an interactive CLI chatbot.

  • User experience: Make CLI interactions smooth
  • Clear prompts and formatting
  • Handling special commands
  • Session management: Start, stop, reset conversations
  • Saving conversation history (optional)
  • Graceful exit handling
CLI chatbot implementation with streaming
import sys
from openai import OpenAI

client = OpenAI()

class CLIChatbot:
    def __init__(self):
        self.bot = SimpleChatbot()
        self.commands = {
            '/help': self.show_help,
            '/reset': self.reset_conversation,
            '/history': self.show_history,
            '/quit': self.quit_chat,
            '/exit': self.quit_chat
        }

    def show_help(self):
        print("\nAvailable commands:")
        print("/help - Show this help message")
        print("/reset - Start a new conversation")
        print("/history - Show conversation history")
        print("/quit or /exit - Exit the chat\n")

    def reset_conversation(self):
        self.bot.reset()
        print("\n🔄 Conversation reset. Starting fresh!\n")

    def show_history(self):
        print("\n📜 Conversation History:")
        for msg in self.bot.messages[1:]:  # Skip system prompt
            role = "You" if msg['role'] == 'user' else "AI"
            print(f"{role}: {msg['content']}")
        print()

    def quit_chat(self):
        print("\nđź‘‹ Goodbye! Thanks for chatting!")
        sys.exit(0)

    def run(self):
        print("🤖 AI Chatbot")
        print("Type '/help' for commands or start chatting!\n")
        while True:
            try:
                user_input = input("You: ").strip()
                if not user_input:
                    continue

                if user_input.startswith('/'):
                    self.commands.get(user_input.lower(), lambda: print("Unknown command. Type '/help'."))()
                    continue

                # Add user message
                self.bot.add_message("user", user_input)
                print("AI: ", end="", flush=True)

                # Stream the response for better UX
                with client.responses.stream(
                    model="gpt-4o-mini",
                    input=self.bot.messages
                ) as stream:
                    streamed_text = []
                    for event in stream:
                        if event.type == "response.output_text.delta":
                            sys.stdout.write(event.delta)
                            sys.stdout.flush()
                            streamed_text.append(event.delta)
                    final = stream.get_final_response()

                # Store assistant reply
                ai_reply_full = "".join(streamed_text).strip() or final.output_text.strip()
                self.bot.add_message("assistant", ai_reply_full)
                print("\n")

            except KeyboardInterrupt:
                self.quit_chat()
            except Exception as e:
                print(f"\n❌ Error: {e}")
                print("Type '/help' for commands or try again.\n")

# Run the chatbot
if __name__ == "__main__":
    cli_bot = CLIChatbot()
    cli_bot.run()
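The session-management bullets mention optionally saving conversation history, which the CLI above doesn't implement. A minimal sketch — save_transcript and load_transcript are illustrative names; a /save command in CLIChatbot could call them with self.bot.messages:

```python
import json
from datetime import datetime

def save_transcript(messages, path=None):
    """Write the conversation (list of role/content dicts) to a JSON file."""
    path = path or f"chat_{datetime.now():%Y%m%d_%H%M%S}.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2, ensure_ascii=False)
    return path

def load_transcript(path):
    """Load a previously saved conversation, ready to resume from."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Because the saved file is exactly the messages list, a loaded transcript can be assigned back to a SimpleChatbot to resume a past session.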

3. Simple Web Interface

Build a basic web interface using Streamlit.

  • Web UI basics: Input, output, and state
  • Text input and chat display
  • Session state for conversation history
  • Deployment considerations: Making it shareable
  • Environment variables in web apps
  • Basic styling and UX
Streamlit chatbot interface (Responses API)
import streamlit as st
from openai import OpenAI

client = OpenAI()

# Minimal chatbot class (reuse from above if you prefer)
class SimpleChatbot:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=10):
        self.system_prompt = system_prompt
        self.messages = [{"role": "system", "content": system_prompt}]
        self.max_messages = max_messages

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages + 1:
            self.messages = [self.messages[0]] + self.messages[-self.max_messages:]

    def get_response(self, user_input, temperature=0.7):
        self.add_message("user", user_input)
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=self.messages,
                temperature=temperature
            )
            ai_text = resp.output_text.strip()
            self.add_message("assistant", ai_text)
            return ai_text
        except Exception as e:
            return f"Error: {str(e)}"

    def reset(self):
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def get_conversation_length(self):
        return len(self.messages) - 1

# Initialize session state
if 'chatbot' not in st.session_state:
    st.session_state.chatbot = SimpleChatbot()
if 'messages' not in st.session_state:
    st.session_state.messages = []

# Page config
st.set_page_config(
    page_title="AI Chatbot",
    page_icon="🤖",
    layout="centered"
)

# Title and description
st.title("🤖 AI Chatbot")
st.markdown("Chat with an AI assistant powered by **GPT-4o mini**.")

# Sidebar with options
with st.sidebar:
    st.header("Settings")
    temperature = st.slider(
        "Temperature (Creativity)",
        min_value=0.0, max_value=1.0, value=0.7, step=0.1
    )
    if st.button("🔄 Reset Conversation"):
        st.session_state.chatbot.reset()
        st.session_state.messages = []
        st.rerun()
    st.metric("Messages", st.session_state.chatbot.get_conversation_length())

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Chat input
user_input = st.chat_input("Type your message...")
if user_input:
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.spinner("Thinking..."):
        # Pass the sidebar slider value through so it actually affects the call
        response = st.session_state.chatbot.get_response(user_input, temperature=temperature)
    st.session_state.messages.append({"role": "assistant", "content": response})
    st.rerun()

# Footer
st.markdown("---")
st.caption("Built with Streamlit and OpenAI GPT-4o mini")





Part 4: Best Practices and Optimization

1. Cost Management

Understand and control API costs.

  • Cost calculation: Pricing is per token, quoted per 1K or 1M input/output tokens depending on the provider—do not hard-code rates; refer to live pricing
  • Different model pricing tiers
  • Optimization strategies: Reduce tokens without sacrificing quality
  • Prompt optimization
  • Caching repeated queries
Cost tracking and optimization (supply live rates externally)
class CostTracker:
    def __init__(self, input_rate_per_1k: float | None = None, output_rate_per_1k: float | None = None):
        """
        Provide current per-1K token rates from the live pricing page.
        """
        self.input_rate = input_rate_per_1k
        self.output_rate = output_rate_per_1k
        self.usage = {
            "total_input_tokens": 0,
            "total_output_tokens": 0,
            "total_cost": 0.0
        }

    def track_usage(self, input_tokens: int, output_tokens: int):
        self.usage["total_input_tokens"] += input_tokens
        self.usage["total_output_tokens"] += output_tokens
        if self.input_rate is not None and self.output_rate is not None:
            cost = (input_tokens / 1000) * self.input_rate + (output_tokens / 1000) * self.output_rate
            self.usage["total_cost"] += cost
            return {"tokens_used": input_tokens + output_tokens, "cost": round(cost, 6)}
        return {"tokens_used": input_tokens + output_tokens, "cost": "Set live rates to compute cost."}

    def summary(self):
        return {
            "Total Input Tokens": self.usage["total_input_tokens"],
            "Total Output Tokens": self.usage["total_output_tokens"],
            "Total Cost": (f"${self.usage['total_cost']:.6f}" if self.input_rate is not None else "Provide live rates"),
        }

def optimize_prompts_tips():
    """Tips for reducing token usage."""
    return {
        "Be Concise": "Trim unnecessary words and repeated context.",
        "Use Roles": "Set behavior once in a system message; avoid restating.",
        "Control Length": "Use explicit length constraints and max_output_tokens.",
        "Chunking": "Split long inputs; summarize incrementally if needed.",
        "Cache": "Cache frequent prompts/answers where appropriate."
    }

2. Response Caching

Implement simple caching for common queries.

  • When to cache: Factual, repeated, or slow-changing information
  • Cache management: Expiration strategies
  • Cache size limits
Basic response caching
import json
import hashlib
from datetime import datetime, timedelta

class SimpleCache:
    def __init__(self, cache_file="chat_cache.json", expire_hours=24):
        self.cache_file = cache_file
        self.expire_hours = expire_hours
        self.cache = self.load_cache()

    def load_cache(self):
        try:
            with open(self.cache_file, 'r') as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def save_cache(self):
        with open(self.cache_file, 'w') as f:
            json.dump(self.cache, f)

    def key(self, prompt: str) -> str:
        # md5 here is just a cache key, not a security measure
        return hashlib.md5(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        k = self.key(prompt)
        if k in self.cache:
            entry = self.cache[k]
            ts = datetime.fromisoformat(entry['timestamp'])
            if datetime.now() - ts < timedelta(hours=self.expire_hours):
                return entry['response']
        return None

    def set(self, prompt: str, response: str):
        k = self.key(prompt)
        self.cache[k] = {'response': response, 'timestamp': datetime.now().isoformat()}
        self.save_cache()

    def clear_expired(self):
        now = datetime.now()
        expired = [k for k, v in self.cache.items()
                   if now - datetime.fromisoformat(v['timestamp']) > timedelta(hours=self.expire_hours)]
        for k in expired:
            del self.cache[k]
        self.save_cache()

# Using cache with chatbot
class CachedChatbot(SimpleChatbot):
    def __init__(self):
        super().__init__()
        self.cache = SimpleCache()

    def get_response(self, user_input, temperature=0.7):
        # Note: keying on user_input alone ignores conversation context,
        # so this suits standalone factual queries best.
        cached = self.cache.get(user_input)
        if cached:
            print("[Using cached response]")
            return cached
        reply = super().get_response(user_input, temperature=temperature)
        if not reply.startswith("Error:"):  # don't cache failures
            self.cache.set(user_input, reply)
        return reply
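SimpleCache expires entries by age but never bounds how many it holds. One way to add the size limit mentioned above — a sketch of least-recently-used eviction (BoundedCache is an illustrative name; in-memory only, combine with the file persistence above if needed):

```python
from collections import OrderedDict

class BoundedCache:
    """Cache that evicts the least-recently-used entry past max_entries."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        return None

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the oldest entry
```

OrderedDict keeps insertion order, so the front of the dict is always the least recently touched entry.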





Comparative Analysis: Design Patterns

Prompt Engineering Strategies

| Strategy           | Use Case         | Pros                   | Cons              |
|--------------------|------------------|------------------------|-------------------|
| Direct Prompting   | Simple queries   | Fast, straightforward  | May lack nuance   |
| System Message     | Behavior setting | Consistent personality | Uses tokens       |
| Few-shot Examples  | Specific formats | Clear expectations     | More tokens       |
| Structured Prompts | Complex tasks    | Organized, clear       | Requires planning |
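The table mentions few-shot examples, which none of the earlier snippets demonstrate. A minimal sketch of building a few-shot message list — few_shot_messages is an illustrative name, and the resulting list can be passed as input to client.responses.create:

```python
def few_shot_messages(system_prompt, examples, query):
    """Build a message list that shows the model worked input/output
    pairs before asking the real question."""
    msgs = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        msgs.append({"role": "user", "content": user_text})
        msgs.append({"role": "assistant", "content": assistant_text})
    msgs.append({"role": "user", "content": query})
    return msgs
```

Two or three tight examples usually pin down format better than a long instruction, at the cost of the extra example tokens the table notes.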

Error Handling Approaches

  • Simple Retry: Basic reliability for transient errors
  • Exponential Backoff: Prevents overwhelming the API
  • Circuit Breaker (Advanced): Prevents cascade failures
  • Fallback Responses: Maintains user experience
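The circuit-breaker and fallback ideas above can be sketched in a few lines — fn is any zero-argument callable wrapping your API request, and the threshold and reset values are illustrative:

```python
import time

class CircuitBreaker:
    """After repeated failures, stop calling the API for a cooldown
    period and return a fallback immediately (fail fast)."""

    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback  # circuit open: don't touch the API
            self.opened_at = None  # cooldown over: allow a trial call
            self.failures = 0
        try:
            result = fn()
            self.failures = 0  # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback
```

Wrap your request as e.g. breaker.call(lambda: safe_api_call(messages), "I'm temporarily unavailable.") so users always get a reply while the API recovers.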

Assessment Readiness Indicators

By completing this module, you should be able to:

  • Set up and authenticate with LLM APIs securely
  • Construct effective prompts for various use cases
  • Build interactive chatbots with context management
  • Handle API errors gracefully
  • Create both CLI and web interfaces for LLM applications
  • Implement basic cost optimization strategies