🟢 〔LLM APIs - LLM & RAG〕 Chatbot Engineering with GPT APIs, Prompting & Simple UIs

Objective

The objective of this project is to develop foundational skills in working with Large Language Model APIs by building a chatbot application. Learners will cover API authentication, basic prompt construction, and request/response handling, and will build a functional command-line chat interface that demonstrates core conversational AI concepts.



Learning Outcomes

  • Understand LLM API Fundamentals: Learn how LLM APIs work, including authentication, endpoints, request formatting, and response parsing. Understand tokens, models, and basic parameters.
  • Master Basic Prompt Engineering: Construct effective prompts that elicit desired responses. Learn the importance of clear instructions, context setting, and basic prompt patterns.
  • Implement Simple Conversation Flow: Build a chatbot that maintains context across multiple turns. Understand the role of conversation history in generating coherent responses.
  • Handle API Responses Safely: Parse API responses, handle basic errors, and implement simple retry logic for common failure scenarios.
  • Create User Interfaces: Build both command-line and simple web interfaces for interacting with LLMs, understanding the basics of user input handling and response display.

By achieving these learning outcomes, participants will have a solid foundation for building more complex LLM-powered applications.


Prerequisites

  • Basic Python programming skills
  • Understanding of APIs and HTTP requests
  • Familiarity with JSON data format
  • Basic command-line usage
  • Environment variable management

Skills & Tools

Skills You’ll Develop

  • API Integration: Making HTTP requests, handling responses
  • Prompt Design: Writing clear, effective prompts
  • Error Handling: Basic exception handling and retries
  • State Management: Maintaining conversation context
  • User Interface Design: CLI and simple web interfaces
  • Environment Configuration: API key management

Tools You’ll Master

  • OpenAI Python Library: Official API client
  • python-dotenv: Environment variable management
  • Requests: HTTP library (understanding underlying API calls)
  • Streamlit: Simple web UI framework
  • JSON: Data parsing and formatting



Steps and Tasks

Part 1: Understanding LLM APIs

1. API Basics and Setup

Learn how LLM APIs work and set up your development environment.

  • API concepts: Understanding endpoints, authentication, and rate limits
  • What are tokens and why they matter
  • Different models and their capabilities
  • Cost considerations and token limits (link learners to live pricing rather than hard-coding)
  • Environment setup: Secure API key management
  • Never hardcode API keys
  • Using environment variables safely
Basic API setup and first call (Responses API)
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables
load_dotenv()

# Initialize OpenAI client (reads OPENAI_API_KEY from env)
client = OpenAI()

def test_api_connection():
    """Test a minimal call using the Responses API."""
    try:
        resp = client.responses.create(
            model="gpt-4o-mini",
            input="Say hello in one short sentence."
        )
        print("API connected successfully!")
        print("Response:", resp.output_text.strip())
        return True
    except Exception as e:
        print(f"API connection failed: {e}")
        return False

# Understanding a response object (shape may evolve; inspect safely)
def explore_response_object():
    resp = client.responses.create(
        model="gpt-4o-mini",
        input="Hi! Summarize yourself in five words."
    )
    # The SDK returns pydantic models; model_dump() is the stable way to inspect them
    print("Top-level fields:", list(resp.model_dump().keys()))
    print("Output text:", resp.output_text.strip())
Streaming responses (great for chat UIs)
from openai import OpenAI
import sys

client = OpenAI()

def stream_reply(prompt: str, model: str = "gpt-4o-mini"):
    """
    Stream tokens as they arrive. Useful for responsive CLIs/web UIs.
    """
    try:
        with client.responses.stream(
            model=model,
            input=prompt
        ) as stream:
            for event in stream:
                # Text deltas arrive incrementally
                if event.type == "response.output_text.delta":
                    sys.stdout.write(event.delta)
                    sys.stdout.flush()
                # You can handle other event types as needed (tool calls, errors, etc.)
            final = stream.get_final_response()
            print("\n\n[Done]")
            return final.output_text.strip()
    except Exception as e:
        print(f"\n[Stream error] {e}")
        return ""

2. Understanding Tokens

Learn about tokens—the fundamental unit of LLM processing.

  • Token counting: How text translates to tokens
  • Model-specific tokenization and limits
  • Impact on cost and rate limits
  • Managing token limits: Fit prompts within context; plan response space
  • Input vs output tokens
Token counting and management (robust fallback)
import tiktoken

def tokenizer_for(model: str):
    """
    Robust tokenizer selection: use a safe fallback if a new model isn't mapped yet.
    """
    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        return tiktoken.get_encoding("o200k_base")

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    encoding = tokenizer_for(model)
    return len(encoding.encode(text))

def estimate_conversation_tokens(messages: list[dict], model: str = "gpt-4o-mini") -> int:
    """
    Rough estimate: count only user/assistant/system contents.
    (SDK may add some overhead formatting tokens internally.)
    """
    total = 0
    for m in messages:
        total += count_tokens(m.get("content", ""), model=model)
    return total

# Examples
print(f"'Hello' uses {count_tokens('Hello')} tokens (gpt-4o-mini approx)")
prompt_tokens = count_tokens("Write a short story about a cat.")
print("Prompt tokens:", prompt_tokens)
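The bullets above mention fitting prompts within the context window while planning response space, but the snippets only count tokens. A minimal sketch of the trimming step — the function name fit_messages_to_budget and the whitespace-split counter in the demo are illustrative; plug in count_tokens from above for real estimates:

```python
def fit_messages_to_budget(messages, max_context_tokens, reserve_for_output, count_fn):
    """Drop the oldest non-system turns until the conversation fits the
    context window, leaving room for the model's response."""
    budget = max_context_tokens - reserve_for_output
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(count_fn(m.get("content", "")) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # the oldest turn goes first
    return system + rest
```

Call it just before each API request, with count_fn=count_tokens and reserve_for_output sized to your expected reply length.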

3. Basic Error Handling

Handle common API errors gracefully.

  • Common errors: Rate limits, timeouts, invalid requests
  • Understanding error messages
  • Appropriate retry strategies (exponential backoff, jitter)
  • User-friendly messages: Don’t expose stack traces
  • Graceful degradation and fallbacks
Error handling with retries
import time
from typing import List, Dict
from openai import OpenAI

client = OpenAI()

def safe_api_call(messages: List[Dict], max_retries: int = 3, temperature: float = 0.7) -> str:
    """
    Make an API call with basic retry logic.
    Uses Responses API with a messages array (system + user, etc).
    """
    for attempt in range(max_retries):
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=messages,
                temperature=temperature,
            )
            return resp.output_text.strip()

        except Exception as e:
            # In production, branch on known exception types (timeouts, rate limits) and log details.
            if attempt < max_retries - 1:
                wait_time = min(2 ** attempt, 8)
                print(f"[Warn] API error: {e}. Retrying in {wait_time}s...")
                time.sleep(wait_time)

    return "I'm temporarily unavailable. Please try again shortly."





Part 2: Basic Prompt Engineering

1. Prompt Structure and Components

Learn how to construct effective prompts.

  • Prompt components:

    • Instructions: What you want the AI to do
    • Context: Background information
    • Examples: Desired format or style
    • Constraints: Length, tone, scope
  • Clear communication: Unambiguous phrasing

  • Specific vs vague instructions

  • Impact of prompt clarity on responses

Basic prompt patterns (Responses API)
from openai import OpenAI
client = OpenAI()

def create_simple_prompt(user_input, style="helpful"):
    """Create a basic prompt with instructions."""
    styles = {
        "helpful": "You are a helpful assistant.",
        "concise": "You are a concise assistant. Keep responses brief.",
        "friendly": "You are a friendly, conversational assistant.",
        "professional": "You are a professional assistant. Use formal language."
    }
    return f"{styles.get(style, styles['helpful'])} {user_input}"

def create_structured_prompt(task, context="", constraints=""):
    """Create a structured prompt string."""
    parts = []
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Task: {task}")
    if constraints:
        parts.append(f"Constraints: {constraints}")
    return "\n".join(parts)

# Examples
basic = create_simple_prompt("What is Python?", style="concise")
print("Basic prompt:", basic)

structured = create_structured_prompt(
    task="Explain what Python is",
    context="Speaking to a beginner programmer",
    constraints="Use simple language; avoid jargon"
)
print("\nStructured prompt:", structured)

2. System Messages vs User Messages

Understand different message roles in conversations.

  • System messages: Set behavior and context (persona, constraints)
  • User messages: The actual queries or tasks
  • Conversation continuity (include prior turns as needed)
Using different message roles with Responses API
from openai import OpenAI
client = OpenAI()

def create_conversation(system_prompt, user_query):
    """Create a properly formatted conversation for Responses API."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query}
    ]
    return messages

def chat_with_personality(query, personality="neutral"):
    personas = {
        "neutral": "You are a helpful AI assistant.",
        "teacher": "You are a patient teacher who explains concepts clearly.",
        "poet": "You are a creative poet who speaks in gentle, lyrical lines.",
        "child": "Explain things simply, as if to a 5-year-old."
    }
    messages = create_conversation(personas[personality], query)
    resp = client.responses.create(model="gpt-4o-mini", input=messages, temperature=0.7)
    return resp.output_text.strip()

# Test different personalities
query = "What is the sun?"
for persona in ["neutral", "teacher", "poet", "child"]:
    print(f"\n{persona.upper()} response:")
    print(chat_with_personality(query, persona))

3. Temperature and Parameters

Control response creativity and consistency.

  • Temperature: Creativity vs consistency (lower = focused, higher = diverse)
  • Other parameters: max_output_tokens, top_p, frequency/presence penalties
  • When to use each parameter
  • Common parameter combinations
Parameter experimentation
from openai import OpenAI
client = OpenAI()

def explore_temperature(prompt, temps=[0, 0.5, 1.0]):
    """See how temperature affects responses."""
    print(f"Prompt: {prompt}\n")
    for temp in temps:
        print(f"Temperature {temp}:")
        resp = client.responses.create(
            model="gpt-4o-mini",
            input=prompt,
            temperature=temp,
            max_output_tokens=100
        )
        print(resp.output_text.strip())
        print("-" * 40)

# Try creative vs factual tasks
explore_temperature("Write a creative tagline for a coffee shop")
explore_temperature("What is 2+2?")





Part 3: Building a Simple Chatbot

1. Maintaining Conversation Context

Build a chatbot that remembers previous messages.

  • Conversation history: Why context matters
  • Following up on previous topics
  • Maintaining coherent dialogue
  • Memory management: Handling growing conversations
  • Token limits force decisions (trim or summarize history)
  • Keeping relevant context only
Basic conversation management (Responses API)
from openai import OpenAI
client = OpenAI()

class SimpleChatbot:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=10):
        self.system_prompt = system_prompt
        self.messages = [{"role": "system", "content": system_prompt}]
        self.max_messages = max_messages  # keep last N user/assistant turns

    def add_message(self, role, content):
        """Add a message to conversation history."""
        self.messages.append({"role": role, "content": content})
        # Trim old messages but keep the system prompt intact
        if len(self.messages) > self.max_messages + 1:
            self.messages = [self.messages[0]] + self.messages[-self.max_messages:]

    def get_response(self, user_input, temperature=0.7):
        """Get response from the API."""
        # Add user message
        self.add_message("user", user_input)
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=self.messages,
                temperature=temperature,
                # You can set max_output_tokens if you need strict caps
            )
            ai_text = resp.output_text.strip()
            self.add_message("assistant", ai_text)
            return ai_text
        except Exception as e:
            return f"Error: {str(e)}"

    def reset(self):
        """Reset conversation."""
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def get_conversation_length(self):
        """Get current conversation length in messages (excluding system)."""
        return len(self.messages) - 1

# Using the chatbot
bot = SimpleChatbot("You are a friendly assistant who loves to help!")
print(bot.get_response("Hello! What's your name?"))
print(bot.get_response("Can you help me learn Python?"))
print(bot.get_response("What should I learn first?"))
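The memory-management bullets above offer two options — trim or summarize — but SimpleChatbot only trims. A hedged sketch of the summarize option; summarize_fn is injected so the helper stands alone, and in a real bot it would be a short API call (e.g. a "summarize this transcript" request via client.responses.create):

```python
def compress_history(messages, summarize_fn, keep_last=4):
    """Replace older turns with a one-message summary, keeping the
    system prompt and the most recent turns verbatim."""
    system, body = messages[0], messages[1:]
    if len(body) <= keep_last:
        return messages  # nothing old enough to compress

    old, recent = body[:-keep_last], body[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = summarize_fn(transcript)
    note = {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    return [system, note] + recent
```

Compared with trimming, this keeps the gist of dropped turns at the cost of one extra API call per compression.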

2. Command-Line Interface

Create an interactive CLI chatbot.

  • User experience: Make CLI interactions smooth
  • Clear prompts and formatting
  • Handling special commands
  • Session management: Start, stop, reset conversations
  • Saving conversation history (optional)
  • Graceful exit handling
CLI chatbot implementation with streaming
import sys
from openai import OpenAI

client = OpenAI()

class CLIChatbot:
    def __init__(self):
        self.bot = SimpleChatbot()
        self.commands = {
            '/help': self.show_help,
            '/reset': self.reset_conversation,
            '/history': self.show_history,
            '/quit': self.quit_chat,
            '/exit': self.quit_chat
        }

    def show_help(self):
        print("\nAvailable commands:")
        print("/help - Show this help message")
        print("/reset - Start a new conversation")
        print("/history - Show conversation history")
        print("/quit or /exit - Exit the chat\n")

    def reset_conversation(self):
        self.bot.reset()
        print("\n🔄 Conversation reset. Starting fresh!\n")

    def show_history(self):
        print("\n📜 Conversation History:")
        for msg in self.bot.messages[1:]:  # Skip system prompt
            role = "You" if msg['role'] == 'user' else "AI"
            print(f"{role}: {msg['content']}")
        print()

    def quit_chat(self):
        print("\nđź‘‹ Goodbye! Thanks for chatting!")
        sys.exit(0)

    def run(self):
        print("🤖 AI Chatbot")
        print("Type '/help' for commands or start chatting!\n")
        while True:
            try:
                user_input = input("You: ").strip()
                if not user_input:
                    continue

                if user_input.startswith('/'):
                    self.commands.get(user_input.lower(), lambda: print("Unknown command. Type '/help'."))()
                    continue

                # Add user message
                self.bot.add_message("user", user_input)
                print("AI: ", end="", flush=True)

                # Stream the response for better UX
                with client.responses.stream(
                    model="gpt-4o-mini",
                    input=self.bot.messages
                ) as stream:
                    streamed_text = []
                    for event in stream:
                        if event.type == "response.output_text.delta":
                            sys.stdout.write(event.delta)
                            sys.stdout.flush()
                            streamed_text.append(event.delta)
                    final = stream.get_final_response()

                # Store assistant reply
                ai_reply_full = "".join(streamed_text).strip() or final.output_text.strip()
                self.bot.add_message("assistant", ai_reply_full)
                print("\n")

            except KeyboardInterrupt:
                self.quit_chat()
            except Exception as e:
                print(f"\n❌ Error: {e}")
                print("Type '/help' for commands or try again.\n")

# Run the chatbot
if __name__ == "__main__":
    cli_bot = CLIChatbot()
    cli_bot.run()
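The session-management bullets mention optionally saving conversation history, which the CLI above doesn't implement. A minimal sketch — save_transcript and load_transcript are illustrative names; a /save command in CLIChatbot could call them with self.bot.messages:

```python
import json
from datetime import datetime

def save_transcript(messages, path=None):
    """Write the conversation (list of role/content dicts) to a JSON file."""
    path = path or f"chat_{datetime.now():%Y%m%d_%H%M%S}.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2, ensure_ascii=False)
    return path

def load_transcript(path):
    """Load a previously saved conversation, ready to resume from."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Because the saved file is exactly the messages list, a loaded transcript can be assigned back to a SimpleChatbot to resume a past session.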

3. Simple Web Interface

Build a basic web interface using Streamlit.

  • Web UI basics: Input, output, and state
  • Text input and chat display
  • Session state for conversation history
  • Deployment considerations: Making it shareable
  • Environment variables in web apps
  • Basic styling and UX
Streamlit chatbot interface (Responses API)
import streamlit as st
from openai import OpenAI

client = OpenAI()

# Minimal chatbot class (reuse from above if you prefer)
class SimpleChatbot:
    def __init__(self, system_prompt="You are a helpful assistant.", max_messages=10):
        self.system_prompt = system_prompt
        self.messages = [{"role": "system", "content": system_prompt}]
        self.max_messages = max_messages

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages + 1:
            self.messages = [self.messages[0]] + self.messages[-self.max_messages:]

    def get_response(self, user_input, temperature=0.7):
        self.add_message("user", user_input)
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=self.messages,
                temperature=temperature
            )
            ai_text = resp.output_text.strip()
            self.add_message("assistant", ai_text)
            return ai_text
        except Exception as e:
            return f"Error: {str(e)}"

    def reset(self):
        self.messages = [{"role": "system", "content": self.system_prompt}]

    def get_conversation_length(self):
        return len(self.messages) - 1

# Initialize session state
if 'chatbot' not in st.session_state:
    st.session_state.chatbot = SimpleChatbot()
if 'messages' not in st.session_state:
    st.session_state.messages = []

# Page config
st.set_page_config(
    page_title="AI Chatbot",
    page_icon="🤖",
    layout="centered"
)

# Title and description
st.title("🤖 AI Chatbot")
st.markdown("Chat with an AI assistant powered by **GPT-4o mini**.")

# Sidebar with options
with st.sidebar:
    st.header("Settings")
    temperature = st.slider(
        "Temperature (Creativity)",
        min_value=0.0, max_value=1.0, value=0.7, step=0.1
    )
    if st.button("🔄 Reset Conversation"):
        st.session_state.chatbot.reset()
        st.session_state.messages = []
        st.rerun()
    st.metric("Messages", st.session_state.chatbot.get_conversation_length())

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Chat input
user_input = st.chat_input("Type your message...")
if user_input:
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.spinner("Thinking..."):
        # Pass the sidebar slider value through so it actually affects the call
        response = st.session_state.chatbot.get_response(user_input, temperature=temperature)
    st.session_state.messages.append({"role": "assistant", "content": response})
    st.rerun()

# Footer
st.markdown("---")
st.caption("Built with Streamlit and OpenAI GPT-4o mini")





Part 4: Best Practices and Optimization

1. Cost Management

Understand and control API costs.

  • Cost calculation: Pricing is per token, quoted per 1K or 1M input/output tokens depending on the provider—do not hard-code rates; refer to live pricing
  • Different model pricing tiers
  • Optimization strategies: Reduce tokens without sacrificing quality
  • Prompt optimization
  • Caching repeated queries
Cost tracking and optimization (supply live rates externally)
class CostTracker:
    def __init__(self, input_rate_per_1k: float | None = None, output_rate_per_1k: float | None = None):
        """
        Provide current per-1K token rates from the live pricing page.
        """
        self.input_rate = input_rate_per_1k
        self.output_rate = output_rate_per_1k
        self.usage = {
            "total_input_tokens": 0,
            "total_output_tokens": 0,
            "total_cost": 0.0
        }

    def track_usage(self, input_tokens: int, output_tokens: int):
        self.usage["total_input_tokens"] += input_tokens
        self.usage["total_output_tokens"] += output_tokens
        if self.input_rate is not None and self.output_rate is not None:
            cost = (input_tokens / 1000) * self.input_rate + (output_tokens / 1000) * self.output_rate
            self.usage["total_cost"] += cost
            return {"tokens_used": input_tokens + output_tokens, "cost": round(cost, 6)}
        return {"tokens_used": input_tokens + output_tokens, "cost": "Set live rates to compute cost."}

    def summary(self):
        return {
            "Total Input Tokens": self.usage["total_input_tokens"],
            "Total Output Tokens": self.usage["total_output_tokens"],
            "Total Cost": (f"${self.usage['total_cost']:.6f}" if self.input_rate is not None else "Provide live rates"),
        }

def optimize_prompts_tips():
    """Tips for reducing token usage."""
    return {
        "Be Concise": "Trim unnecessary words and repeated context.",
        "Use Roles": "Set behavior once in a system message; avoid restating.",
        "Control Length": "Use explicit length constraints and max_output_tokens.",
        "Chunking": "Split long inputs; summarize incrementally if needed.",
        "Cache": "Cache frequent prompts/answers where appropriate."
    }

2. Response Caching

Implement simple caching for common queries.

  • When to cache: Factual, repeated, or slow-changing information
  • Cache management: Expiration strategies
  • Cache size limits
Basic response caching
import json
import hashlib
from datetime import datetime, timedelta

class SimpleCache:
    def __init__(self, cache_file="chat_cache.json", expire_hours=24):
        self.cache_file = cache_file
        self.expire_hours = expire_hours
        self.cache = self.load_cache()

    def load_cache(self):
        try:
            with open(self.cache_file, 'r') as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def save_cache(self):
        with open(self.cache_file, 'w') as f:
            json.dump(self.cache, f)

    def key(self, prompt: str) -> str:
        # md5 here is just a cache key, not a security measure
        return hashlib.md5(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        k = self.key(prompt)
        if k in self.cache:
            entry = self.cache[k]
            ts = datetime.fromisoformat(entry['timestamp'])
            if datetime.now() - ts < timedelta(hours=self.expire_hours):
                return entry['response']
        return None

    def set(self, prompt: str, response: str):
        k = self.key(prompt)
        self.cache[k] = {'response': response, 'timestamp': datetime.now().isoformat()}
        self.save_cache()

    def clear_expired(self):
        now = datetime.now()
        expired = [k for k, v in self.cache.items()
                   if now - datetime.fromisoformat(v['timestamp']) > timedelta(hours=self.expire_hours)]
        for k in expired:
            del self.cache[k]
        self.save_cache()

# Using cache with chatbot
class CachedChatbot(SimpleChatbot):
    def __init__(self):
        super().__init__()
        self.cache = SimpleCache()

    def get_response(self, user_input, temperature=0.7):
        # Note: keying on user_input alone ignores conversation context,
        # so this suits standalone factual queries best.
        cached = self.cache.get(user_input)
        if cached:
            print("[Using cached response]")
            return cached
        reply = super().get_response(user_input, temperature=temperature)
        if not reply.startswith("Error:"):  # don't cache failures
            self.cache.set(user_input, reply)
        return reply
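SimpleCache expires entries by age but never bounds how many it holds. One way to add the size limit mentioned above — a sketch of least-recently-used eviction (BoundedCache is an illustrative name; in-memory only, combine with the file persistence above if needed):

```python
from collections import OrderedDict

class BoundedCache:
    """Cache that evicts the least-recently-used entry past max_entries."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        return None

    def set(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the oldest entry
```

OrderedDict keeps insertion order, so the front of the dict is always the least recently touched entry.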





Comparative Analysis: Design Patterns

Prompt Engineering Strategies

| Strategy           | Use Case         | Pros                   | Cons              |
|--------------------|------------------|------------------------|-------------------|
| Direct Prompting   | Simple queries   | Fast, straightforward  | May lack nuance   |
| System Message     | Behavior setting | Consistent personality | Uses tokens       |
| Few-shot Examples  | Specific formats | Clear expectations     | More tokens       |
| Structured Prompts | Complex tasks    | Organized, clear       | Requires planning |
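The table mentions few-shot examples, which none of the earlier snippets demonstrate. A minimal sketch of building a few-shot message list — few_shot_messages is an illustrative name, and the resulting list can be passed as input to client.responses.create:

```python
def few_shot_messages(system_prompt, examples, query):
    """Build a message list that shows the model worked input/output
    pairs before asking the real question."""
    msgs = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        msgs.append({"role": "user", "content": user_text})
        msgs.append({"role": "assistant", "content": assistant_text})
    msgs.append({"role": "user", "content": query})
    return msgs
```

Two or three tight examples usually pin down format better than a long instruction, at the cost of the extra example tokens the table notes.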

Error Handling Approaches

  • Simple Retry: Basic reliability for transient errors
  • Exponential Backoff: Prevents overwhelming the API
  • Circuit Breaker (Advanced): Prevents cascade failures
  • Fallback Responses: Maintains user experience
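The circuit-breaker and fallback ideas above can be sketched in a few lines — fn is any zero-argument callable wrapping your API request, and the threshold and reset values are illustrative:

```python
import time

class CircuitBreaker:
    """After repeated failures, stop calling the API for a cooldown
    period and return a fallback immediately (fail fast)."""

    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback  # circuit open: don't touch the API
            self.opened_at = None  # cooldown over: allow a trial call
            self.failures = 0
        try:
            result = fn()
            self.failures = 0  # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback
```

Wrap your request as e.g. breaker.call(lambda: safe_api_call(messages), "I'm temporarily unavailable.") so users always get a reply while the API recovers.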

Assessment Readiness Indicators

By completing this module, you should be able to:

  • Set up and authenticate with LLM APIs securely
  • Construct effective prompts for various use cases
  • Build interactive chatbots with context management
  • Handle API errors gracefully
  • Create both CLI and web interfaces for LLM applications
  • Implement basic cost optimization strategies