Module 1 — Foundations of Generative and Agentic AI

Textbook-style chapter compiled from provided course materials. Where “Editorial additions” appear, they are clearly labeled.

Topics: generative AI fundamentals; chatbots → reasoning → agents; tokens and cost trade-offs; multimodal (text, audio, images).

Tools & Platforms Mentioned


OpenAI · Anthropic · Adobe · Microsoft · ElevenLabs · Google

Learning objectives and key terms

Module objectives (from the Course Guide):

  • Understand the evolution and landscape of generative AI
  • Familiarize yourself with the terminology and categories of AI models
  • Recognize the strategic value of different AI functionalities (e.g., chatbots, reasoning, and multimedia)

Key terms (from this module’s content):

ELIZA; rule-based chatbot; keyword detection; natural language processing (NLP); machine learning (ML); conversational agent; transformer; large language model (LLM); generative AI; reasoning model; chain-of-thought prompting; hallucination; tokens; latency; context length; ASR; TTS; NLU; voice biometrics; GAN; diffusion model; self-attention; multimodal models; fine-tuning; DreamBooth; LoRA; APIs; compliance (GDPR/CCPA/HIPAA/LGPD).

How to read this chapter

This module is written for leaders who need “working fluency”—enough depth to make responsible strategic decisions, ask better questions, and evaluate trade-offs without becoming a specialist.

1. Generative AI Fundamentals (orientation)

This module positions generative AI as a practical capability—systems that can produce text, images, audio, and more—and links those capabilities to organizational performance and digital transformation. The later sections unpack how we arrived here (chatbots), what makes AI expensive (cost drivers), and how new modalities (audio and images) reshape workflows.

Editorial addition (organization)

The Course Guide lists “Generative AI Fundamentals” as Section 1 of Module 1. The provided Module 1 text begins with chatbot evolution and then expands into cost, multimedia, and advanced tools. This short orientation connects the given content to the official outline.

2. AI Chatbots: Past, Present, and Future

2.1 From ELIZA to modern generative chatbots

Chatbot capabilities have evolved significantly since MIT professor Joseph Weizenbaum created ELIZA, the world’s first chatbot, in 1966. Early systems were rule-based: they detected keywords and returned pre-scripted responses. These systems lacked NLP capabilities and were limited in scope and output (Murphy, 2023).

Figure 2.1. A sample ELIZA-style conversation (historical screenshot). Source: Wikimedia Commons.
Figure 2.2. Evolution of chatbot capabilities (concept map)
  • 1966: ELIZA — pattern matching; scripted responses
  • Rule-based chatbots — keyword triggers; FAQ / support flows
  • Early 2010s: conversational agents — ML improves language handling (Siri, Alexa, Watson)
  • Late 2010s: transformers & LLMs — generative responses; more natural and scalable
  • Now: reasoning + agentic AI — multi-step problem solving; more autonomy
This figure is an original diagram created for this textbook chapter (no external source).

2.2 Conversational agents and the generative shift

In the early 2010s, machine learning (ML) advancements enabled a new generation of chatbots—conversational agents—that better understood natural language and could complete more complex tasks (Murphy, 2023). Examples include IBM Watson, Siri, and Alexa. Developments in the late 2010s—transformer-based neural networks and large language models (LLMs)—paved the way for generative AI chatbots that can handle larger query volumes and deliver more personalized, natural-sounding responses (Marr, 2024).

Figure 2.3. IBM Watson prototype (photo). Source: Wikimedia Commons.

2.3 Reasoning models, chain-of-thought, and limitations

Reasoning models (e.g., OpenAI’s o3 and o4 models) represent a more recent milestone. These models are trained to spend more time processing queries, “thinking through” problems before responding, like a human analyst would (Williams, 2025). They have demonstrated improvements on tasks requiring complex reasoning in areas such as science, coding, and math (Paul & Tong, 2024).

Definition — Chain-of-thought prompting

Chain-of-thought prompting is designed to improve the ability of LLMs to perform complex reasoning. It involves generating intermediate natural language reasoning steps that lead to a final answer, simulating a human-like thought process.

Example (from module content): For “Market A vs. Market B,” a chain-of-thought-enabled model would analyze factors separately—market size, competition, regulatory environment—before recommending a direction.
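The module's "Market A vs. Market B" example can be sketched as a prompt template. This is a minimal illustration of chain-of-thought style prompting; the factor list and wording are assumptions for demonstration, not course content.

```python
# Illustrative chain-of-thought prompt construction (factors are assumed).
FACTORS = ["market size", "competition", "regulatory environment"]

def build_cot_prompt(question: str, factors: list[str]) -> str:
    """Assemble a prompt that asks the model to reason step by step."""
    steps = "\n".join(f"{i}. Analyze {f}." for i, f in enumerate(factors, 1))
    return (
        f"{question}\n"
        "Think through the problem step by step before answering:\n"
        f"{steps}\n"
        f"{len(factors) + 1}. Weigh the factors above and recommend a direction."
    )

prompt = build_cot_prompt("Should we enter Market A or Market B?", FACTORS)
print(prompt)
```

The point of the template is that the model is asked for intermediate reasoning steps before the final recommendation, rather than an answer in one shot.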

Risk note — hallucination

The module notes that hallucination remains an inherent risk of LLMs: models may generate outputs not grounded in training data or recognized patterns, producing false or inaccurate claims. The module also notes an OpenAI study in which o4-mini hallucinated more than earlier ChatGPT models on certain metrics.

2.4 Agentic AI (bridge to Module 2)

Unlike traditional chatbots that primarily react to prompts, agentic AI can take action autonomously and proactively, adapt to context, and execute goals in complex environments with minimal human intervention (Coshow et al., 2025). According to MIT’s Dr. Abel Sanchez, an AI agent is essentially a workflow with tasks that may involve humans.

Illustrative use cases (from module content):

  • Automate customer experiences
  • Create and post content for an advertising campaign
  • Provide proactive sales intelligence and recommend next steps (e.g., upselling)
  • Enable security systems that monitor, report, and act on their own initiative
  • Automate supply chains and planning

Strategic applications: transforming the customer experience

The module argues that customer-facing chatbots now handle more queries with higher accuracy and nuance, and can offer increasingly personalized responses based on customer data and prior interactions (Marr, 2024). It also suggests that AI customer-experience agents may allow organizations to automate a significant share of customer interactions while boosting engagement (Coshow et al., 2025).

Case study — Klarna (as provided)

In 2024, Klarna adopted an AI customer service assistant powered by OpenAI. The chatbot reportedly handled a workload equivalent to 700 full-time agents in its first month. Repeat inquiries fell by 25%, and the average service time was two minutes versus 11 minutes with human agents.

Case study — Octopus Energy (as provided)

Octopus Energy integrated ChatGPT into its customer service channel and assigned it responsibility for handling customer inquiries. According to the company, the system handles the work of 250 people and has earned higher average customer satisfaction ratings than human agents.

3. Cost-Optimized Models and Performance Trade-Offs

As organizations deploy AI at scale, cost and performance become strategic constraints. Choosing the most powerful model may be economically unsustainable, while prioritizing low cost alone may limit system utility. This section explains what drives cost and how to think about trade-offs in real deployments.

Analogy (from module content)

A premium minivan works well for driving your own kids to school, but it is absurd as a plan for getting every kid in a town to school. At scale, you choose buses, bike convoys, or walking groups. Similarly, the "best" AI is not always the most powerful model; it is the best fit for the task under constraints.

Figure 3.1. What creates cost in AI systems (stack view)
  1. Equipment (compute infrastructure): GPUs, servers, cloud costs, memory and processing requirements
  2. System (model choice + tokens): usage-based pricing on input tokens + output tokens
  3. Energy (electricity and environmental cost): data center power draw scales with volume and latency requirements
Original diagram created for this chapter (no external source).

3.1 Tokens and usage-based cost

The module highlights tokens as a major cost driver for LLM usage. Tokens are units of text (often ~3–4 characters). Both input and output are measured in tokens and priced accordingly (OpenAI, 2023, as cited in the module content).

Editorial addition (math check)

The module provides an example interaction with 500 input tokens and 1,000 output tokens at $0.03/1k input and $0.06/1k output. The exact cost is: (0.5 × $0.03) + (1.0 × $0.06) = $0.015 + $0.06 = $0.075 per interaction. This corrects arithmetic only; the strategic point (cost compounding at scale) remains the same.
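The arithmetic above can be reproduced in a few lines. The rates are the module's worked-example figures, not current vendor pricing.

```python
# Usage-based token pricing: both prompt and response are billed per 1k tokens.
def interaction_cost(input_tokens: int, output_tokens: int,
                     in_rate_per_1k: float, out_rate_per_1k: float) -> float:
    """Cost of one interaction given per-1k-token rates."""
    return (input_tokens / 1000) * in_rate_per_1k + (output_tokens / 1000) * out_rate_per_1k

cost = interaction_cost(500, 1000, 0.03, 0.06)
print(f"${cost:.3f} per interaction")                   # $0.075
print(f"${cost * 100_000:,.0f} per 100k interactions")  # cost compounds at scale
```

At 100,000 interactions per month, the same $0.075 interaction becomes a $7,500 monthly line item, which is the strategic point about compounding.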

Figure 3.2. Token accounting (input + output)
Your prompt → input tokens (priced per 1k) → model inference (compute + time + context handling) → model response → output tokens (priced per 1k)
Original diagram created for this chapter (no external source).
Figure 3.3. Infrastructure is part of the cost story (servers, networking, data centers).

3.2 Performance trade-offs: choosing wisely

The module frames deployment as trade-offs across:

  • Accuracy vs. cost — Higher-end models may be more accurate but far more expensive.
  • Speed vs. power — Larger models can be slower; latency matters for real-time interactions.
  • Context length vs. efficiency — Longer context can help but is not always necessary; some models handle long context efficiently.

Decision checklist (from module content)
  • What is the minimum level of accuracy required?
  • How often will the model be used (volume)?
  • Can we use caching or cheaper models for certain tasks?
  • Is real-time performance important?
  • Does the model need long context, or can prompts be shorter?
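The checklist can be sketched as a routing policy: cache repeats, send routine or latency-sensitive traffic to a cheaper model, and reserve the premium model for high-accuracy needs. Model names and thresholds here are illustrative assumptions, not course content.

```python
# Minimal model-routing sketch with caching (names/tiers are hypothetical).
from functools import lru_cache

def choose_model(needs_high_accuracy: bool, realtime: bool, long_context: bool) -> str:
    if needs_high_accuracy and not realtime:
        return "premium-large"      # accurate, but slower and costlier
    if long_context:
        return "mid-long-context"   # pay for long context only when needed
    return "small-fast"             # cheap default for routine, real-time tasks

@lru_cache(maxsize=1024)
def answer(prompt: str, model: str) -> str:
    """Stand-in for a model call; lru_cache avoids paying twice for repeats."""
    return f"[{model}] response to: {prompt}"

print(choose_model(needs_high_accuracy=True, realtime=False, long_context=False))
```

In production, the cache would typically live outside the process (e.g., a shared key-value store), but the decision logic is the same: route by requirement, not by default to the largest model.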

4. Exploring Multimedia and Language Interaction Models

Audio and language interaction can sound simple, but they require complex technical architectures. This section distinguishes core components (ASR, TTS, NLU, voice biometrics), common combinations across industries, and operational cost drivers such as real-time constraints and customization.

4.1 Core definitions (audio)

Technology, definition, and common uses:

  • ASR (Automatic Speech Recognition) — converts spoken language into text. Common uses: transcription, captioning, command processing.
  • TTS (Text-to-Speech) — converts text into natural-sounding speech. Common uses: voice assistants, voiceovers, news readers.
  • NLU (Natural Language Understanding) — determines intent and context from language. Common uses: voice-based customer service, conversational agents.
  • Voice biometrics — uses unique voice characteristics for authentication. Common uses: fintech, healthcare, high-security environments.
Figure 4.1. A typical voice-agent pipeline
Audio in (user speech) → ASR (speech → text) → NLU (intent + context) → TTS (text → speech). Optional layer throughout: voice biometrics (authentication) and analytics (sentiment / QA / monitoring).
Original diagram created for this chapter (no external source).
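The pipeline in Figure 4.1 can be sketched with stub components. In a real system each stage would be backed by an ASR/NLU/TTS service; these stand-ins only show how the stages chain together.

```python
# Skeleton voice-agent pipeline (all three stages are stubs).
def asr(audio: bytes) -> str:
    return "what is my balance"               # speech -> text (stub)

def nlu(text: str) -> dict:
    intent = "check_balance" if "balance" in text else "unknown"
    return {"intent": intent, "text": text}   # intent + context (stub)

def tts(text: str) -> bytes:
    return text.encode("utf-8")               # text -> speech (stub)

def voice_agent(audio: bytes) -> bytes:
    parsed = nlu(asr(audio))
    reply = ("Your balance is $42." if parsed["intent"] == "check_balance"
             else "Sorry, could you rephrase that?")
    return tts(reply)

print(voice_agent(b"\x00\x01"))
```

Note that the optional layers from the figure (biometrics, analytics) would sit alongside this chain, observing or gating each stage rather than replacing it.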
Figure 4.2. Audio interaction models depend on both capture (inputs) and high-quality synthesis (outputs).

4.2 Combinations of technologies across sectors

Sector, use case, and typical stack (as provided):

  • Healthcare — dictation, transcription, patient interaction. Typical stack: Whisper + NLP layer (HIPAA compliance needed) (Paubox, 2025)
  • Retail — voice-based customer service kiosks. Typical stack: TTS + ASR + chatbot NLU (PYMNTS, 2024)
  • Education — language learning, accessibility, lectures. Typical stack: TTS (multilingual) + voice grading (Wood et al., 2018)
  • Finance — call center automation, sentiment analysis. Typical stack: ASR + NLU + analytics (Grace, 2025)
  • Automotive — in-car voice assistants. Typical stack: edge-optimized ASR + embedded NLU (EE Times, 2025)

4.3 Logistics and cost drivers

  • Real-time vs. batch processing: Real-time systems (often under 1 second latency) usually cost more than batch processing (GeeksforGeeks, 2024).
  • Customization: Brand voices and robust comprehension across accents/dialects may require significant training and investment (Dialzara, 2024).
  • Language support: High-resource languages have richer model ecosystems; low-resource languages may require specialized training and cost.
  • Data privacy and compliance: Regulatory obligations may include GDPR, CCPA, HIPAA, and LGPD; violations can create financial and reputational risk.

4.4 Privacy risk areas and mitigations (as provided)

Risk, example, and mitigation (as provided):

  • Unconsented recording — Example: recording user voices without notification. Mitigation: explicit consent prompts and audio cues.
  • Data retention — Example: storing audio indefinitely. Mitigation: strict retention policies; allow deletion.
  • Biometric misuse — Example: using voiceprints without explicit consent. Mitigation: require opt-in for voice biometrics.
  • Third-party leakage — Example: sending user data to cloud APIs unsafely. Mitigation: strong contracts (DPAs) or on-prem storage.
  • Cross-border transfer — Example: using U.S. servers for EU users. Mitigation: comply with international transfer agreements (SCCs, DPF).

5. Advanced Applications of Generative AI Tools

This section shifts from broad technology categories to practical toolsets: image generation, audio generation, text generation, and video generation. It also explains why hybrid architectures (GANs + diffusion + transformers) are common in real products and why agentic AI is positioned as the next wave.

5.1 A generative AI toolbox (use cases)

Use cases by capability (as provided):

  • Image generation — campaign visuals; product mockups; concept art; training illustrations; infographics
  • Audio generation — voice agents/IVR; accessibility; language learning; audiobooks/podcast narration
  • Text generation — email drafting; chatbot scripts/FAQs; reports and summarization; documentation and SOPs; SEO content
  • Video generation — short-form ads; explainer videos; video lessons; concept trailers and storyboards

5.2 Image generation: business domains (as provided)

  • Advertising & Marketing: rapid creative production for A/B tests; tailored visuals across demographics (DataFeedWatch, 2025)
  • Entertainment: concept art, character design, backgrounds; faster prototyping
  • Retail & e-commerce: product mockups; virtual try-ons; visual merchandising
  • Architecture & design: rapid 3D sketches and design variations
  • Healthcare: imaging enhancement and synthetic training data
  • Education: custom illustrations and visual explanations

5.3 Underlying technologies: GANs, diffusion, transformers

Figure 5.1. Three building blocks of image generation (concept comparison)
  • GANs (adversarial training) — a generator makes "fakes" while a discriminator detects them; fast, realistic outputs; risk: mode collapse (variety narrows)
  • Diffusion models (denoise from noise) — start with random noise and refine step by step; stable training and diversity; quirk: local errors (hands/fingers, etc.)
  • Transformers (self-attention) — global context handling; strong coherence and control; scales via parallelism; backbone for LLMs and multimodal systems
Original diagram created for this chapter (no external source).

The module emphasizes that these systems are increasingly used in combination: GANs for speed/realism, diffusion for diversity/stability, and transformers for coherence and control—sometimes within the same application.

5.4 Workflows and bottom lines (as provided)

Deployment levels, with example tools, best-fit uses, and trade-offs:

  • Off-the-shelf APIs — hosted models via API; no setup; pay-per-use. Example tools: DALL·E 3 (OpenAI API), DreamStudio, Adobe Firefly. Best for: quick prototypes, marketing images, general needs. Trade-offs: limited fine-tuning; pay per call; possible data lock-in.
  • Open-source local models — install models on your own servers/private cloud. Example tools: Stable Diffusion (base/XL), Hugging Face Diffusers. Best for: more control, privacy, brand consistency. Trade-offs: setup + compute cost; requires in-house expertise.
  • Custom fine-tuned models — train a model on proprietary style/data. Example tools: DreamBooth, LoRA, custom SD forks. Best for: high-volume brand-specific content. Trade-offs: expensive training + ongoing maintenance.
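The first two deployment levels trade a per-call price against a fixed infrastructure cost, so the choice often comes down to volume. A back-of-the-envelope break-even can be computed directly; all dollar figures below are illustrative assumptions, not vendor pricing.

```python
# Break-even sketch: hosted API pay-per-call vs. self-hosted fixed cost.
def monthly_cost_api(images: int, price_per_image: float) -> float:
    return images * price_per_image

def monthly_cost_selfhost(fixed_infra: float, images: int, marginal: float) -> float:
    return fixed_infra + images * marginal

def breakeven_images(price_per_image: float, fixed_infra: float, marginal: float) -> float:
    """Monthly volume above which self-hosting is cheaper than the hosted API."""
    return fixed_infra / (price_per_image - marginal)

# e.g. assume $0.04/image via API vs $1,500/month of GPU + $0.002/image self-hosted
print(round(breakeven_images(0.04, 1500.0, 0.002)))  # ~39,474 images/month
```

Below the break-even volume the API's lack of setup cost wins; above it, the fixed infrastructure amortizes and self-hosting becomes cheaper, which mirrors the trade-offs in the table.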

6. A Generative AI Toolbox in Practice

Thus far in this module, we have explored the foundational principles of AI, generative AI, and agentic AI—tracing the field from its early origins to its current, market-disrupting capabilities. We have also examined how written, audio, and visual technologies each contribute to the evolving AI landscape.

This final section shifts from broad technology categories to a more practical view: the specific tools and platforms that organizations can use to integrate AI functionality into real workflows and improve performance.

Figure 6.1. A practical “toolbox” view of generative AI capabilities
  • Text — generate/summarize: emails, FAQs, reports; docs, SOPs, KM; marketing copy
  • Audio — ASR/TTS/NLU: voice agents, IVR; accessibility; narration/voiceovers
  • Images — create visuals at scale: ads, mockups; design variations; training visuals
  • Video — generate motion content: explainers, ads; lessons, summaries; concept trailers
Original diagram created for this chapter (no external source).

6.1 A Generative AI Toolbox for Better Organizational Performance

The module categorizes widely used tools by the type of content they generate—images, audio, text, and video. This “toolbox” view helps leaders translate AI capabilities into concrete business applications.

6.2 Image Generation

Definition (from module framing): AI tools that generate images from text prompts or other inputs, enabling rapid content creation at scale.

Use cases (from module content):

  • Marketing: campaign visuals, ad creatives, social media graphics
  • Product design: mockups, variations, concept exploration
  • Concept art: storyboarding, pre-visualization, style prototyping
  • Education: illustrations for training materials, infographics, learning visuals

6.3 Audio Generation

Use cases (from module content):

  • Customer service: voice agents, IVR systems, call center automation
  • Accessibility: screen readers, audio descriptions, voiceovers
  • Language learning: pronunciation training, interactive speaking exercises
  • Content creation: audiobooks, podcast narration, AI newsreaders

6.4 Text Generation

Use cases (from module content):

  • Customer engagement: email drafting, chatbot scripts, FAQs
  • Internal tools: report writing, summarization, document generation
  • Content marketing: blog posts, social media captions, SEO content
  • Knowledge management: documentation, SOPs, help centers

6.5 Video Generation

Use cases (from module content):

  • Marketing: short-form ads, explainer videos, social content
  • Education: video lessons, animated summaries, course materials
  • Entertainment: concept trailers, character animation, storyboards

6.6 The Coming Wave of Agentic AI (module bridge)

Generative AI is already reshaping how organizations create text, audio, and visual content. The next frontier is agentic AI—systems that do not just respond to commands but take initiative, make decisions, and coordinate across tools autonomously. In the next module, the course explores how this evolution opens new possibilities for automation, personalization, and digital intelligence at scale.

Review questions and practice (editorial)

Editorial addition (study support)

The following questions are designed to help learners consolidate Module 1 concepts. They are not presented as original course content.

  1. Evolution: Compare rule-based chatbots with LLM-based chatbots. What changes (capability, risk, cost) when systems become generative?
  2. Reasoning: What is chain-of-thought prompting, and why does it matter for multi-step decision problems?
  3. Risk: Define hallucination in your own words. What governance practices reduce the risk in customer-facing systems?
  4. Cost: Identify three cost drivers for AI systems and give one “real-world” implication for each.
  5. Multimedia: In a voice agent, where do ASR, NLU, and TTS sit—and why does real-time latency change the cost?
  6. Deployment: When would you choose an off-the-shelf API versus an open-source local model versus fine-tuning?

Assignment 1 support (editorial)

Assignment listed in Course Guide: “Assignment 1: Evaluating the Cost of AI Systems.”

Editorial addition (suggested approach)

Since only the assignment title is provided, the following is a suggested template to help students apply Section 3’s concepts without inventing course requirements. Adapt as needed to match your facilitator’s instructions.

  1. Define the use case: e.g., customer support, document summarization, voice agent triage.
  2. Estimate volume: interactions/day and average prompt/response length (tokens).
  3. Compute token cost: input + output at the model’s rates; show best/expected/worst cases.
  4. Add constraints: latency targets, accuracy needs, context-length needs.
  5. Choose a model strategy: premium model only where needed; cheaper model for routine tasks; caching where possible.
  6. Discuss energy + governance: note operational sustainability and hallucination risk/mitigation.

References (as provided)

This list preserves the references exactly as they appear in the provided Module 1 content. Full bibliographic details can be added if you provide them.

  • Codingscape (2024)
  • Coshow et al. (2025)
  • DataFeedWatch (2025)
  • Dialzara (2024)
  • EE Times (2025)
  • GeeksforGeeks (2024)
  • Grace (2025)
  • Marr (2024)
  • Murphy (2023)
  • OpenAI (2023)
  • Patrizio (2025)
  • Paul & Tong (2024)
  • Paubox (2025)
  • PYMNTS (2024)
  • Topal (2023)
  • Williams (2025)
  • Wood et al. (2018)