Module 1 — Foundations of Generative and Agentic AI

Textbook-style chapter compiled from provided course materials. Where “Editorial additions” appear, they are clearly labeled.

Topics: generative AI fundamentals; chatbots → reasoning → agents; tokens and cost trade-offs; multimodal (text, audio, images).

Tools & Platforms Mentioned


OpenAI · Anthropic · Adobe · Microsoft · ElevenLabs · Google

Learning objectives and key terms

Module objectives (from the Course Guide):

  • Understand the evolution and landscape of generative AI
  • Familiarize yourself with the terminology and categories of AI models
  • Recognize the strategic value of different AI functionalities (e.g., chatbots, reasoning, and multimedia)

Key terms (from this module’s content):

ELIZA; rule-based chatbot; keyword detection; natural language processing (NLP); machine learning (ML); conversational agent; transformer; large language model (LLM); generative AI; reasoning model; chain-of-thought prompting; hallucination; tokens; latency; context length; ASR; TTS; NLU; voice biometrics; GAN; diffusion model; self-attention; multimodal models; fine-tuning; DreamBooth; LoRA; APIs; compliance (GDPR/CCPA/HIPAA/LGPD).

How to read this chapter

This module is written for leaders who need “working fluency”—enough depth to make responsible strategic decisions, ask better questions, and evaluate trade-offs without becoming a specialist.

1. Generative AI Fundamentals (orientation)

This module positions generative AI as a practical capability—systems that can produce text, images, audio, and more—and links those capabilities to organizational performance and digital transformation. The later sections unpack how we arrived here (chatbots), what makes AI expensive (cost drivers), and how new modalities (audio and images) reshape workflows.

Editorial addition (organization)

The Course Guide lists “Generative AI Fundamentals” as Section 1 of Module 1. The provided Module 1 text begins with chatbot evolution and then expands into cost, multimedia, and advanced tools. This short orientation connects the given content to the official outline.

2. AI Chatbots: Past, Present, and Future

2.1 From ELIZA to modern generative chatbots

Chatbot capabilities have evolved significantly since MIT professor Joseph Weizenbaum created ELIZA, the world’s first chatbot, in 1966. Early systems were rule-based: they detected keywords and returned pre-scripted responses. These systems lacked NLP capabilities and were limited in scope and output (Murphy, 2023).

Figure 2.1. A sample ELIZA-style conversation (historical screenshot). Source: Wikimedia Commons.
Figure 2.2. Evolution of chatbot capabilities (concept map)
  • 1966: ELIZA — pattern matching; scripted responses
  • Rule-based chatbots — keyword triggers; FAQ / support flows
  • Early 2010s: conversational agents — ML improves language handling (Siri, Alexa, Watson)
  • Late 2010s: transformers & LLMs — generative responses; more natural and scalable
  • Now: reasoning + agentic AI — multi-step problem solving; more autonomy
This figure is an original diagram created for this textbook chapter (no external source).

2.2 Conversational agents and the generative shift

In the early 2010s, machine learning (ML) advancements enabled a new generation of chatbots—conversational agents—that better understood natural language and could complete more complex tasks (Murphy, 2023). Examples include IBM Watson, Siri, and Alexa. Developments in the late 2010s—transformer-based neural networks and large language models (LLMs)—paved the way for generative AI chatbots that can handle larger query volumes and deliver more personalized, natural-sounding responses (Marr, 2024).

Figure 2.3. IBM Watson prototype (photo). Source: Wikimedia Commons.

2.3 Reasoning models, chain-of-thought, and limitations

Reasoning models (e.g., OpenAI’s o3 and o4 models) represent a more recent milestone. These models are trained to spend more time processing queries, “thinking through” problems before responding, like a human analyst would (Williams, 2025). They have demonstrated improvements on tasks requiring complex reasoning in areas such as science, coding, and math (Paul & Tong, 2024).

Definition — Chain-of-thought prompting

Chain-of-thought prompting is designed to improve the ability of LLMs to perform complex reasoning. It involves generating intermediate natural language reasoning steps that lead to a final answer, simulating a human-like thought process.

Example (from module content): For “Market A vs. Market B,” a chain-of-thought-enabled model would analyze factors separately—market size, competition, regulatory environment—before recommending a direction.
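The module's "Market A vs. Market B" example can be sketched as a prompt template. This is a minimal illustration of chain-of-thought style prompting; the factor list and wording are assumptions for demonstration, not course content.

```python
# Illustrative chain-of-thought prompt construction (factors are assumed).
FACTORS = ["market size", "competition", "regulatory environment"]

def build_cot_prompt(question: str, factors: list[str]) -> str:
    """Assemble a prompt that asks the model to reason step by step."""
    steps = "\n".join(f"{i}. Analyze {f}." for i, f in enumerate(factors, 1))
    return (
        f"{question}\n"
        "Think through the problem step by step before answering:\n"
        f"{steps}\n"
        f"{len(factors) + 1}. Weigh the factors above and recommend a direction."
    )

prompt = build_cot_prompt("Should we enter Market A or Market B?", FACTORS)
print(prompt)
```

The point of the template is that the model is asked for intermediate reasoning steps before the final recommendation, rather than an answer in one shot.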

Risk note — hallucination

The module notes that hallucination remains an inherent risk of LLMs: models may generate outputs not grounded in training data or recognized patterns, producing false or inaccurate claims. The module also notes an OpenAI study in which o4-mini hallucinated more than earlier ChatGPT models on certain metrics.

2.4 Agentic AI (bridge to Module 2)

Unlike traditional chatbots that primarily react to prompts, agentic AI can take action autonomously and proactively, adapt to context, and execute goals in complex environments with minimal human intervention (Coshow et al., 2025). According to MIT’s Dr. Abel Sanchez, an AI agent is essentially a workflow with tasks that may involve humans.

Illustrative use cases (from module content):

  • Automate customer experiences
  • Create and post content for an advertising campaign
  • Provide proactive sales intelligence and recommend next steps (e.g., upselling)
  • Enable security systems that monitor, report, and act on their own initiative
  • Automate supply chains and planning

Strategic applications: transforming the customer experience

The module argues that customer-facing chatbots now handle more queries with higher accuracy and nuance, and can offer increasingly personalized responses based on customer data and prior interactions (Marr, 2024). It also suggests that AI customer-experience agents may allow organizations to automate a significant share of customer interactions while boosting engagement (Coshow et al., 2025).

Case study — Klarna (as provided)

In 2024, Klarna adopted an AI customer service assistant powered by OpenAI. The chatbot reportedly handled a workload equivalent to 700 full-time agents in its first month. Repeat inquiries fell by 25%, and the average service time was two minutes versus 11 minutes with human agents.

Case study — Octopus Energy (as provided)

Octopus Energy integrated ChatGPT into its customer service channel and assigned it responsibility for handling customer inquiries. According to the company, the system handles the work of 250 people and has earned higher average customer satisfaction ratings than human agents.

3. Cost-Optimized Models and Performance Trade-Offs

As organizations deploy AI at scale, cost and performance become strategic constraints. Choosing the most powerful model may be economically unsustainable, while prioritizing low cost alone may limit system utility. This section explains what drives cost and how to think about trade-offs in real deployments.

Analogy (from module content)

A premium minivan works well for driving your own kids to school, but it is absurd as a plan for getting every kid in a town to school. At scale, you choose buses, bike convoys, or walking groups. Similarly, the "best" AI is not always the most powerful model; it is the best fit for the task under constraints.

Figure 3.1. What creates cost in AI systems (stack view)
  1. Equipment (compute infrastructure): GPUs, servers, cloud costs, memory and processing requirements
  2. System (model choice + tokens): usage-based pricing on input tokens + output tokens
  3. Energy (electricity and environmental cost): data center power draw scales with volume and latency requirements
Original diagram created for this chapter (no external source).

3.1 Tokens and usage-based cost

The module highlights tokens as a major cost driver for LLM usage. Tokens are units of text (often ~3–4 characters). Both input and output are measured in tokens and priced accordingly (OpenAI, 2023, as cited in the module content).

Editorial addition (math check)

The module provides an example interaction with 500 input tokens and 1,000 output tokens at $0.03/1k input and $0.06/1k output. The exact cost is: (0.5 × $0.03) + (1.0 × $0.06) = $0.015 + $0.06 = $0.075 per interaction. This corrects arithmetic only; the strategic point (cost compounding at scale) remains the same.
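The arithmetic above can be reproduced in a few lines. The rates are the module's worked-example figures, not current vendor pricing.

```python
# Usage-based token pricing: both prompt and response are billed per 1k tokens.
def interaction_cost(input_tokens: int, output_tokens: int,
                     in_rate_per_1k: float, out_rate_per_1k: float) -> float:
    """Cost of one interaction given per-1k-token rates."""
    return (input_tokens / 1000) * in_rate_per_1k + (output_tokens / 1000) * out_rate_per_1k

cost = interaction_cost(500, 1000, 0.03, 0.06)
print(f"${cost:.3f} per interaction")                   # $0.075
print(f"${cost * 100_000:,.0f} per 100k interactions")  # cost compounds at scale
```

At 100,000 interactions per month, the same $0.075 interaction becomes a $7,500 monthly line item, which is the strategic point about compounding.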

Figure 3.2. Token accounting (input + output)
Your prompt → input tokens (priced per 1k) → model inference (compute + time + context handling) → model response → output tokens (priced per 1k)
Original diagram created for this chapter (no external source).
Figure 3.3. Infrastructure is part of the cost story (servers, networking, data centers).

3.2 Performance trade-offs: choosing wisely

The module frames deployment as trade-offs across:

  • Accuracy vs. cost — Higher-end models may be more accurate but far more expensive.
  • Speed vs. power — Larger models can be slower; latency matters for real-time interactions.
  • Context length vs. efficiency — Longer context can help but is not always necessary; some models handle long context efficiently.

Decision checklist (from module content)
  • What is the minimum level of accuracy required?
  • How often will the model be used (volume)?
  • Can we use caching or cheaper models for certain tasks?
  • Is real-time performance important?
  • Does the model need long context, or can prompts be shorter?
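The checklist can be sketched as a routing policy: cache repeats, send routine or latency-sensitive traffic to a cheaper model, and reserve the premium model for high-accuracy needs. Model names and thresholds here are illustrative assumptions, not course content.

```python
# Minimal model-routing sketch with caching (names/tiers are hypothetical).
from functools import lru_cache

def choose_model(needs_high_accuracy: bool, realtime: bool, long_context: bool) -> str:
    if needs_high_accuracy and not realtime:
        return "premium-large"      # accurate, but slower and costlier
    if long_context:
        return "mid-long-context"   # pay for long context only when needed
    return "small-fast"             # cheap default for routine, real-time tasks

@lru_cache(maxsize=1024)
def answer(prompt: str, model: str) -> str:
    """Stand-in for a model call; lru_cache avoids paying twice for repeats."""
    return f"[{model}] response to: {prompt}"

print(choose_model(needs_high_accuracy=True, realtime=False, long_context=False))
```

In production, the cache would typically live outside the process (e.g., a shared key-value store), but the decision logic is the same: route by requirement, not by default to the largest model.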

4. Exploring Multimedia and Language Interaction Models

Audio and language interaction can sound simple, but they require complex technical architectures. This section distinguishes core components (ASR, TTS, NLU, voice biometrics), common combinations across industries, and operational cost drivers such as real-time constraints and customization.

4.1 Core definitions (audio)

Technology, definition, and common uses:

  • ASR (Automatic Speech Recognition) — converts spoken language into text. Common uses: transcription, captioning, command processing.
  • TTS (Text-to-Speech) — converts text into natural-sounding speech. Common uses: voice assistants, voiceovers, news readers.
  • NLU (Natural Language Understanding) — determines intent and context from language. Common uses: voice-based customer service, conversational agents.
  • Voice biometrics — uses unique voice characteristics for authentication. Common uses: fintech, healthcare, high-security environments.
Figure 4.1. A typical voice-agent pipeline
Audio in (user speech) → ASR (speech → text) → NLU (intent + context) → TTS (text → speech). Optional layer throughout: voice biometrics (authentication) and analytics (sentiment / QA / monitoring).
Original diagram created for this chapter (no external source).
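The pipeline in Figure 4.1 can be sketched with stub components. In a real system each stage would be backed by an ASR/NLU/TTS service; these stand-ins only show how the stages chain together.

```python
# Skeleton voice-agent pipeline (all three stages are stubs).
def asr(audio: bytes) -> str:
    return "what is my balance"               # speech -> text (stub)

def nlu(text: str) -> dict:
    intent = "check_balance" if "balance" in text else "unknown"
    return {"intent": intent, "text": text}   # intent + context (stub)

def tts(text: str) -> bytes:
    return text.encode("utf-8")               # text -> speech (stub)

def voice_agent(audio: bytes) -> bytes:
    parsed = nlu(asr(audio))
    reply = ("Your balance is $42." if parsed["intent"] == "check_balance"
             else "Sorry, could you rephrase that?")
    return tts(reply)

print(voice_agent(b"\x00\x01"))
```

Note that the optional layers from the figure (biometrics, analytics) would sit alongside this chain, observing or gating each stage rather than replacing it.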
Figure 4.2. Audio interaction models depend on both capture (inputs) and high-quality synthesis (outputs).

4.2 Combinations of technologies across sectors

Sector, use case, and typical stack (as provided):

  • Healthcare — dictation, transcription, patient interaction. Typical stack: Whisper + NLP layer (HIPAA compliance needed) (Paubox, 2025)
  • Retail — voice-based customer service kiosks. Typical stack: TTS + ASR + chatbot NLU (PYMNTS, 2024)
  • Education — language learning, accessibility, lectures. Typical stack: TTS (multilingual) + voice grading (Wood et al., 2018)
  • Finance — call center automation, sentiment analysis. Typical stack: ASR + NLU + analytics (Grace, 2025)
  • Automotive — in-car voice assistants. Typical stack: edge-optimized ASR + embedded NLU (EE Times, 2025)

4.3 Logistics and cost drivers

  • Real-time vs. batch processing: Real-time systems (often under 1 second latency) usually cost more than batch processing (GeeksforGeeks, 2024).
  • Customization: Brand voices and robust comprehension across accents/dialects may require significant training and investment (Dialzara, 2024).
  • Language support: High-resource languages have richer model ecosystems; low-resource languages may require specialized training and cost.
  • Data privacy and compliance: Regulatory obligations may include GDPR, CCPA, HIPAA, and LGPD; violations can create financial and reputational risk.

4.4 Privacy risk areas and mitigations (as provided)

Risk, example, and mitigation (as provided):

  • Unconsented recording — Example: recording user voices without notification. Mitigation: explicit consent prompts and audio cues.
  • Data retention — Example: storing audio indefinitely. Mitigation: strict retention policies; allow deletion.
  • Biometric misuse — Example: using voiceprints without explicit consent. Mitigation: require opt-in for voice biometrics.
  • Third-party leakage — Example: sending user data to cloud APIs unsafely. Mitigation: strong contracts (DPAs) or on-prem storage.
  • Cross-border transfer — Example: using U.S. servers for EU users. Mitigation: comply with international transfer agreements (SCCs, DPF).

5. Advanced Applications of Generative AI Tools

This section shifts from broad technology categories to practical toolsets: image generation, audio generation, text generation, and video generation. It also explains why hybrid architectures (GANs + diffusion + transformers) are common in real products and why agentic AI is positioned as the next wave.

5.1 A generative AI toolbox (use cases)

Use cases by capability (as provided):

  • Image generation — campaign visuals; product mockups; concept art; training illustrations; infographics
  • Audio generation — voice agents/IVR; accessibility; language learning; audiobooks/podcast narration
  • Text generation — email drafting; chatbot scripts/FAQs; reports and summarization; documentation and SOPs; SEO content
  • Video generation — short-form ads; explainer videos; video lessons; concept trailers and storyboards

5.2 Image generation: business domains (as provided)

  • Advertising & Marketing: rapid creative production for A/B tests; tailored visuals across demographics (DataFeedWatch, 2025)
  • Entertainment: concept art, character design, backgrounds; faster prototyping
  • Retail & e-commerce: product mockups; virtual try-ons; visual merchandising
  • Architecture & design: rapid 3D sketches and design variations
  • Healthcare: imaging enhancement and synthetic training data
  • Education: custom illustrations and visual explanations

5.3 Underlying technologies: GANs, diffusion, transformers

Figure 5.1. Three building blocks of image generation (concept comparison)
  • GANs (adversarial training) — a generator makes "fakes" while a discriminator detects them; fast, realistic outputs; risk: mode collapse (variety narrows)
  • Diffusion models (denoise from noise) — start with random noise and refine step by step; stable training and diversity; quirk: local errors (hands/fingers, etc.)
  • Transformers (self-attention) — global context handling; strong coherence and control; scales via parallelism; backbone for LLMs and multimodal systems
Original diagram created for this chapter (no external source).

The module emphasizes that these systems are increasingly used in combination: GANs for speed/realism, diffusion for diversity/stability, and transformers for coherence and control—sometimes within the same application.

5.4 Workflows and bottom lines (as provided)

Deployment levels, with example tools, best-fit uses, and trade-offs:

  • Off-the-shelf APIs — hosted models via API; no setup; pay-per-use. Example tools: DALL·E 3 (OpenAI API), DreamStudio, Adobe Firefly. Best for: quick prototypes, marketing images, general needs. Trade-offs: limited fine-tuning; pay per call; possible data lock-in.
  • Open-source local models — install models on your own servers/private cloud. Example tools: Stable Diffusion (base/XL), Hugging Face Diffusers. Best for: more control, privacy, brand consistency. Trade-offs: setup + compute cost; requires in-house expertise.
  • Custom fine-tuned models — train a model on proprietary style/data. Example tools: DreamBooth, LoRA, custom SD forks. Best for: high-volume brand-specific content. Trade-offs: expensive training + ongoing maintenance.
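The first two deployment levels trade a per-call price against a fixed infrastructure cost, so the choice often comes down to volume. A back-of-the-envelope break-even can be computed directly; all dollar figures below are illustrative assumptions, not vendor pricing.

```python
# Break-even sketch: hosted API pay-per-call vs. self-hosted fixed cost.
def monthly_cost_api(images: int, price_per_image: float) -> float:
    return images * price_per_image

def monthly_cost_selfhost(fixed_infra: float, images: int, marginal: float) -> float:
    return fixed_infra + images * marginal

def breakeven_images(price_per_image: float, fixed_infra: float, marginal: float) -> float:
    """Monthly volume above which self-hosting is cheaper than the hosted API."""
    return fixed_infra / (price_per_image - marginal)

# e.g. assume $0.04/image via API vs $1,500/month of GPU + $0.002/image self-hosted
print(round(breakeven_images(0.04, 1500.0, 0.002)))  # ~39,474 images/month
```

Below the break-even volume the API's lack of setup cost wins; above it, the fixed infrastructure amortizes and self-hosting becomes cheaper, which mirrors the trade-offs in the table.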

6. A Generative AI Toolbox in Practice

Thus far in this module, we have explored the foundational principles of AI, generative AI, and agentic AI—tracing the field from its early origins to its current, market-disrupting capabilities. We have also examined how written, audio, and visual technologies each contribute to the evolving AI landscape.

This final section shifts from broad technology categories to a more practical view: the specific tools and platforms that organizations can use to integrate AI functionality into real workflows and improve performance.

Figure 6.1. A practical “toolbox” view of generative AI capabilities
  • Text — generate/summarize: emails, FAQs, reports; docs, SOPs, KM; marketing copy
  • Audio — ASR/TTS/NLU: voice agents, IVR; accessibility; narration/voiceovers
  • Images — create visuals at scale: ads, mockups; design variations; training visuals
  • Video — generate motion content: explainers, ads; lessons, summaries; concept trailers
Original diagram created for this chapter (no external source).

6.1 A Generative AI Toolbox for Better Organizational Performance

The module categorizes widely used tools by the type of content they generate—images, audio, text, and video. This “toolbox” view helps leaders translate AI capabilities into concrete business applications.

6.2 Image Generation

Definition (from module framing): AI tools that generate images from text prompts or other inputs, enabling rapid content creation at scale.

Use cases (from module content):

  • Marketing: campaign visuals, ad creatives, social media graphics
  • Product design: mockups, variations, concept exploration
  • Concept art: storyboarding, pre-visualization, style prototyping
  • Education: illustrations for training materials, infographics, learning visuals

6.3 Audio Generation

Use cases (from module content):

  • Customer service: voice agents, IVR systems, call center automation
  • Accessibility: screen readers, audio descriptions, voiceovers
  • Language learning: pronunciation training, interactive speaking exercises
  • Content creation: audiobooks, podcast narration, AI newsreaders

6.4 Text Generation

Use cases (from module content):

  • Customer engagement: email drafting, chatbot scripts, FAQs
  • Internal tools: report writing, summarization, document generation
  • Content marketing: blog posts, social media captions, SEO content
  • Knowledge management: documentation, SOPs, help centers

6.5 Video Generation

Use cases (from module content):

  • Marketing: short-form ads, explainer videos, social content
  • Education: video lessons, animated summaries, course materials
  • Entertainment: concept trailers, character animation, storyboards

6.6 The Coming Wave of Agentic AI (module bridge)

Generative AI is already reshaping how organizations create text, audio, and visual content. The next frontier is agentic AI—systems that do not just respond to commands but take initiative, make decisions, and coordinate across tools autonomously. In the next module, the course explores how this evolution opens new possibilities for automation, personalization, and digital intelligence at scale.

Review questions and practice (editorial)

Editorial addition (study support)

The following questions are designed to help learners consolidate Module 1 concepts. They are not presented as original course content.

  1. Evolution: Compare rule-based chatbots with LLM-based chatbots. What changes (capability, risk, cost) when systems become generative?
  2. Reasoning: What is chain-of-thought prompting, and why does it matter for multi-step decision problems?
  3. Risk: Define hallucination in your own words. What governance practices reduce the risk in customer-facing systems?
  4. Cost: Identify three cost drivers for AI systems and give one “real-world” implication for each.
  5. Multimedia: In a voice agent, where do ASR, NLU, and TTS sit—and why does real-time latency change the cost?
  6. Deployment: When would you choose an off-the-shelf API versus an open-source local model versus fine-tuning?

Assignment 1 support (editorial)

Assignment listed in Course Guide: “Assignment 1: Evaluating the Cost of AI Systems.”

Editorial addition (suggested approach)

Since only the assignment title is provided, the following is a suggested template to help students apply Section 3’s concepts without inventing course requirements. Adapt as needed to match your facilitator’s instructions.

  1. Define the use case: e.g., customer support, document summarization, voice agent triage.
  2. Estimate volume: interactions/day and average prompt/response length (tokens).
  3. Compute token cost: input + output at the model’s rates; show best/expected/worst cases.
  4. Add constraints: latency targets, accuracy needs, context-length needs.
  5. Choose a model strategy: premium model only where needed; cheaper model for routine tasks; caching where possible.
  6. Discuss energy + governance: note operational sustainability and hallucination risk/mitigation.

References (as provided)

This list preserves the references exactly as they appear in the provided Module 1 content. Full bibliographic details can be added if you provide them.

  • Codingscape (2024)
  • Coshow et al. (2025)
  • DataFeedWatch (2025)
  • Dialzara (2024)
  • EE Times (2025)
  • GeeksforGeeks (2024)
  • Grace (2025)
  • Marr (2024)
  • Murphy (2023)
  • OpenAI (2023)
  • Patrizio (2025)
  • Paul & Tong (2024)
  • Paubox (2025)
  • PYMNTS (2024)
  • Topal (2023)
  • Williams (2025)
  • Wood et al. (2018)