This book offers a hands-on introduction to software engineering (and other related topics!) with generative AI, combining formal foundations with easily executable code, practical design patterns and best practices.
Large Language Models (LLMs) and generative AI have fundamentally changed what it means to build software. We’ve moved from telling computers exactly what to do, step by step, to showing them examples and watching them learn patterns we never explicitly programmed. It’s a bit like the difference between following a recipe and learning to cook by watching someone who hopefully knows what they’re doing.
What’s also very exciting is that you don’t need a “supercomputer” anymore to get the best of these benefits. The combination of open-source tools like Ollama, surprisingly capable small models, and the everyday laptop sitting on your desk has democratized generative AI in a way that would have seemed impossible just a few years ago. GenAI isn’t a luxury reserved for big tech companies with massive GPU clusters—it’s a programmable resource available to every developer, student, professor and curious tinkerer who wants to experiment.
This book is a hands-on, example-driven journey through practical AI-augmented software engineering. We’ll explore the core concepts you actually need: prompting or “prompt engineering” (the art of talking to models effectively), retrieval-augmented generation (teaching models to reference external knowledge), agent architectures (systems that can plan and execute multi-step tasks), and code generation/review that actually works. But here’s the key—every concept comes with executable Python examples that you can run, modify, and learn from. No hand-waving, and nothing “left as an exercise for the reader.”
The book discusses the mechanics of how code-specific models work, why RAG systems help ground AI in reality rather than hallucinations, and how to design agents that can tackle complex programming tasks without getting lost. You’ll understand not just what these systems do, but how they do it and why certain approaches work better than others.
This material is written for teachers who want to bring AI into their curriculum, students building mental models of these strange new tools, and software engineers who want to integrate GenAI into their workflow. The goal is to make AI concrete and comprehensible, so you can use it productively and know when to trust it (and when not to).
Notes on Book Software¶
To aid reproducibility, I show the Python version — and versions of the major Python packages — used to build this executable book.
from platform import python_version
print(python_version())
3.13.7
This book makes significant use of Ollama and its Python API. Ollama is an open-source tool that simplifies managing and running LLMs locally, for example on a PC or a local server. Ollama executes models entirely on the user’s machine, providing full control over model selection, configuration, and execution. This local-first design makes Ollama especially valuable in educational settings, where transparency, reproducibility, and hands-on experimentation are essential.
When using Ollama, students can work with modern open-weight models—such as general-purpose language models, code-focused models, and embedding models—without relying on external APIs or internet connectivity. This enables cost-free experimentation while preserving privacy and allowing direct observation of system behavior, including token usage, context limits, and output variability. Ollama’s simple command-line interface and API support interactive exploration and programmatic integration into software systems.
In the college-level courses where this book is used, Ollama serves as a laboratory environment for studying generative AI as an engineered system rather than a black box. Working through guided labs, students explore core concepts such as prompt structure, sampling parameters, model tradeoffs, and retrieval-augmented generation. By running models locally and embedding them into their own applications, students begin to develop practical intuition about how large language models work, how they fail, what their limitations are, and how they can be used responsibly and effectively in real production software systems.
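To make one of those sampling parameters concrete: temperature rescales a model’s output logits before they are turned into probabilities, which is why low temperatures give more deterministic output. A minimal sketch of that step (the logit values here are made up for illustration):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to a probability distribution, scaled by temperature.
    Lower temperature sharpens the distribution; higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens
logits = [2.0, 1.0, 0.5, 0.1]

sharp = softmax_with_temperature(logits, temperature=0.2)
flat = softmax_with_temperature(logits, temperature=2.0)
print(max(sharp), max(flat))
```

At temperature 0.2 nearly all the probability mass lands on the top token; at 2.0 the distribution is much flatter, so sampling becomes more varied.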
!uv pip show ollama
Name: ollama
Version: 0.6.0
Location: /Users/chikeabuah/Desktop/programming-genai/.venv/lib/python3.13/site-packages
Requires: httpx, pydantic
Required-by:
The structure and executable nature of this book are enabled by the Jupyter ecosystem.
!jupyter --version
Selected Jupyter core packages...
IPython : 9.6.0
ipykernel : 7.1.0
ipywidgets : 8.1.7
jupyter_client : 8.6.3
jupyter_core : 5.9.1
jupyter_server : 2.17.0
jupyterlab : 4.4.10
nbclient : 0.10.2
nbconvert : 7.16.6
nbformat : 5.10.4
notebook : 7.4.7
qtconsole : not installed
traitlets : 5.14.3
This book is typeset using Jupyter Book 2.0, which was in alpha release when I started writing this book.
Jupyter Book is a great resource for sharing knowledge, and I am a big fan.
!jupyter book --version
v2.0.0-b3
!jupyter kernel --version
8.6.3
I can personally recommend the Microsoft Visual Studio Code editor for interacting with Jupyter Notebooks. It is free to use, and was used in the writing of this book (and its predecessor).
!code --version
1.105.1
7d842fb85a0275a4a8e4d7e040d2625abbf7f084
arm64
The Co-Instructors: Our Models¶
Not to get too anthropomorphic right off the bat, but to create an ideal educational experience we will need the perfect teaching assistants.
Throughout this book we will make use of some of the most popular open-source GenAI models in the world at the time of initial writing (early 2026). No one model suits all purposes, so we will utilize six or seven different models, each uniquely specialized for a particular purpose such as coding, vision, creative writing, or creating embeddings.
!ollama list
NAME ID SIZE MODIFIED
nomic-embed-text:latest 0a109f422b47 274 MB 4 weeks ago
llama3.1:latest 46e0c10c039e 4.9 GB 6 weeks ago
llama3.2:latest a80c4f17acd5 2.0 GB 6 weeks ago
qwen3:latest 500a1f067a9f 5.2 GB 6 weeks ago
qwen2.5-coder:latest dae161e27b0e 4.7 GB 6 weeks ago
deepseek-r1:latest 6995872bfe4c 5.2 GB 6 weeks ago
You can see above that I tend to use the default or latest model sizes, which is usually around the ~7B-parameter mark. I used the same model sizes writing this book on my MacBook M4 with 32 GB of memory (when I say GB in this book I mean gigabyte by default) as my students used doing the labs on a shared Tesla P4 8 GB GPU on the WWU CS LAN.
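The `SIZE` column in the listing above follows almost directly from parameter count and quantization: at Q4_K_M each weight occupies roughly 5 bits. A back-of-the-envelope sketch (the bits-per-weight figure is an approximation, and real files also carry some non-weight overhead):

```python
def approx_model_size_gb(params_billions, bits_per_weight=4.9):
    """Rough on-disk size of a quantized model: parameters * bits per weight / 8."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # gigabytes

# llama3.1 reports 8.0B parameters at Q4_K_M and a 4.9 GB file
print(approx_model_size_gb(8.0))  # close to the 4.9 GB listed above
```

The same arithmetic explains why the 3B llama3.2 fits in about 2 GB while a 405B-class model would need hundreds of gigabytes.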
I will briefly discuss each model in the following sections. I also show the download count at the time to emphasize each model’s popularity in the open-source community.
DeepSeek-R1 (75M Downloads)¶
DeepSeek-R1 is best understood as a reasoning-first model that prioritizes structured, multi-step thinking over speed or stylistic fluency. It performs especially well on mathematics, logic, and problems that benefit from explicit intermediate reasoning, often matching or exceeding larger generalist models on reasoning benchmarks.
This comes at the cost of verbosity and latency: the model tends to produce longer responses and can feel slow or inefficient for simple queries. Its strengths make it well suited for research, theorem-style problem solving, and analytical tasks, but it is less compelling for casual dialogue, creative writing, or lightweight instruction following where faster, more concise models feel smoother.
In my experience this model probably has fewer guardrails than the others mentioned here, and will speak its “mind” on most topics. It is definitely a great candidate for complex strategy, analysis, and ethics experiments.
!ollama show deepseek-r1
Model
architecture qwen3
parameters 8.2B
context length 131072
embedding length 4096
quantization Q4_K_M
Capabilities
completion
thinking
Parameters
stop "<|begin▁of▁sentence|>"
stop "<|end▁of▁sentence|>"
stop "<|User|>"
stop "<|Assistant|>"
temperature 0.6
top_p 0.95
License
MIT License
Copyright (c) 2023 DeepSeek
...
Qwen2.5-Coder (9.4M Downloads)¶
Qwen2.5-Coder is a highly specialized model optimized almost entirely for programming tasks. It excels at code generation, refactoring, and debugging (code repair), often producing correct solutions on the first attempt and ranking near the top of open-source coding benchmarks. Its outputs tend to be structured, syntactically clean, and aligned with common software engineering patterns.
However, this specialization also limits its versatility: outside of programming contexts, its reasoning, conversational quality, and creative abilities are noticeably weaker than strong generalist models. Qwen2.5-Coder is most effective when used as a dedicated coding assistant rather than a general AI system, and it is a good candidate for the default model in a software engineering class. At roughly 5 GB for the 7B (7-billion-parameter) version, it is much more efficient in size than Qwen3-Coder, which requires at least 19 GB.
!ollama show qwen2.5-coder
Model
architecture qwen2
parameters 7.6B
context length 32768
embedding length 3584
quantization Q4_K_M
Capabilities
completion
tools
insert
System
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
License
Apache License
Version 2.0, January 2004
...
Qwen3 (15.8M Downloads)¶
Qwen3 is a strong general-purpose open model designed to balance reasoning ability, coding skill, multilingual coverage, tool-calling, and efficiency. It performs well across a wide range of benchmarks and supports different operating modes that trade off speed for deeper reasoning, making it flexible for both interactive use and harder analytical tasks. Its multilingual capabilities are particularly strong, and larger variants benefit from mixture-of-experts designs that improve efficiency without sacrificing quality. While generally competitive with the best open models, Qwen3 can occasionally struggle with strict instruction adherence or simple factual queries compared to more conservatively tuned models. Overall, it stands out as one of the most well-rounded open models available.
!ollama show qwen3
Model
architecture qwen3
parameters 8.2B
context length 40960
embedding length 4096
quantization Q4_K_M
Capabilities
completion
tools
thinking
Parameters
repeat_penalty 1
stop "<|im_start|>"
stop "<|im_end|>"
temperature 0.6
top_k 20
top_p 0.95
License
Apache License
Version 2.0, January 2004
...
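The `top_k 20` and `top_p 0.95` parameters shown above restrict sampling to a truncated pool of candidate tokens. A minimal sketch of that filtering step, on a made-up toy distribution:

```python
def filter_top_k_top_p(probs, top_k=20, top_p=0.95):
    """Keep the top_k most probable tokens, then trim further to the smallest
    prefix whose cumulative probability reaches top_p (nucleus sampling)."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append(token_id)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept  # sampling then draws only from these token ids

# Toy distribution over six candidate tokens
probs = [0.50, 0.30, 0.10, 0.05, 0.03, 0.02]
print(filter_top_k_top_p(probs, top_k=4, top_p=0.85))  # → [0, 1, 2]
```

Lower `top_k` or `top_p` values shrink the candidate pool, trading diversity for predictability; qwen3’s defaults above keep the pool fairly broad.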
Gemma3 (28.8M Downloads)¶
Gemma3 is a versatile, efficiency-oriented model family that is a good and creative writer in my experience. It supports text and vision inputs and performs reliably across many languages. The model strikes a balance between capability and hardware efficiency, with smaller variants that run comfortably on limited resources. While Gemma3 is competitive on many benchmarks, it typically falls slightly behind the very top reasoning or coding-focused models at comparable scales.
Its main appeal lies in its multimodal flexibility and strong all-around performance. I would recommend using this for classroom exercises that involve computer vision.
!ollama show gemma3
Model
architecture gemma3
parameters 4.3B
context length 131072
embedding length 2560
quantization Q4_K_M
Capabilities
completion
vision
Parameters
stop "<end_of_turn>"
temperature 1
top_k 64
top_p 0.95
License
Gemma Terms of Use
Last modified: February 21, 2024
...
Llama 3.2 (50.8M Downloads)¶
Llama 3.2 focuses on efficiency and deserves special mention as the smallest of our generalist models, at 2 GB for 3B parameters. The model is well tuned for conversational tasks, summarization, and general text understanding, delivering solid performance relative to its size. However, it does not match larger models on deep reasoning or advanced coding tasks. Llama 3.2 is best seen as a very practical, deployment-friendly model for prototyping non-coding projects.
!ollama show llama3.2
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q4_K_M
Capabilities
completion
tools
Parameters
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
License
LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
Llama 3.2 Version Release Date: September 25, 2024
...
Llama 3.1 (108M Downloads)¶
Llama 3.1 represents Meta’s larger-scale generalist offering and is defined primarily by its breadth and scalability. Available in sizes ranging from a modest 5 GB to an extremely large 243 GB, it delivers strong performance across reasoning, coding, summarization, and dialogue, especially in its largest variants. The model supports long contexts and advanced features such as tool and function calling, making it suitable for complex agent-style applications. Its main drawback is resource intensity: the most capable versions require substantial compute and memory, limiting accessibility.
Compared to Llama 3.2, it is less optimized for deployment but significantly stronger as a high-end, general-purpose language model.
!ollama show llama3.1
Model
architecture llama
parameters 8.0B
context length 131072
embedding length 4096
quantization Q4_K_M
Capabilities
completion
tools
Parameters
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
License
LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
Llama 3.1 Version Release Date: July 23, 2024
...
nomic-embed-text (48.5M Downloads)¶
nomic-embed-text is a text embedding model designed specifically for high-quality semantic representations rather than generative language tasks. Its primary strength lies in producing dense vector embeddings that capture meaning, similarity, and topical structure across documents, sentences, and short passages, making it well suited for classroom labs and projects dealing with retrieval-augmented generation (RAG), semantic search, clustering, and recommendation systems.
The model emphasizes strong performance on retrieval and similarity benchmarks while remaining efficient enough for large-scale indexing. Because it is not a generative model, it does not produce natural language responses and is instead intended to be used as an infrastructure component that feeds downstream systems such as LLMs or search pipelines.
Its appeal is in reliability, consistency, and alignment with modern vector-database workflows rather than conversational or reasoning capabilities.
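Downstream systems compare these embedding vectors, most commonly with cosine similarity. A minimal sketch on hand-made toy vectors (real nomic-embed-text outputs are 768-dimensional, and the numbers here are invented for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional "embeddings" for three texts
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
invoice = [0.0, 0.1, 0.95, 0.0]

# Semantically related texts should score higher than unrelated ones
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # → True
```

A semantic search or RAG pipeline is essentially this comparison at scale: embed the query, embed the documents, and rank documents by similarity to the query.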
!ollama show nomic-embed-text
Model
architecture nomic-bert
parameters 137M
context length 2048
embedding length 768
quantization F16
Capabilities
embedding
Parameters
num_ctx 8192
License
Apache License
Version 2.0, January 2004
...