This book offers a hands-on introduction to software engineering (and other related topics!) with Generative AI, combining formal foundations with executable code, practical design patterns, and best practices.
Large Language Models (LLMs) and Generative AI have fundamentally changed what it means to build software. We’ve moved from telling computers exactly what to do, step by step, to showing them examples and watching them learn patterns we never explicitly programmed. It’s a bit like the difference between following a recipe versus learning to cook by watching a pro chef who (hopefully!) knows what they’re doing.
What’s also very exciting is that you don’t need a “supercomputer” anymore to get the best out of AI. The combination of general efficiency gains in AI and ML technology, open-source tools like Ollama, and capable, efficient small models has democratized Generative AI in a way that would have seemed impossible just a few years ago. Now, GenAI isn’t a luxury reserved for big tech companies with massive GPU clusters; it’s a programmable resource that can be made available to every student, professor, and curious tinkerer who wants to experiment.
This book is a hands-on, example-driven journey through practical AI-augmented software engineering. Together, we’ll explore the core concepts you actually need for success in tech: prompting or “prompt engineering” (the art of communicating with models effectively), retrieval-augmented generation (teaching models to reference external knowledge), agent architectures (systems that can plan and execute multi-step tasks), and code generation & review that actually works (based on empirical research). But here’s the key value add — every concept comes with carefully crafted executable Python examples that you can run, modify, directly learn from, and use to kickstart your own projects! No hand-waving, and nothing “left as an exercise for the reader” (okay maybe a few!). I also include a few of my best attempts at humor, because we all know software engineering is much better with a joke or two! I’ll do my best not to rely too much on zoomer slang to be funny.
The book discusses the mechanics of how code-specific models work (and where they differ or excel), why RAG systems help ground AI in reality rather than hallucination, and how to design agents that can tackle complex programming tasks without getting lost. You’ll understand not just what these systems do, but how they do it and why certain approaches work better than others.
This material is written especially for teachers who want to bring AI into their curricula, students building mental models of AI tools, and software engineers who want to integrate GenAI into their workflow. The goal is to make AI concrete and comprehensible, so you can use it productively and know when to trust it (and when not to).
Notes on Book Software
To aid reproducibility and understanding, I show and discuss the (Python) software package versions used to build this executable book!
Python and Ollama
from platform import python_version
print(python_version())
3.13.7
This book makes significant use of Ollama and its Python API. Ollama is a free and open source tool that simplifies managing and running LLMs locally. Ollama executes models entirely on the user’s machine, providing control over model selection, configuration, and execution. This makes Ollama especially valuable in educational settings, where transparency, reproducibility, and hands-on experimentation are essential.
When using Ollama, students can work with modern open-weight models — such as general-purpose language models, code-focused models, and embedding models — without relying on external APIs or internet connectivity. This enables cost-free experimentation while preserving privacy and allowing direct observation of system behavior, including token usage, context limits, and output variability.
In the college-level courses where this book is used, Ollama facilitates a laboratory environment for studying Generative AI as a transparent system rather than a black box. Working through guided labs, students explore core concepts such as prompt structure, sampling parameters, model tradeoffs, and retrieval-augmented generation. When they run models locally and embed them into their own applications, students begin to develop practical intuition about how large language models work, how they fail, what their limitations are, and how they can be responsibly and effectively used in real production software systems.
!uv pip show ollama
Name: ollama
Version: 0.6.0
Location: /Users/chikeabuah/Desktop/programming-genai/.venv/lib/python3.13/site-packages
Requires: httpx, pydantic
Required-by:
Jupyter
The structure and executable nature of this book are enabled by the Jupyter ecosystem.
!jupyter --version
Selected Jupyter core packages...
IPython : 9.6.0
ipykernel : 7.1.0
ipywidgets : 8.1.7
jupyter_client : 8.6.3
jupyter_core : 5.9.1
jupyter_server : 2.17.0
jupyterlab : 4.4.10
nbclient : 0.10.2
nbconvert : 7.16.6
nbformat : 5.10.4
notebook : 7.4.7
qtconsole : not installed
traitlets : 5.14.3
This book is typeset using Jupyter Book 2.0, which was in alpha release when I started writing this book.
Jupyter Book is a great resource for sharing knowledge, and I am a big fan!
!jupyter book --version
v2.0.0-b3
!jupyter kernel --version
8.6.3
I can personally recommend the Microsoft VS Code editor for interacting with Jupyter Notebooks. It is free to use, and was used in the writing of this book (and its predecessor).
!code --version
1.105.1
7d842fb85a0275a4a8e4d7e040d2625abbf7f084
arm64
Our “Co-Instructors”
Not to get too anthropomorphic right off the bat, but to create an ideal educational experience we will need the perfect co-instructors, or maybe teaching assistants if you will.
Throughout this book we will make use of some of the most popular open-source models in the world at the time of initial writing. No one model suits all purposes, so we will utilize a half-dozen models, each uniquely specialized for a particular purpose such as coding, vision, creative writing, or vector embeddings.
!ollama list
NAME ID SIZE MODIFIED
nomic-embed-text:latest 0a109f422b47 274 MB 4 weeks ago
llama3.1:latest 46e0c10c039e 4.9 GB 6 weeks ago
llama3.2:latest a80c4f17acd5 2.0 GB 6 weeks ago
qwen3:latest 500a1f067a9f 5.2 GB 6 weeks ago
qwen2.5-coder:latest dae161e27b0e 4.7 GB 6 weeks ago
deepseek-r1:latest 6995872bfe4c 5.2 GB 6 weeks ago
I used the same model sizes writing this book on my MacBook M4 with 32GB of RAM (when I say GB in this book, I mean gigabytes by default) as my students use for the course labs on a Tesla P4 8GB GPU on the WWU CS LAN.
I will briefly discuss each model in the following sections. I also show the download count at the time of writing this monograph, to demonstrate each model’s popularity in the open-source community.
DeepSeek-R1 (75M Downloads)
DeepSeek-R1 is best understood as a reasoning-first model that prioritizes structured, multi-step thinking over speed or stylistic fluency. It performs especially well on mathematics, logic, and problems that benefit from explicit intermediate reasoning, often matching or exceeding larger generalist models on reasoning benchmarks.
This comes at the cost of verbosity and latency: the model tends to produce longer responses and can feel slow or inefficient for simple queries. Its strengths make it well suited for research, theorem-style problem solving, and analytical tasks, but it is less compelling for casual dialogue, creative writing, or lightweight instruction.
In my experience this model probably has fewer guardrails than the others mentioned here, and will speak its “mind” on most topics. This makes it a great candidate for complex strategy, logical analysis, and ethics experiments!
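One practical consequence of the thinking capability: DeepSeek-R1 emits its chain of thought between <think> tags before the final answer, so a small helper can separate the two in lab code. This is a sketch based on R1's current output convention, which may vary across model revisions.

```python
# Split an R1-style response into (reasoning, answer). The <think> tag
# convention reflects DeepSeek-R1's current output format.
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Return (chain_of_thought, final_answer) from a model response."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No thinking block present: treat the whole text as the answer.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```

This lets a grading script score only the final answer while keeping the reasoning trace available for inspection.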
!ollama show deepseek-r1
Model
architecture qwen3
parameters 8.2B
context length 131072
embedding length 4096
quantization Q4_K_M
Capabilities
completion
thinking
Parameters
stop "<|begin▁of▁sentence|>"
stop "<|end▁of▁sentence|>"
stop "<|User|>"
stop "<|Assistant|>"
temperature 0.6
top_p 0.95
License
MIT License
Copyright (c) 2023 DeepSeek
...
Qwen2.5-Coder (9.4M Downloads)
Qwen2.5-Coder is a highly specialized model optimized almost entirely for programming tasks. It excels at code generation, refactoring, and debugging (code repair), often producing correct solutions on the first attempt and ranking near the top of open-source coding benchmarks. Its outputs tend to be structured, syntactically clean, and aligned with common software engineering patterns.
However, this specialization also limits its versatility: outside of programming contexts, its reasoning, conversational quality, and creative abilities are noticeably weaker than strong generalist models. Qwen2.5-Coder is most effective when used as a dedicated coding assistant rather than a general AI system, which makes it a good candidate for the default model in a software engineering class. It is also much more efficient in size, at 5GB for the 7B (7-billion-parameter) version, than Qwen3-Coder, which requires at least 19GB.
!ollama show qwen2.5-coder
Model
architecture qwen2
parameters 7.6B
context length 32768
embedding length 3584
quantization Q4_K_M
Capabilities
completion
tools
insert
System
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
License
Apache License
Version 2.0, January 2004
...
Qwen3 (15.8M Downloads)
Qwen3 is a strong general-purpose model designed to balance reasoning ability, coding skill and tool-calling. It performs well across a wide range of benchmarks and supports different operating modes that trade off speed for deeper reasoning, making it flexible for both interactive use and harder analytical tasks. While generally competitive with the best open models, Qwen3 can occasionally struggle with strict instruction adherence. Overall, it stands out as one of the most well-rounded open models available.
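To illustrate the tool-calling capability, here is a hedged sketch of a tool schema in the OpenAI-style function format that Ollama's chat endpoint accepts, plus a dispatcher that routes a model-issued call back to a local Python function. The add tool and its schema are purely illustrative, not part of any library.

```python
# Hedged sketch of tool calling: a function schema the model can see,
# and a dispatcher for the calls it issues back. Illustrative only.

def add(a: int, b: int) -> int:
    return a + b

TOOLS = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two integers and return the sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "integer"},
                "b": {"type": "integer"},
            },
            "required": ["a", "b"],
        },
    },
}]

def dispatch(tool_call: dict) -> int:
    # Route a model-issued tool call back to the local implementation.
    function = tool_call["function"]
    if function["name"] == "add":
        args = function["arguments"]
        return add(int(args["a"]), int(args["b"]))
    raise ValueError(f"unknown tool: {function['name']}")
```

In a full loop, TOOLS would be passed as the tools argument to ollama.chat, and any tool_calls in the response would be run through dispatch before sending the results back to the model.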
!ollama show qwen3
Model
architecture qwen3
parameters 8.2B
context length 40960
embedding length 4096
quantization Q4_K_M
Capabilities
completion
tools
thinking
Parameters
repeat_penalty 1
stop "<|im_start|>"
stop "<|im_end|>"
temperature 0.6
top_k 20
top_p 0.95
License
Apache License
Version 2.0, January 2004
...
Gemma3 (28.8M Downloads)
Gemma3 is a versatile, efficiency-oriented model family that is a good creative writer in my experience. It supports text and vision inputs and performs reliably across many languages. The model strikes a balance between capability and hardware efficiency, with smaller variants that run comfortably on limited resources. While Gemma3 is competitive on many benchmarks, it typically falls slightly behind the very top reasoning or coding-focused models at comparable scales.
Its main appeal lies in its multimodal flexibility and strong all-around performance. I would recommend using this for classroom exercises that involve computer vision and creative writing specifically.
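For the vision side, the sketch below shows how an image can be attached to a chat turn through the Ollama Python API. It assumes a running Ollama server with gemma3 pulled; the prompt and file path are purely illustrative.

```python
# Hedged sketch: sending an image to a vision-capable model. Assumes a
# running Ollama server with gemma3 pulled; the path is illustrative.

def vision_message(prompt: str, image_path: str) -> dict:
    # Ollama accepts image file paths (or raw bytes) in the images field.
    return {"role": "user", "content": prompt, "images": [image_path]}

def describe_image(image_path: str, model: str = "gemma3") -> str:
    import ollama  # imported lazily; pip install ollama
    response = ollama.chat(
        model=model,
        messages=[vision_message("Describe this image in one sentence.", image_path)],
    )
    return response["message"]["content"]
```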
!ollama show gemma3
Model
architecture gemma3
parameters 4.3B
context length 131072
embedding length 2560
quantization Q4_K_M
Capabilities
completion
vision
Parameters
stop "<end_of_turn>"
temperature 1
top_k 64
top_p 0.95
License
Gemma Terms of Use
Last modified: February 21, 2024
...
Llama 3.2 (50.8M Downloads)
Llama 3.2 focuses on efficiency and deserves special mention as the smallest of our generalist models, at 2GB with 3B parameters. The model is well tuned for conversational tasks, summarization, and general text understanding, delivering solid performance relative to its size. However, it does not match larger models on deep reasoning or advanced coding tasks. Llama 3.2 is best seen as a very practical, deployment-friendly model for prototyping non-coding projects.
!ollama show llama3.2
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q4_K_M
Capabilities
completion
tools
Parameters
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
License
LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
Llama 3.2 Version Release Date: September 25, 2024
...
Llama 3.1 (108M Downloads)
Llama 3.1 is a larger-scale generalist offering, defined primarily by its breadth and scalability. Available in sizes ranging from a modest 5GB to an extremely large 243GB, it delivers strong performance across reasoning, coding, summarization, and dialogue, especially in its largest variants. The model supports long contexts and advanced features such as tool and function calling, making it suitable for complex agent-style applications.
Compared to Llama 3.2, it is less optimized for deployment but significantly stronger as a high-end, general-purpose language model.
!ollama show llama3.1
Model
architecture llama
parameters 8.0B
context length 131072
embedding length 4096
quantization Q4_K_M
Capabilities
completion
tools
Parameters
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
License
LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
Llama 3.1 Version Release Date: July 23, 2024
...
nomic-embed-text (48.5M Downloads)
nomic-embed-text is a text embedding model designed specifically for high-quality semantic representations rather than generative language tasks. Its primary strength lies in producing dense vector embeddings that capture meaning, similarity, and topical structure across documents, sentences, and short passages, making it well suited for classroom labs and projects dealing with retrieval-augmented generation (RAG), semantic search, clustering, and recommendation systems.
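To make the retrieval use case concrete, here is a sketch of the core primitive behind RAG and semantic search: embed texts, then rank candidates by cosine similarity. The embed call assumes a running Ollama server with nomic-embed-text pulled (and the ollama package installed); the cosine function itself is pure Python and works on any vectors.

```python
# Sketch of embedding-based retrieval. The embed() call assumes a local
# Ollama server with nomic-embed-text pulled; cosine() is pure Python.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def embed(text: str) -> list[float]:
    import ollama  # imported lazily; pip install ollama
    return ollama.embed(model="nomic-embed-text", input=text)["embeddings"][0]

def most_similar(query: str, documents: list[str]) -> str:
    # Rank documents by similarity to the query and return the best match.
    query_vec = embed(query)
    return max(documents, key=lambda doc: cosine(embed(doc), query_vec))
```

A production system would embed documents once and store the vectors, but this brute-force version is enough for classroom-sized corpora.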
!ollama show nomic-embed-text
Model
architecture nomic-bert
parameters 137M
context length 2048
embedding length 768
quantization F16
Capabilities
embedding
Parameters
num_ctx 8192
License
Apache License
Version 2.0, January 2004
...
All of these models will come in handy at various points throughout our exploration of Generative AI in this monograph!