
This book offers a hands-on introduction to software engineering (and other related topics!) with generative AI, combining formal foundations with easily executable code, practical design patterns and best practices.

Large Language Models (LLMs) and generative AI have fundamentally changed what it means to build software. We’ve moved from telling computers exactly what to do, step by step, to showing them examples and watching them learn patterns we never explicitly programmed. It’s a bit like the difference between following a recipe and learning to cook by watching someone who hopefully knows what they’re doing.

What’s also very exciting is that you no longer need a “supercomputer” to get the best benefits from this. The combination of open-source tools like Ollama, surprisingly capable small models, and the everyday laptop sitting on your desk has democratized generative AI in a way that would have seemed impossible just a few years ago. GenAI isn’t a luxury reserved for big tech companies with massive GPU clusters—it’s a programmable resource available to every developer, student, professor and curious tinkerer who wants to experiment.

This book is a hands-on, example-driven journey through practical AI-augmented software engineering. We’ll explore the core concepts you actually need: prompting or “prompt engineering” (the art of talking to models effectively), retrieval-augmented generation (teaching models to reference external knowledge), agent architectures (systems that can plan and execute multi-step tasks), and code generation/review that actually works. But here’s the key—every concept comes with executable Python examples that you can run, modify, and learn from. No hand-waving, and nothing “left as an exercise for the reader.”

The book discusses the mechanics of how code-specific models work, why RAG systems help ground AI in reality rather than hallucinations, and how to design agents that can tackle complex programming tasks without getting lost. You’ll understand not just what these systems do, but how they do it and why certain approaches work better than others.

This material is written for teachers who want to bring AI into their curriculum, students building mental models of these strange new tools, and software engineers who want to integrate GenAI into their workflow. The goal is to make AI concrete and comprehensible, so you can use it productively and know when to trust it (and when not to).

Notes on Book Software

To aid reproducibility, I show the Python version — and versions of the major Python packages — used to build this executable book.

from platform import python_version
print(python_version())
3.13.7

This book makes significant use of Ollama and its Python API. Ollama is an open-source tool that simplifies managing and running LLMs locally, such as on a PC or a local server. Ollama executes models entirely on the user’s machine, providing full control over model selection, configuration, and execution. This local-first design makes Ollama especially valuable in educational settings, where transparency, reproducibility, and hands-on experimentation are essential.

When using Ollama, students can work with modern open-weight models—such as general-purpose language models, code-focused models, and embedding models—without relying on external APIs or internet connectivity. This enables cost-free experimentation while preserving privacy and allowing direct observation of system behavior, including token usage, context limits, and output variability. Ollama’s simple command-line interface and API support interactive exploration and programmatic integration into software systems.
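To make this concrete, here is a minimal sketch of talking to a local model through the Ollama Python API. It assumes the `ollama` package is installed and an Ollama server is running with `llama3.2` pulled; if not, the live call is skipped gracefully. The `build_messages` helper is just an illustrative convenience, not part of the API.

```python
def build_messages(system_prompt, user_prompt):
    # Ollama's chat endpoint uses the familiar role/content message format.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a concise teaching assistant.",
    "In one sentence, what is retrieval-augmented generation?",
)

try:
    import ollama
    response = ollama.chat(model="llama3.2", messages=messages)
    print(response["message"]["content"])
except Exception as exc:  # server not running, model not pulled, etc.
    print(f"Skipping live call: {exc}")
```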

In the college-level courses where this book is used, Ollama serves as a laboratory environment for studying generative AI as an engineered system rather than a black box. Working through guided labs, students explore core concepts such as prompt structure, sampling parameters, model tradeoffs, and retrieval-augmented generation. When they run models locally and embed them into their own applications, students begin to develop practical intuition about how large language models work, where they fail and what their limitations are, and how they can be responsibly and effectively used in real production software systems.
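Sampling parameters like those mentioned above can be varied per request through Ollama’s `options` dictionary. The sketch below uses standard Ollama option names (`temperature`, `top_p`, `top_k`, `num_ctx`); the `sampling_options` helper and the model choice are illustrative, and the live call is skipped if no Ollama server is available.

```python
def sampling_options(temperature=0.6, top_p=0.95, top_k=20, num_ctx=4096):
    """Collect generation options into the dict Ollama expects."""
    return {"temperature": temperature, "top_p": top_p,
            "top_k": top_k, "num_ctx": num_ctx}

greedy = sampling_options(temperature=0.0)    # near-deterministic output
creative = sampling_options(temperature=1.2)  # more varied output

try:
    import ollama
    for opts in (greedy, creative):
        r = ollama.generate(model="llama3.2", prompt="Name a color.", options=opts)
        print(opts["temperature"], "->", r["response"].strip())
except Exception as exc:
    print(f"Skipping live call: {exc}")
```

Running the same prompt under both settings is a quick way to observe output variability first-hand.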

!uv pip show ollama
Name: ollama
Version: 0.6.0
Location: /Users/chikeabuah/Desktop/programming-genai/.venv/lib/python3.13/site-packages
Requires: httpx, pydantic
Required-by:

The structure and executable nature of this book is enabled by the Jupyter ecosystem.

!jupyter --version
Selected Jupyter core packages...
IPython          : 9.6.0
ipykernel        : 7.1.0
ipywidgets       : 8.1.7
jupyter_client   : 8.6.3
jupyter_core     : 5.9.1
jupyter_server   : 2.17.0
jupyterlab       : 4.4.10
nbclient         : 0.10.2
nbconvert        : 7.16.6
nbformat         : 5.10.4
notebook         : 7.4.7
qtconsole        : not installed
traitlets        : 5.14.3

This book is typeset using Jupyter Book 2.0, which was in alpha release when I started writing this book.

Jupyter Book is a great resource for sharing knowledge, and I am a big fan.

!jupyter book --version
v2.0.0-b3
!jupyter kernel --version
8.6.3

I can personally recommend the Microsoft Visual Studio Code editor for interacting with Jupyter Notebooks. It is free to use, and was used in the writing of this book (and its predecessor).

!code --version 
1.105.1
7d842fb85a0275a4a8e4d7e040d2625abbf7f084
arm64

The Co-Instructors: Our Models

Not to get too anthropomorphic right off the bat, but to create an ideal educational experience we will need the perfect teaching assistants.

Throughout this book we will make use of some of the most popular open-source GenAI models in the world at the time of initial writing (early 2026). No one model suits all purposes, and we will utilize six or seven different models, each specialized for a particular purpose such as coding, vision, creative writing, or creating embeddings.

!ollama list
NAME                       ID              SIZE      MODIFIED    
nomic-embed-text:latest    0a109f422b47    274 MB    4 weeks ago    
llama3.1:latest            46e0c10c039e    4.9 GB    6 weeks ago    
llama3.2:latest            a80c4f17acd5    2.0 GB    6 weeks ago    
qwen3:latest               500a1f067a9f    5.2 GB    6 weeks ago    
qwen2.5-coder:latest       dae161e27b0e    4.7 GB    6 weeks ago    
deepseek-r1:latest         6995872bfe4c    5.2 GB    6 weeks ago    

You can see above that I tend to utilize the default or latest model sizes, which is usually around 7B parameters. I used the same model sizes writing this book on my MacBook M4 with 32 GB of memory (when I say GB in this book I mean gigabyte by default) as my students used doing the labs on a shared Tesla P4 8 GB GPU on the WWU CS LAN.

I will briefly discuss each model in the following sections. I also show the download count at the time to emphasize each model’s popularity in the open-source community.

DeepSeek-R1 (75M Downloads)

DeepSeek-R1 is best understood as a reasoning-first model that prioritizes structured, multi-step thinking over speed or stylistic fluency. It performs especially well on mathematics, logic, and problems that benefit from explicit intermediate reasoning, often matching or exceeding larger generalist models on reasoning benchmarks.

This comes at the cost of verbosity and latency: the model tends to produce longer responses and can feel slow or inefficient for simple queries. Its strengths make it well suited for research, theorem-style problem solving, and analytical tasks, but it is less compelling for casual dialogue, creative writing, or lightweight instruction following where faster, more concise models feel smoother.

In my experience this model probably has fewer guardrails than the others mentioned here, and will speak its “mind” on most topics. Definitely a great candidate for complex strategy, analysis, and ethics experiments.
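Newer versions of the Ollama Python API expose a `think` flag that separates a reasoning model’s intermediate trace from its final answer; the sketch below uses it with DeepSeek-R1, treating the exact flag and field names as version-dependent. The `is_prime` helper is my own addition, a ground-truth check you could compare the model’s answer against; the live call is skipped if Ollama isn’t available.

```python
def is_prime(n):
    # Simple trial division: a ground truth to check the model's claim.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

try:
    import ollama
    response = ollama.chat(
        model="deepseek-r1",
        messages=[{"role": "user", "content": "Is 1009 prime? Answer briefly."}],
        think=True,  # ask for the reasoning trace separately from the answer
    )
    print("Reasoning:", response.message.thinking)
    print("Answer:", response.message.content)
    print("Ground truth:", is_prime(1009))
except Exception as exc:
    print(f"Skipping live call: {exc}")
```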

!ollama show deepseek-r1
  Model
    architecture        qwen3     
    parameters          8.2B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    thinking      

  Parameters
    stop           "<|begin▁of▁sentence|>"    
    stop           "<|end▁of▁sentence|>"      
    stop           "<|User|>"                 
    stop           "<|Assistant|>"            
    temperature    0.6                          
    top_p          0.95                         

  License
    MIT License                    
    Copyright (c) 2023 DeepSeek    
    ...                            

Qwen2.5-Coder (9.4M Downloads)

Qwen2.5-Coder is a highly specialized model optimized almost entirely for programming tasks. It excels at code generation, refactoring, and debugging (code repair), often producing correct solutions on the first attempt and ranking near the top of open-source coding benchmarks. Its outputs tend to be structured, syntactically clean, and aligned with common software engineering patterns.

However, this specialization also limits its versatility: outside of programming contexts, its reasoning, conversational quality, and creative abilities are noticeably weaker than strong generalist models. Qwen2.5-Coder is most effective when used as a dedicated coding assistant rather than a general AI system. This model is a good candidate to have as the default for a software engineering class. It is much more efficient in size, at 5 GB for the 7B (7-billion-parameter) version, than Qwen3-Coder, which requires at least 19 GB.
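Using Qwen2.5-Coder as a dedicated coding assistant can look as simple as the sketch below. The `code_prompt` template is purely illustrative (the model imposes no required prompt format), and the live call is skipped if no Ollama server is running.

```python
def code_prompt(task, language="Python"):
    # A hypothetical prompt template for code-generation requests.
    return (f"Write a {language} function for the following task. "
            f"Return only code, no explanation.\n\nTask: {task}")

prompt = code_prompt("reverse a singly linked list")

try:
    import ollama
    r = ollama.generate(model="qwen2.5-coder", prompt=prompt)
    print(r["response"])
except Exception as exc:
    print(f"Skipping live call: {exc}")
```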

!ollama show qwen2.5-coder
  Model
    architecture        qwen2     
    parameters          7.6B      
    context length      32768     
    embedding length    3584      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         
    insert        

  System
    You are Qwen, created by Alibaba Cloud. You are a helpful assistant.    

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          

Qwen3 (15.8M Downloads)

Qwen3 is a strong general-purpose open model designed to balance reasoning ability, coding skill, multilingual coverage, tool-calling, and efficiency. It performs well across a wide range of benchmarks and supports different operating modes that trade off speed for deeper reasoning, making it flexible for both interactive use and harder analytical tasks. Its multilingual capabilities are particularly strong, and larger variants benefit from mixture-of-experts designs that improve efficiency without sacrificing quality. While generally competitive with the best open models, Qwen3 can occasionally struggle with strict instruction adherence or simple factual queries compared to more conservatively tuned models. Overall, it stands out as one of the most well-rounded open models available.
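Qwen3’s tool-calling capability can be sketched as follows. Recent versions of the Ollama Python API accept plain Python functions as tools, deriving the tool schema from their signatures and docstrings; whether the model actually invokes the tool is up to the model. The `add_numbers` function is my own toy example, and the live call is skipped if Ollama isn’t available.

```python
def add_numbers(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

try:
    import ollama
    response = ollama.chat(
        model="qwen3",
        messages=[{"role": "user", "content": "What is 1234 + 4321? Use the tool."}],
        tools=[add_numbers],  # Ollama builds the tool schema from the function
    )
    # If the model requested a tool call, execute it locally.
    for call in (response.message.tool_calls or []):
        if call.function.name == "add_numbers":
            print(add_numbers(**call.function.arguments))
except Exception as exc:
    print(f"Skipping live call: {exc}")
```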

!ollama show qwen3 
  Model
    architecture        qwen3     
    parameters          8.2B      
    context length      40960     
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    repeat_penalty    1                 
    stop              "<|im_start|>"    
    stop              "<|im_end|>"      
    temperature       0.6               
    top_k             20                
    top_p             0.95              

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          

Gemma3 (28.8M Downloads)

Gemma3 is a versatile, efficiency-oriented model family that is a good and creative writer in my experience. It supports text and vision inputs and performs reliably across many languages. The model strikes a balance between capability and hardware efficiency, with smaller variants that run comfortably on limited resources. While Gemma3 is competitive on many benchmarks, it typically falls slightly behind the very top reasoning or coding-focused models at comparable scales.

Its main appeal lies in its multimodal flexibility and strong all-around performance. I would recommend using this for classroom exercises that involve computer vision.
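A vision request to Gemma3 can be sketched like this: Ollama’s chat API accepts an `images` list (file paths or raw bytes) on a user message. The `vision_message` helper and the `photo.jpg` path are placeholders for your own code and image, and the live call is skipped if Ollama isn’t available.

```python
def vision_message(prompt, image_path):
    # Attach an image to a user message via Ollama's `images` field.
    return [{"role": "user", "content": prompt, "images": [image_path]}]

try:
    import ollama
    response = ollama.chat(
        model="gemma3",
        messages=vision_message("Describe this image in one sentence.", "photo.jpg"),
    )
    print(response["message"]["content"])
except Exception as exc:  # no server, model, or image available
    print(f"Skipping live call: {exc}")
```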

!ollama show gemma3
  Model
    architecture        gemma3    
    parameters          4.3B      
    context length      131072    
    embedding length    2560      
    quantization        Q4_K_M    

  Capabilities
    completion    
    vision        

  Parameters
    stop           "<end_of_turn>"    
    temperature    1                  
    top_k          64                 
    top_p          0.95               

  License
    Gemma Terms of Use                  
    Last modified: February 21, 2024    
    ...                                 

Llama 3.2 (50.8M Downloads)

Llama 3.2 focuses on efficiency and deserves special mention as the smallest of our generalist models, at 2 GB and 3B parameters. The model is well tuned for conversational tasks, summarization, and general text understanding, delivering solid performance relative to its size. However, it does not match larger models on deep reasoning or advanced coding tasks. Llama 3.2 is best seen as a very practical and deployment-friendly model for prototyping non-coding projects.

!ollama show llama3.2
  Model
    architecture        llama     
    parameters          3.2B      
    context length      131072    
    embedding length    3072      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"             

  License
    LLAMA 3.2 COMMUNITY LICENSE AGREEMENT                 
    Llama 3.2 Version Release Date: September 25, 2024    
    ...                                                   

Llama 3.1 (108M Downloads)

Llama 3.1 represents Meta’s larger-scale generalist offering and is defined primarily by its breadth and scalability. Available in sizes ranging from a modest 5 GB to an extremely large 243 GB, it delivers strong performance across reasoning, coding, summarization, and dialogue, especially in its largest variants. The model supports long contexts and advanced features such as tool and function calling, making it suitable for complex agent-style applications. Its main drawback is resource intensity: the most capable versions require substantial compute and memory, limiting accessibility.

Compared to Llama 3.2, it is less optimized for deployment but significantly stronger as a high-end, general-purpose language model.

!ollama show llama3.1
  Model
    architecture        llama     
    parameters          8.0B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"             

  License
    LLAMA 3.1 COMMUNITY LICENSE AGREEMENT            
    Llama 3.1 Version Release Date: July 23, 2024    
    ...                                              

nomic-embed-text (48.5M Downloads)

nomic-embed-text is a text embedding model designed specifically for high-quality semantic representations rather than generative language tasks. Its primary strength lies in producing dense vector embeddings that capture meaning, similarity, and topical structure across documents, sentences, and short passages, making it well suited for classroom labs and projects dealing with retrieval-augmented generation (RAG), semantic search, clustering, and recommendation systems.

The model emphasizes strong performance on retrieval and similarity benchmarks while remaining efficient enough for large-scale indexing. Because it is not a generative model, it does not produce natural language responses and is instead intended to be used as an infrastructure component that feeds downstream systems such as LLMs or search pipelines.

Its appeal is in reliability, consistency, and alignment with modern vector-database workflows rather than conversational or reasoning capabilities.
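A typical embedding workflow can be sketched as follows: `ollama.embed` produces one vector per input string, and cosine similarity compares them. The `cosine_similarity` helper is plain Python (in practice you would likely use a vector database or NumPy); the live call is skipped if Ollama isn’t available.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

try:
    import ollama
    result = ollama.embed(
        model="nomic-embed-text",
        input=["a cat sat on the mat", "a feline rested on a rug"],
    )
    a, b = result["embeddings"]
    print(f"similarity: {cosine_similarity(a, b):.3f}")
except Exception as exc:
    print(f"Skipping live call: {exc}")
```

Semantically close sentences should score noticeably higher than unrelated ones, which is exactly the property RAG and semantic search pipelines rely on.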

!ollama show nomic-embed-text
  Model
    architecture        nomic-bert    
    parameters          137M          
    context length      2048          
    embedding length    768           
    quantization        F16           

  Capabilities
    embedding    

  Parameters
    num_ctx    8192    

  License
    Apache License               
    Version 2.0, January 2004    
    ...