
This book offers a hands-on introduction to software engineering (and other related topics!) with generative AI, combining formal foundations with easily executable code, practical design patterns and best practices.

Large Language Models (LLMs) and generative AI have fundamentally changed what it means to build software. We’ve moved from telling computers exactly what to do, step by step, to showing them examples and watching them learn patterns we never explicitly programmed. It’s a bit like the difference between following a recipe and learning to cook by watching someone who hopefully knows what they’re doing.

What’s also very exciting is that you no longer need a “supercomputer” to get the best benefits from this. The combination of open-source tools like Ollama, surprisingly capable small models, and the everyday laptop sitting on your desk has democratized generative AI in a way that would have seemed impossible just a few years ago. GenAI isn’t a luxury reserved for big tech companies with massive GPU clusters—it’s a programmable resource available to every developer, student, professor and curious tinkerer who wants to experiment.

This book is a hands-on, example-driven journey through practical AI-augmented software engineering. We’ll explore the core concepts you actually need: prompting or “prompt engineering” (the art of talking to models effectively), retrieval-augmented generation (teaching models to reference external knowledge), agent architectures (systems that can plan and execute multi-step tasks), and code generation/review that actually works. But here’s the key—every concept comes with executable Python examples that you can run, modify, and learn from. No hand-waving, and nothing “left as an exercise for the reader.”

The book discusses the mechanics of how code-specific models work, why RAG systems help ground AI in reality rather than hallucinations, and how to design agents that can tackle complex programming tasks without getting lost. You’ll understand not just what these systems do, but how they do it and why certain approaches work better than others.

This material is written for teachers who want to bring AI into their curriculum, students building mental models of these strange new tools, and software engineers who want to integrate GenAI into their workflow. The goal is to make AI concrete and comprehensible, so you can use it productively and know when to trust it (and when not to).

Notes on Book Software

To aid reproducibility, I show the Python version — and versions of the major Python packages — used to build this executable book.

from platform import python_version
print(python_version())
3.13.7

This book makes significant use of Ollama and its Python API. Ollama is an open-source tool that simplifies managing and running LLMs locally, such as on a PC or a local server. Ollama executes models entirely on the user’s machine, providing full control over model selection, configuration, and execution. This local-first design makes Ollama especially valuable in educational settings, where transparency, reproducibility, and hands-on experimentation are essential.

When using Ollama, students can work with modern open-weight models—such as general-purpose language models, code-focused models, and embedding models—without relying on external APIs or internet connectivity. This enables cost-free experimentation while preserving privacy and allowing direct observation of system behavior, including token usage, context limits, and output variability. Ollama’s simple command-line interface and API support interactive exploration and programmatic integration into software systems.
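To make this concrete, here is a minimal sketch of talking to a local model through the Ollama Python API. It assumes the `ollama` package is installed and an Ollama server is running with `llama3.2` pulled; if not, the live call is skipped gracefully. The `build_messages` helper is just an illustrative convenience, not part of the API.

```python
def build_messages(system_prompt, user_prompt):
    # Ollama's chat endpoint uses the familiar role/content message format.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages(
    "You are a concise teaching assistant.",
    "In one sentence, what is retrieval-augmented generation?",
)

try:
    import ollama
    response = ollama.chat(model="llama3.2", messages=messages)
    print(response["message"]["content"])
except Exception as exc:  # server not running, model not pulled, etc.
    print(f"Skipping live call: {exc}")
```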

In the college-level courses where this book is used, Ollama serves as a laboratory environment for studying generative AI as an engineered system rather than a black box. Working through guided labs, students explore core concepts such as prompt structure, sampling parameters, model tradeoffs, and retrieval-augmented generation. When they run models locally and embed them into their own applications, students begin to develop practical intuition about how large language models work, where they fail and what their limitations are, and how they can be responsibly and effectively used in real production software systems.
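Sampling parameters like those mentioned above can be varied per request through Ollama’s `options` dictionary. The sketch below uses standard Ollama option names (`temperature`, `top_p`, `top_k`, `num_ctx`); the `sampling_options` helper and the model choice are illustrative, and the live call is skipped if no Ollama server is available.

```python
def sampling_options(temperature=0.6, top_p=0.95, top_k=20, num_ctx=4096):
    """Collect generation options into the dict Ollama expects."""
    return {"temperature": temperature, "top_p": top_p,
            "top_k": top_k, "num_ctx": num_ctx}

greedy = sampling_options(temperature=0.0)    # near-deterministic output
creative = sampling_options(temperature=1.2)  # more varied output

try:
    import ollama
    for opts in (greedy, creative):
        r = ollama.generate(model="llama3.2", prompt="Name a color.", options=opts)
        print(opts["temperature"], "->", r["response"].strip())
except Exception as exc:
    print(f"Skipping live call: {exc}")
```

Running the same prompt under both settings is a quick way to observe output variability first-hand.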

!uv pip show ollama
Name: ollama
Version: 0.6.0
Location: /Users/chikeabuah/Desktop/programming-genai/.venv/lib/python3.13/site-packages
Requires: httpx, pydantic
Required-by:

The structure and executable nature of this book is enabled by the Jupyter ecosystem.

!jupyter --version
Selected Jupyter core packages...
IPython          : 9.6.0
ipykernel        : 7.1.0
ipywidgets       : 8.1.7
jupyter_client   : 8.6.3
jupyter_core     : 5.9.1
jupyter_server   : 2.17.0
jupyterlab       : 4.4.10
nbclient         : 0.10.2
nbconvert        : 7.16.6
nbformat         : 5.10.4
notebook         : 7.4.7
qtconsole        : not installed
traitlets        : 5.14.3

This book is typeset using Jupyter Book 2.0, which was in alpha release when I started writing this book.

Jupyter Book is a great resource for sharing knowledge, and I am a big fan.

!jupyter book --version
v2.0.0-b3
!jupyter kernel --version
8.6.3

I can personally recommend the Microsoft Visual Studio Code editor for interacting with Jupyter Notebooks. It is free to use, and was used in the writing of this book (and its predecessor).

!code --version 
1.105.1
7d842fb85a0275a4a8e4d7e040d2625abbf7f084
arm64

The Co-Instructors: Our Models

Not to get too anthropomorphic right off the bat, but to create an ideal educational experience we will need the perfect teaching assistants.

Throughout this book we will make use of some of the most popular open-source GenAI models in the world at the time of initial writing (early 2026). No one model suits all purposes, and we will utilize six or seven different models, each specialized for a particular purpose such as coding, vision, creative writing, or creating embeddings.

!ollama list
NAME                       ID              SIZE      MODIFIED    
nomic-embed-text:latest    0a109f422b47    274 MB    4 weeks ago    
llama3.1:latest            46e0c10c039e    4.9 GB    6 weeks ago    
llama3.2:latest            a80c4f17acd5    2.0 GB    6 weeks ago    
qwen3:latest               500a1f067a9f    5.2 GB    6 weeks ago    
qwen2.5-coder:latest       dae161e27b0e    4.7 GB    6 weeks ago    
deepseek-r1:latest         6995872bfe4c    5.2 GB    6 weeks ago    

You can see above that I tend to utilize the default or latest model sizes, which is usually around 7B parameters. I used the same model sizes writing this book on my MacBook M4 with 32 GB of memory (when I say GB in this book I mean gigabyte by default) as my students used doing the labs on a shared Tesla P4 8 GB GPU on the WWU CS LAN.

I will briefly discuss each model in the following sections. I also show the download count at the time to emphasize each model’s popularity in the open-source community.

DeepSeek-R1 (75M Downloads)

DeepSeek-R1 is best understood as a reasoning-first model that prioritizes structured, multi-step thinking over speed or stylistic fluency. It performs especially well on mathematics, logic, and problems that benefit from explicit intermediate reasoning, often matching or exceeding larger generalist models on reasoning benchmarks.

This comes at the cost of verbosity and latency: the model tends to produce longer responses and can feel slow or inefficient for simple queries. Its strengths make it well suited for research, theorem-style problem solving, and analytical tasks, but it is less compelling for casual dialogue, creative writing, or lightweight instruction following where faster, more concise models feel smoother.

In my experience this model probably has fewer guardrails than the others mentioned here, and will speak its “mind” on most topics. Definitely a great candidate for complex strategy, analysis, and ethics experiments.
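Newer versions of the Ollama Python API expose a `think` flag that separates a reasoning model’s intermediate trace from its final answer; the sketch below uses it with DeepSeek-R1, treating the exact flag and field names as version-dependent. The `is_prime` helper is my own addition, a ground-truth check you could compare the model’s answer against; the live call is skipped if Ollama isn’t available.

```python
def is_prime(n):
    # Simple trial division: a ground truth to check the model's claim.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

try:
    import ollama
    response = ollama.chat(
        model="deepseek-r1",
        messages=[{"role": "user", "content": "Is 1009 prime? Answer briefly."}],
        think=True,  # ask for the reasoning trace separately from the answer
    )
    print("Reasoning:", response.message.thinking)
    print("Answer:", response.message.content)
    print("Ground truth:", is_prime(1009))
except Exception as exc:
    print(f"Skipping live call: {exc}")
```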

!ollama show deepseek-r1
  Model
    architecture        qwen3     
    parameters          8.2B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    thinking      

  Parameters
    stop           "<|begin▁of▁sentence|>"    
    stop           "<|end▁of▁sentence|>"      
    stop           "<|User|>"                 
    stop           "<|Assistant|>"            
    temperature    0.6                          
    top_p          0.95                         

  License
    MIT License                    
    Copyright (c) 2023 DeepSeek    
    ...                            

Qwen2.5-Coder (9.4M Downloads)

Qwen2.5-Coder is a highly specialized model optimized almost entirely for programming tasks. It excels at code generation, refactoring, and debugging (code repair), often producing correct solutions on the first attempt and ranking near the top of open-source coding benchmarks. Its outputs tend to be structured, syntactically clean, and aligned with common software engineering patterns.

However, this specialization also limits its versatility: outside of programming contexts, its reasoning, conversational quality, and creative abilities are noticeably weaker than strong generalist models. Qwen2.5-Coder is most effective when used as a dedicated coding assistant rather than a general AI system. This model is a good candidate to have as the default for a software engineering class. It is much more efficient in size, at 5 GB for the 7B (7-billion-parameter) version, than Qwen3-Coder, which requires at least 19 GB.
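Using Qwen2.5-Coder as a dedicated coding assistant can look as simple as the sketch below. The `code_prompt` template is purely illustrative (the model imposes no required prompt format), and the live call is skipped if no Ollama server is running.

```python
def code_prompt(task, language="Python"):
    # A hypothetical prompt template for code-generation requests.
    return (f"Write a {language} function for the following task. "
            f"Return only code, no explanation.\n\nTask: {task}")

prompt = code_prompt("reverse a singly linked list")

try:
    import ollama
    r = ollama.generate(model="qwen2.5-coder", prompt=prompt)
    print(r["response"])
except Exception as exc:
    print(f"Skipping live call: {exc}")
```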

!ollama show qwen2.5-coder
  Model
    architecture        qwen2     
    parameters          7.6B      
    context length      32768     
    embedding length    3584      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         
    insert        

  System
    You are Qwen, created by Alibaba Cloud. You are a helpful assistant.    

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          

Qwen3 (15.8M Downloads)

Qwen3 is a strong general-purpose open model designed to balance reasoning ability, coding skill, multilingual coverage, tool-calling, and efficiency. It performs well across a wide range of benchmarks and supports different operating modes that trade off speed for deeper reasoning, making it flexible for both interactive use and harder analytical tasks. Its multilingual capabilities are particularly strong, and larger variants benefit from mixture-of-experts designs that improve efficiency without sacrificing quality. While generally competitive with the best open models, Qwen3 can occasionally struggle with strict instruction adherence or simple factual queries compared to more conservatively tuned models. Overall, it stands out as one of the most well-rounded open models available.
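Qwen3’s tool-calling capability can be sketched as follows. Recent versions of the Ollama Python API accept plain Python functions as tools, deriving the tool schema from their signatures and docstrings; whether the model actually invokes the tool is up to the model. The `add_numbers` function is my own toy example, and the live call is skipped if Ollama isn’t available.

```python
def add_numbers(a: int, b: int) -> int:
    """Add two integers and return the sum."""
    return a + b

try:
    import ollama
    response = ollama.chat(
        model="qwen3",
        messages=[{"role": "user", "content": "What is 1234 + 4321? Use the tool."}],
        tools=[add_numbers],  # Ollama builds the tool schema from the function
    )
    # If the model requested a tool call, execute it locally.
    for call in (response.message.tool_calls or []):
        if call.function.name == "add_numbers":
            print(add_numbers(**call.function.arguments))
except Exception as exc:
    print(f"Skipping live call: {exc}")
```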

!ollama show qwen3 
  Model
    architecture        qwen3     
    parameters          8.2B      
    context length      40960     
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    repeat_penalty    1                 
    stop              "<|im_start|>"    
    stop              "<|im_end|>"      
    temperature       0.6               
    top_k             20                
    top_p             0.95              

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          

Gemma3 (28.8M Downloads)

Gemma3 is a versatile, efficiency-oriented model family that is a good and creative writer in my experience. It supports text and vision inputs and performs reliably across many languages. The model strikes a balance between capability and hardware efficiency, with smaller variants that run comfortably on limited resources. While Gemma3 is competitive on many benchmarks, it typically falls slightly behind the very top reasoning or coding-focused models at comparable scales.

Its main appeal lies in its multimodal flexibility and strong all-around performance. I would recommend using this for classroom exercises that involve computer vision.
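A vision request to Gemma3 can be sketched like this: Ollama’s chat API accepts an `images` list (file paths or raw bytes) on a user message. The `vision_message` helper and the `photo.jpg` path are placeholders for your own code and image, and the live call is skipped if Ollama isn’t available.

```python
def vision_message(prompt, image_path):
    # Attach an image to a user message via Ollama's `images` field.
    return [{"role": "user", "content": prompt, "images": [image_path]}]

try:
    import ollama
    response = ollama.chat(
        model="gemma3",
        messages=vision_message("Describe this image in one sentence.", "photo.jpg"),
    )
    print(response["message"]["content"])
except Exception as exc:  # no server, model, or image available
    print(f"Skipping live call: {exc}")
```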

!ollama show gemma3
  Model
    architecture        gemma3    
    parameters          4.3B      
    context length      131072    
    embedding length    2560      
    quantization        Q4_K_M    

  Capabilities
    completion    
    vision        

  Parameters
    stop           "<end_of_turn>"    
    temperature    1                  
    top_k          64                 
    top_p          0.95               

  License
    Gemma Terms of Use                  
    Last modified: February 21, 2024    
    ...                                 

Llama 3.2 (50.8M Downloads)

Llama 3.2 focuses on efficiency and deserves special mention as the smallest of our generalist models, at 2 GB and 3B parameters. The model is well tuned for conversational tasks, summarization, and general text understanding, delivering solid performance relative to its size. However, it does not match larger models on deep reasoning or advanced coding tasks. Llama 3.2 is best seen as a very practical and deployment-friendly model for prototyping non-coding projects.

!ollama show llama3.2
  Model
    architecture        llama     
    parameters          3.2B      
    context length      131072    
    embedding length    3072      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"             

  License
    LLAMA 3.2 COMMUNITY LICENSE AGREEMENT                 
    Llama 3.2 Version Release Date: September 25, 2024    
    ...                                                   

Llama 3.1 (108M Downloads)

Llama 3.1 represents Meta’s larger-scale generalist offering and is defined primarily by its breadth and scalability. Available in sizes ranging from a modest 5 GB to an extremely large 243 GB, it delivers strong performance across reasoning, coding, summarization, and dialogue, especially in its largest variants. The model supports long contexts and advanced features such as tool and function calling, making it suitable for complex agent-style applications. Its main drawback is resource intensity: the most capable versions require substantial compute and memory, limiting accessibility.

Compared to Llama 3.2, it is less optimized for deployment but significantly stronger as a high-end, general-purpose language model.

!ollama show llama3.1
  Model
    architecture        llama     
    parameters          8.0B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"             

  License
    LLAMA 3.1 COMMUNITY LICENSE AGREEMENT            
    Llama 3.1 Version Release Date: July 23, 2024    
    ...                                              

nomic-embed-text (48.5M Downloads)

nomic-embed-text is a text embedding model designed specifically for high-quality semantic representations rather than generative language tasks. Its primary strength lies in producing dense vector embeddings that capture meaning, similarity, and topical structure across documents, sentences, and short passages, making it well suited for classroom labs and projects dealing with retrieval-augmented generation (RAG), semantic search, clustering, and recommendation systems.

The model emphasizes strong performance on retrieval and similarity benchmarks while remaining efficient enough for large-scale indexing. Because it is not a generative model, it does not produce natural language responses and is instead intended to be used as an infrastructure component that feeds downstream systems such as LLMs or search pipelines.

Its appeal is in reliability, consistency, and alignment with modern vector-database workflows rather than conversational or reasoning capabilities.
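A typical embedding workflow can be sketched as follows: `ollama.embed` produces one vector per input string, and cosine similarity compares them. The `cosine_similarity` helper is plain Python (in practice you would likely use a vector database or NumPy); the live call is skipped if Ollama isn’t available.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 = same direction.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

try:
    import ollama
    result = ollama.embed(
        model="nomic-embed-text",
        input=["a cat sat on the mat", "a feline rested on a rug"],
    )
    a, b = result["embeddings"]
    print(f"similarity: {cosine_similarity(a, b):.3f}")
except Exception as exc:
    print(f"Skipping live call: {exc}")
```

Semantically close sentences should score noticeably higher than unrelated ones, which is exactly the property RAG and semantic search pipelines rely on.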

!ollama show nomic-embed-text
  Model
    architecture        nomic-bert    
    parameters          137M          
    context length      2048          
    embedding length    768           
    quantization        F16           

  Capabilities
    embedding    

  Parameters
    num_ctx    8192    

  License
    Apache License               
    Version 2.0, January 2004    
    ...