problem
A lot of useful AI work touches sensitive files, and a lot of it needs the model to actually do things, not just chat.
approach
Keep private data on the machine when it matters, give the model real tools when it needs them, and measure whether the output is any good instead of guessing.
On-device RAG
The interesting constraint here was privacy: the documents being searched were exactly the ones that shouldn't leave the machine. So the whole pipeline (embeddings, retrieval, generation) runs locally on Ollama. No third-party API sees the contents, and the answers still cite the right source passages.
Agents that do things
Chat is the easy part. The work was wiring models to real tools (search and a few custom functions) and making the calls reliable: managing keys and request budgets, tuning temperature for the task, and forcing structured JSON so the output is something the next step can actually consume instead of parse-and-pray.
Measuring it
Models are easy to fool yourself about. I built an evaluation harness that scores outputs against curated reference sets and pulls a human in to label the close calls, so "it got better" is something I can show rather than feel.
Fine-tuning
I also fine-tuned a computer-vision model in PyTorch for face detection. Curating the train and test sets carefully was most of the work, and it's what moved accuracy version over version.