LLMs

Published on
August 28, 2025
RAG is (Not) Dead: How to Think about Building RAG Systems
llms ai prompting agents context-engineering prompt-engineering rag
RAG isn't about vector databases and embeddings, or any specific architecture. It's about retrieving relevant context well.
Published on
July 10, 2025
You're Doing it Wrong: Prompt- and Context-Engineer with XML, not JSON
llms ai prompting agents context-engineering prompt-engineering
Exploring the syntactic and semantic differences between XML and JSON, and why XML provides a more robust structure for complex LLM prompts.
Published on
October 30, 2024
Implementing OpenAI-Compatible Tool Calling & Tool Streaming for Open-source models in vLLM
AI vLLM open-source agents LLMs inference
This is a transcription of a talk I gave at vLLM's office hours after landing vLLM's first-of-its-kind tool-calling implementation, which allows using OpenAI-compatible tools and tool streaming with open-source models.
