Published on August 28, 2025
RAG is (Not) Dead: How to Think about Building RAG Systems
llms, ai, prompting, agents, context-engineering, prompt-engineering, rag
RAG isn't about vector databases and embeddings, or any specific architecture. It's about retrieving relevant context well.

Published on July 10, 2025
You're Doing it Wrong: Prompt- and Context-Engineer with XML, not JSON
llms, ai, prompting, agents, context-engineering, prompt-engineering
Exploring the syntactic and semantic differences between XML and JSON and why the former provides a more robust structure for complex LLM prompts.

Published on October 30, 2024
Implementing OpenAI-Compatible Tool Calling & Tool Streaming for Open-source models in vLLM
AI, vLLM, open-source, agents, LLMs, inference
This is a transcription of a talk I gave at vLLM's office hours after landing vLLM's first-of-its-kind tool calling implementation, which allows using OpenAI-compatible tools and tool streaming with open-source models.