Bijay's Blog

My personal space to ramble about things.

Understanding LLMs and Modern Inference Engines

Choosing an LLM inference engine is a hardware-and-systems decision, not a meme. When you self-host for real, the runtime, throughput, concurrency, and cost matter as much as the model.

A Medical Model with Synthetic Data

Optimized AI models for medical coding: how BERT and synthetic data are revolutionizing everyday clinical practice.

State of Naïve RAG vs Agentic RAG in 2026

RAG is not dead. In 2026, agentic RAG often beats naïve RAG for accuracy and complex retrieval, but naïve RAG still wins for simple, fast, low-cost use cases.