Bijay's Blog

My personal space to ramble about things.


Understanding LLMs and Modern Inference Engines

Choosing an LLM inference engine is a hardware-and-systems decision, not a meme. For real self-hosting, the serving runtime, throughput, concurrency, and cost matter as much as the model itself.
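To make the link between throughput and cost concrete, here is a minimal sketch that converts a GPU's hourly price and its aggregate token throughput into a cost per million generated tokens. The helper name `cost_per_million_tokens` and all numbers ($2/hour, 100 vs. 1000 tokens/s) are hypothetical and chosen only for illustration. They are not benchmarks of any engine.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Hypothetical helper: USD per 1M generated tokens for one GPU
    running at a given aggregate throughput (tokens/s across all requests)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Illustrative numbers only (assumed, not measured): the same $2/h GPU,
# serving one request at a time vs. many concurrent requests batched together.
single_stream = cost_per_million_tokens(2.0, 100.0)
batched = cost_per_million_tokens(2.0, 1000.0)
print(f"single stream: ${single_stream:.2f} per 1M tokens")
print(f"batched:       ${batched:.2f} per 1M tokens")
```

The hardware bill is identical in both cases. Only the throughput the engine extracts from it changes, and that alone moves the cost per token by the same factor.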