Bijay's Blog

My personal space to ramble about things.

Understanding LLMs and Modern Inference Engines

Choosing an LLM inference engine is a hardware-and-systems decision, not a meme. When you self-host in earnest, the runtime, throughput, concurrency, and cost matter as much as the model.

llms inference inference-engines vllm llama-cpp tensorrt-llm sglang self-hosting open-source-models gpu nvidia ai-infrastructure

May 19, 2026
