The engineering layer of AI.

GPU architecture, inference systems, and ML algorithms — decoded from silicon to system to algorithm. Written for engineers who already know what RAG is.

2606.003 / ▲ Silicon / 2026.06.22

H100 vs H200 vs B200: TCO for Inference Infrastructure

Beyond the spec sheet: deriving actual cost per million tokens for each generation, accounting for memory capacity, bandwidth, rack power, and cooling — the numbers that determine your infrastructure decision.

20 min read

2606.001 / ▲ Silicon / 2026.06.18

GPU Memory Hierarchy and Kernel Performance

Why memory bandwidth — not FLOPs — is the binding constraint for most LLM workloads, and how H100's five-level hierarchy determines what your kernels can actually achieve.

22 min read