▲ SILICON
GPU architecture, memory hierarchies, hardware accelerators, and the physics of ML compute.
2606.003 Silicon
H100 vs H200 vs B200: TCO for Inference Infrastructure
Beyond the spec sheet: deriving actual cost per million tokens for each generation, accounting for memory capacity, bandwidth, rack power, and cooling — the numbers that determine your infrastructure decision.
⚙⚙⚙⚙⚙ 2026.06.22
2606.001 Silicon
GPU Memory Hierarchy and Kernel Performance
Why memory bandwidth — not FLOPs — is the binding constraint for most LLM workloads, and how H100's five-level hierarchy determines what your kernels can actually achieve.
⚙⚙⚙⚙⚙ 2026.06.18