A publication for engineers shipping inference

The engineering layer of AI.

Deep technical writing on LLM, GPU, and ML systems internals. Decoded from silicon to system to algorithm — for the engineers who already know what RAG is.

Read 2606.003 → Editorial thesis →

/01 Three layers, one publication

Pick your depth

/ SILICON

The hardware that runs the model.

HBM hierarchy, NVLink topology, tensor core generations, CUDA streams, kernel-level reality.

NVLink · HBM3e · TMA · FP8 · SM90

02 Articles

/ SYSTEM

The infra that serves the model.

KV cache, paged attention, continuous batching, parallelism strategies, NCCL collectives.

vLLM · FSDP · NCCL · Triton · SGLang

01 Articles

/ ALGORITHM

The math that is the model.

Attention variants, MoE routing, quantization, alignment, long context, positional encoding.

FlashAttn · RoPE · MoE · RLHF · GQA

/02 Latest

H100 vs H200 vs B200: TCO for Inference Infrastructure

Beyond the spec sheet: deriving actual cost per million tokens for each generation, accounting for memory capacity, bandwidth, rack power, and cooling — the numbers that determine your infrastructure decision.

⚙⚙⚙⚙⚙ 2026.06.22

2606.002 System

System · 26 min

Intra-node vs Inter-node Interconnects in Distributed Training

NVLink, NVSwitch, InfiniBand, and RoCE — the bandwidth and latency numbers that determine whether your distributed training job scales or stalls.

⚙⚙⚙⚙⚙ 2026.06.20

2606.001 Silicon

Silicon · 22 min

GPU Memory Hierarchy and Kernel Performance

Why memory bandwidth — not FLOPs — is the binding constraint for most LLM workloads, and how H100's five-level hierarchy determines what your kernels can actually achieve.

⚙⚙⚙⚙⚙ 2026.06.18

/03 Editorial

We decode AI one layer at a time. Silicon tells you what the hardware can do. System tells you how inference actually runs. Algorithm tells you why the math works. All three, in depth, without the vendor gloss.

Written for the engineers building inference infrastructure — not the engineers explaining what inference is. Dense. Verifiable. No filler.

fp4 editorial desk
