I build auditable AI infrastructure: GPU orchestration, evidence capture, provenance, and autonomous research control planes.

I work at the intersection of enterprise server engineering, accelerated AI infrastructure, and local autonomous systems. My focus is making AI work observable, bounded, and honest about what it produced — queue safety, worker orchestration, GPU-aware dispatch, evidence capture, provenance, claim ledgers, and failure modes that surface clearly instead of hiding behind model output.

Oklahoma City, OK

Work

Enoch

Agentic research control plane

Enoch treats autonomous AI failure modes as infrastructure problems: stale queues, hidden worker state, orphaned processes, GPU contention, scattered evidence, and reports that overstate results. It manages queue state, gates dispatch, supervises local AI runs, preserves evidence, and packages AI-generated research artifacts with provenance metadata and claim ledgers.

The goal is not to make autonomous AI look smarter. The goal is to make its work inspectable: what ran, when it ran, what evidence was captured, what claims were made, and where uncertainty remains.
Control Plane API
Queue state, dispatch decisions, project state, pause and maintenance controls
Wake Gate
Confirms a run is complete via process-tree tracking and CPU/GPU quiet-window telemetry
Worker Preflight
Authenticated health checks before dispatch — fails early rather than silently
Single-Lane Safety
Prevents overlapping GPU work on constrained hardware; control plane holds the lock
Evidence Sync
Copies run notes, metrics, evidence bundles, and claim ledgers before artifact generation
Artifact Writer
Generates publication-style reports from evidence, preserving uncertainty and provenance
Quality Gates
Scans for placeholder citations, missing provenance, and missing evidence before corpus entry
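The wake-gate idea (declaring a run complete only after utilization telemetry stays quiet for a sustained window) can be sketched over a stream of samples. This is a minimal illustration; the threshold and window length are assumptions, not Enoch's actual values.

```python
def quiet_window_met(samples, threshold=5.0, window=6):
    """Given utilization-percent samples taken at a fixed interval,
    return True once `window` consecutive samples fall below
    `threshold` (a sustained quiet window). Any busy sample resets
    the streak, so intermittent activity keeps the gate closed."""
    streak = 0
    for util in samples:
        streak = streak + 1 if util < threshold else 0
        if streak >= window:
            return True
    return False
```

A real gate would combine this with process-tree tracking, since a quiet GPU alone cannot distinguish "finished" from "stalled before launch".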

My role is the infrastructure: the control plane, dispatch gates, telemetry, artifact packaging, provenance model, and release process. The Enoch corpus — 120 AI-generated research artifacts produced over approximately two weeks, with evidence bundles, claim ledgers, reproducibility metadata, and quality reports — is published for inspection and critique. I do not claim personal authorship of the generated papers.

CouncilRouter

Multi-model AI deliberation proxy

CouncilRouter explores whether multi-model critique can reduce blind spots in complex reasoning, code review, and architecture decisions. It routes requests to 300+ externally hosted models via OpenRouter with multi-round peer review, code-aware synthesis, and a Devil's Advocate module that challenges consensus with critical analysis.

It treats consensus as a signal, not proof.
Deliberation Engine
Multi-round peer review across models with configurable rounds and graceful degradation
Code-Aware Synthesis
Detects code, compares functional equivalence, validates syntax, security, and error handling
Devil's Advocate
Challenges consensus with critical analysis at configurable intensity
Production Layer
PostgreSQL + Redis, REST API, JWT/API key auth, rate limiting, idempotency, SSE streaming
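The multi-round loop with graceful degradation can be sketched as follows, assuming a hypothetical `ask(model, prompt, context)` call to a hosted model; CouncilRouter's actual routing, synthesis, and Devil's Advocate stages are more involved.

```python
def deliberate(prompt, models, ask, rounds=2):
    """Multi-round peer review: each round, every model answers with
    the previous round's answers as context. A model that errors is
    skipped for that round (graceful degradation) rather than
    aborting the whole deliberation."""
    answers = {}
    for _ in range(rounds):
        context = dict(answers)      # peers' answers from the prior round
        nxt = {}
        for m in models:
            try:
                nxt[m] = ask(m, prompt, context)
            except Exception:
                continue             # degrade gracefully: drop the failed model
        if not nxt:
            break                    # every model failed this round; keep last answers
        answers = nxt
    return answers
```

Treating each round's answers as context, rather than merging eagerly, keeps consensus a signal the caller can inspect rather than a forced verdict.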

Hardware & Kernel Work

Work on NVIDIA's Grace Blackwell consumer architecture (sm_121 / DGX Spark) — a platform with LPDDR5X memory, blockscaled MMA paths, and TMA async data movement that differs from H100/A100/H200/B200/B300. These projects were done in a private lab and are not publicly released.

FlashInfer SM121

Kernel debugging and patching for Blackwell

Debugged and patched FlashInfer for sm_121. When primary kernel paths produced Xid 13 and Xid 43 GPU faults, worked through the CuTe-DSL fallback path as a secondary route. Documented illegal instruction errors, misaligned addresses, and warp exceptions found in NVIDIA's kernel code running on sm_121. Built a systematic debugging campaign with variant matrices and regression runbooks.

CUTLASS Blockscaled TMA

Patches to NVIDIA's CUTLASS for sm_121 TMA operations

Modified NVIDIA's CUTLASS library for sm_121 blockscaled TMA operations — fixing how the Tensor Memory Accelerator loads data for the Blackwell blockscaled MMA path. Patches cover sm_100, sm_120, and sm_121 layout and builder headers.

Blackwell Inference Patches

vLLM, llama.cpp, and Mamba kernel work for GB10

Extended vLLM with SM120/SM121 compute capability mapping and MXFP4 backend detection. Added MXFP4/MoE tuning and BLACKWELL-OPT compilation flags to llama.cpp. Optimized Mamba SSM kernels: d_state reduction from 128→64 (27–34% faster on LPDDR5X), custom Triton kernels with explicit backward passes, and Nsight profiling harnesses with A/B testing across BF16/TF32/torch.compile modes.

Project Squeegee

FP4 hotpatches and diffusion LM research

Seven FlashInfer SM121 FP4 hotpatches — workarounds for broken vendor code causing Xid 13 errors on GB10. Diffusion LM research pipeline with curriculum generation, education quality scoring, and reasoning parsers. Tracked and documented GPU crashes from vendor kernel code.

Architecture Research

Opus

Hybrid Mamba-2 SSM + differential attention

Hybrid architecture combining Mamba-2 SSM with differential shared attention (Attn₁ - λ·Attn₂) and LoRA depth adapters. 5:1 SSM-to-attention ratio. Benchmarked on GB10 with throughput measurements. FP8 training recipe adapted from DeepSeek-V3. Configurable model sizes from 125M to 7B.
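The differential attention combination (Attn₁ - λ·Attn₂) can be sketched in NumPy. The shapes, the shared value matrix, and the fixed λ here are illustrative assumptions, not Opus's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    """Differential attention: compute two standard softmax attention
    maps and apply their difference (a1 - lam*a2) to the values.
    Subtracting the second map cancels common-mode attention noise
    that both maps assign to irrelevant positions."""
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.T / np.sqrt(d))
    a2 = softmax(q2 @ k2.T / np.sqrt(d))
    return (a1 - lam * a2) @ v
```

With λ = 0 this reduces to ordinary attention, which makes the differential term easy to A/B in isolation.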

Project Lattice

PaRT architecture and training infrastructure

PaRT (Patch-and-Refine Transformer): patch-based downsampling with cross-attention refinement, designed for memory-bandwidth-constrained hardware. Built LatticeDash, a real-time training dashboard with WebSocket streaming, convergence tracking, and curriculum learning. NVFP4 MLP precision experiments.

CoSpec

Spectral initialization from token co-occurrence

Research on whether variance-matched spectral initialization from token co-occurrence matrices improves early training dynamics for GPT-2-style models. Completed a controlled study across five conditions (baseline, e_only, h_only, e_plus_h, spectrum_random). Result: the e_only condition — initializing only the embedding matrix from co-occurrence spectra — beat baseline in both screening and confirmation runs. Rust backend for fast co-occurrence accumulation.
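The core idea, initializing embeddings from the co-occurrence spectrum and then variance-matching, can be sketched as follows; the log-count preprocessing and target scale are illustrative assumptions, not CoSpec's exact recipe.

```python
import numpy as np

def spectral_embedding_init(cooc, d, target_std=0.02):
    """Initialize a (V, d) embedding from the top-d spectrum of a
    token co-occurrence matrix, then rescale so the entry standard
    deviation matches `target_std` (variance matching against a
    GPT-2-style init scale)."""
    m = np.log1p(cooc)                    # damp raw counts (illustrative)
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    emb = u[:, :d] * np.sqrt(s[:d])       # spectrum-weighted directions
    return emb * (target_std / emb.std()) # variance-match the init scale
```

Variance matching is what makes the comparison fair: without it, a spectral init changes both the geometry and the overall scale, and early-training differences cannot be attributed to geometry alone.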

TurboQuant

Quantization research reproduction

Local reproduction and evaluation of Google's TurboQuant paper. Implemented MSE codec (random rotation + Lloyd-Max codebook), product codec (MSE + QJL residual), split codec (mixed-precision channel split), and grouped passkey scoring with KV-cache proxy evaluation.
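A minimal sketch of the MSE codec's two ingredients, a random orthogonal rotation and a Lloyd-Max scalar codebook, under illustrative assumptions (Gaussian calibration data, per-entry scalar quantization); the paper's full codec stacks residual and mixed-precision stages on top.

```python
import numpy as np

def random_rotation(d, seed=0):
    """Random orthogonal rotation (QR of a Gaussian matrix): spreads
    energy evenly across coordinates so one scalar codebook fits all."""
    g = np.random.default_rng(seed).normal(size=(d, d))
    q, r = np.linalg.qr(g)
    return q * np.sign(np.diag(r))        # sign fix for a uniform rotation

def lloyd_max(x, k=8, iters=25):
    """Lloyd-Max scalar codebook: alternate nearest-centroid assignment
    and centroid re-estimation to minimize MSE on 1-D samples x."""
    c = np.quantile(x, (np.arange(k) + 0.5) / k)  # quantile-spread start
    for _ in range(iters):
        a = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(a == j):
                c[j] = x[a == j].mean()
    return c

def encode_decode(v, rot, codebook):
    """MSE codec round trip: rotate, snap each entry to its nearest
    codeword, rotate back."""
    z = rot @ v
    zq = codebook[np.abs(z[:, None] - codebook[None, :]).argmin(axis=1)]
    return rot.T @ zq
```

Because the rotation is orthogonal, the reconstruction error equals the scalar quantization error, which is what makes the codebook trained on rotated calibration data transferable.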

Engineering Focus

AI Infrastructure Control Planes
Dispatch safety, GPU worker state, telemetry, evidence sync, artifact provenance
Enterprise AI Operations
Large-scale datacenter deployment, Day 2 operations, triage, failure analysis, process development
NVIDIA Accelerated Systems
Grace Blackwell (sm_121) kernel debugging, CUTLASS TMA patches, FlashInfer CuTe-DSL fixes, MXFP4/MoE inference, TensorRT-LLM benchmarking, constrained local GPU operations
Local-First Infrastructure
Proxmox, OPNsense, ZFS, monitoring, alerting, backup validation, recovery automation, security hardening

Operational Background

Contact

Personal site and independent projects. Views and work are my own.