I work at the intersection of enterprise server engineering, accelerated AI infrastructure, and local autonomous systems. My focus is making AI work observable, bounded, and honest about what it produced — queue safety, worker orchestration, GPU-aware dispatch, evidence capture, provenance, claim ledgers, and failure modes that surface clearly instead of hiding behind model output.
Oklahoma City, OK
Enoch treats autonomous AI failure modes as infrastructure problems: stale queues, hidden worker state, orphaned processes, GPU contention, scattered evidence, and reports that overstate results. It manages queue state, gates dispatch, supervises local AI runs, preserves evidence, and packages AI-generated research artifacts with provenance metadata and claim ledgers.
My role is the infrastructure: the control plane, dispatch gates, telemetry, artifact packaging, provenance model, and release process. The Enoch corpus — 120 AI-generated research artifacts produced over approximately two weeks, with evidence bundles, claim ledgers, reproducibility metadata, and quality reports — is published for inspection and critique. I do not claim personal authorship of the generated papers.
CouncilRouter explores whether multi-model critique can reduce blind spots in complex reasoning, code review, and architecture decisions. It routes requests to 300+ externally hosted models via OpenRouter, with multi-round peer review, code-aware synthesis, and a Devil's Advocate module that challenges the emerging consensus with critical analysis.
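The control flow can be sketched as a simple loop: draft answers, several rounds of peer revision, an adversarial pass, then synthesis. This is a minimal illustration, not CouncilRouter's actual design; `ask(model, prompt) -> str` is a stand-in for an OpenRouter call, and all prompt wording here is assumed.

```python
def council(ask, models, prompt, rounds=2):
    """Hypothetical multi-round peer review with a Devil's Advocate pass.

    `ask`, the prompt templates, and the choice of models[0] as the
    synthesizer are illustrative assumptions, not the real implementation.
    """
    # Round 0: each model answers independently.
    answers = {m: ask(m, prompt) for m in models}

    # Peer-review rounds: every model sees all answers and revises its own.
    for _ in range(rounds):
        peer_view = "\n".join(f"[{m}] {a}" for m, a in answers.items())
        answers = {
            m: ask(m, f"{prompt}\n\nPeer answers:\n{peer_view}\n"
                      "Revise your answer, noting any disagreements.")
            for m in models
        }

    # Devil's Advocate: one model is tasked with attacking the consensus.
    peer_view = "\n".join(f"[{m}] {a}" for m, a in answers.items())
    devil = ask(models[0], f"Argue against this consensus:\n{peer_view}")

    # Final synthesis must address the adversarial critique explicitly.
    return ask(models[0],
               f"Synthesize a final answer, addressing the critique.\n"
               f"Answers:\n{peer_view}\n\nCritique:\n{devil}")
```

Any `ask` callable works, which keeps the loop testable without network access.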
Work on NVIDIA's Grace Blackwell consumer architecture (sm_121 / DGX Spark) — a platform with LPDDR5X memory, blockscaled MMA paths, and TMA async data movement that differs from H100/A100/H200/B200/B300. These projects were done in a private lab and are not publicly released.
Debugged and patched FlashInfer for sm_121. When primary kernel paths produced Xid 13 and Xid 43 GPU faults, worked through the CuTe-DSL fallback path as a secondary route. Documented illegal instruction errors, misaligned addresses, and warp exceptions found in NVIDIA's kernel code running on sm_121, and built a systematic debugging campaign around them with variant matrices and regression runbooks.
Modified NVIDIA's CUTLASS library for sm_121 blockscaled TMA operations — fixing how the Tensor Memory Accelerator loads data for the Blackwell blockscaled MMA path. Patches cover sm_100, sm_120, and sm_121 layout and builder headers.
Extended vLLM with SM120/SM121 compute capability mapping and MXFP4 backend detection. Added MXFP4/MoE tuning and BLACKWELL-OPT compilation flags to llama.cpp. Optimized Mamba SSM kernels: d_state reduction from 128→64 (27–34% faster on LPDDR5X), custom Triton kernels with explicit backward passes, and Nsight profiling harnesses with A/B testing across BF16/TF32/torch.compile modes.
Seven FlashInfer SM121 FP4 hotpatches — workarounds for broken vendor code causing Xid 13 errors on GB10. Diffusion LM research pipeline with curriculum generation, education quality scoring, and reasoning parsers. Tracked and documented GPU crashes from vendor kernel code.
Hybrid architecture combining Mamba-2 SSM with differential shared attention (Attn₁ - λ·Attn₂) and LoRA depth adapters. 5:1 SSM-to-attention ratio. Benchmarked on GB10 with throughput measurements. FP8 training recipe adapted from DeepSeek-V3. Configurable model sizes from 125M to 7B.
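The differential term above (Attn₁ − λ·Attn₂) subtracts a second attention map from the first, so attention mass that both maps assign indiscriminately cancels out. A minimal single-head NumPy sketch, with a scalar `lam` standing in for what would be a learnable parameter in the full model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(q1, k1, q2, k2, v, lam=0.5):
    """Differential attention sketch: Attn1 - lam * Attn2.

    Two independent attention maps are computed from separate query/key
    projections; subtracting the second cancels common-mode attention
    noise. Shapes: q*, k*: (seq, d); v: (seq, d_v). Illustrative only.
    """
    d = q1.shape[-1]
    a1 = softmax(q1 @ k1.swapaxes(-2, -1) / np.sqrt(d))
    a2 = softmax(q2 @ k2.swapaxes(-2, -1) / np.sqrt(d))
    return (a1 - lam * a2) @ v
```

With `lam=0` this reduces to standard scaled dot-product attention, which makes the differential term easy to ablate.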
PaRT (Patch-and-Refine Transformer): patch-based downsampling with cross-attention refinement, designed for memory-bandwidth-constrained hardware. Built LatticeDash, a real-time training dashboard with WebSocket streaming, convergence tracking, and curriculum learning. NVFP4 MLP precision experiments.
Research on whether variance-matched spectral initialization from token co-occurrence matrices improves early training dynamics for GPT-2-style models. Completed study with controlled experiments across five conditions (baseline, e_only, h_only, e_plus_h, spectrum_random). Result: the e_only condition — initializing only the embedding matrix from co-occurrence spectra — beat baseline in both screening and confirmation runs. A Rust backend accelerates co-occurrence accumulation.
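The e_only condition can be sketched as: take the top singular directions of the co-occurrence matrix as embedding vectors, then rescale so the initialization's variance matches a standard GPT-2-style target. The log smoothing and the `target_std` value here are illustrative assumptions, not the study's exact recipe:

```python
import numpy as np

def spectral_embedding_init(cooc, d_model, target_std=0.02):
    """Variance-matched spectral init (sketch of the e_only condition).

    cooc: (vocab, vocab) token co-occurrence counts.
    Returns a (vocab, d_model) embedding matrix whose overall std is
    rescaled to target_std (GPT-2 uses 0.02; assumed here).
    """
    # Log-smooth counts so very frequent pairs don't dominate the spectrum.
    m = np.log1p(cooc.astype(np.float64))
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    emb = u[:, :d_model] * s[:d_model]   # spectral embedding, top-d_model modes
    emb *= target_std / emb.std()        # variance matching
    return emb
```

Variance matching is what makes the comparison fair: the spectral structure changes, but the initialization scale seen by the optimizer does not.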
Local reproduction and evaluation of Google's TurboQuant paper. Implemented MSE codec (random rotation + Lloyd-Max codebook), product codec (MSE + QJL residual), split codec (mixed-precision channel split), and grouped passkey scoring with KV-cache proxy evaluation.
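The MSE codec's two ingredients, a random rotation followed by a Lloyd-Max scalar codebook, can be sketched compactly. This is a toy reconstruction of the idea, not the paper's implementation; the QR-based rotation and the quantile-based codebook seeding are assumptions:

```python
import numpy as np

def lloyd_max(samples, n_levels=16, iters=50):
    """1-D Lloyd-Max: alternate nearest-codeword assignment and centroid
    update to locally minimize MSE. Seeded from sample quantiles."""
    codebook = np.quantile(samples, np.linspace(0, 1, n_levels))
    for _ in range(iters):
        idx = np.abs(samples[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                codebook[k] = samples[idx == k].mean()
    return codebook

def mse_codec(x, n_levels=16, seed=0):
    """Rotate with a random orthogonal matrix (QR of a Gaussian), quantize
    rotated coordinates with a shared Lloyd-Max codebook, rotate back."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(x.shape[-1], x.shape[-1])))
    y = x @ q                                   # rotation spreads energy across dims
    cb = lloyd_max(y.ravel(), n_levels)
    idx = np.abs(y[..., None] - cb).argmin(axis=-1)
    return cb[idx] @ q.T                        # dequantize, rotate back
```

The rotation makes per-coordinate distributions more uniform, which is why a single shared scalar codebook works well; the product and split codecs then spend bits on the residual or on sensitive channels.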
Personal site and independent projects. Views and work are my own.