Agentic systems & autonomous workflows
Agents that decompose intent, call tools, recover from failure, and operate within bounded autonomy. Built on contemporary frameworks where they help and from first principles where they do not.
TravisML is an applied AI research, education, and agent-development company. We track the field as it moves, reproduce what's real, and turn it into research, courses, and agentic products.
We conduct our own original research and track the field as it ships — reproducing what matters, evaluating what's real, and publishing what we find. The signal that survives becomes everything else we do.
We turn that knowledge into professional-grade courses on building with modern AI — distilled from work we actually ship, not theory.
We build agents, tools, and frameworks as reusable products — technology and IP, not just billable hours.
We work in both directions: we conduct original research — including published, citable work — and we track the field as it ships, reproducing what matters and evaluating what's real. What survives gets engineered into products, taught in courses, or both.
Agents whose capability set grows over time: skill registries, automatic skill induction from usage, and the guardrails that keep self-modification from becoming a liability.
Memory that is durable, queryable, and shareable across agents and environments: hierarchical stores, episodic and semantic layers, conflict resolution, and safe access patterns.
Retrieval pipelines that survive contact with real data: chunking strategy, embedding selection, hybrid search, reranking, and the evaluation harness that tells you when retrieval is silently failing.
Supervised fine-tuning, preference tuning, and adapter-based specialization, paired with evaluation infrastructure built to catch regression, drift, and quiet quality decay.
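As one concrete illustration of the hybrid search step mentioned above, here is a minimal sketch of reciprocal rank fusion, a common way to merge a keyword ranking with a vector ranking. It is an assumption that this is a reasonable stand-in, not a description of the pipeline TravisML actually uses. The function name, the document ids, and the constant k=60 are all illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still survives into the fused list, which is the property that makes fusion useful before a reranking stage.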
A live cross-section of what the bench is building, evaluating, and publishing right now. Updated continuously. Long-form coverage runs through the newsletter.
A forward-hook toolkit for causal interventions — zeroing or mean-patching attention heads, whole attention and MLP blocks, or entire layers — to measure what each part of an open-weight model actually contributes. Built for single-node runs on NVIDIA DGX Spark.
Specializing 4B-class open-weight models with LoRA, then shipping GGUF quantizations for fast, private inference through llama.cpp.
A single registry and canonical manifest for agentic systems, with adapters that normalize agents across frameworks and managed credential, MCP, and policy bindings. Private — in development.
An async-first framework for building LLM agents against any provider via LiteLLM — tool calling, pluggable memory, and an OpenAI-compatible server out of the box.
Standing up a realistic, observable agent target and running the major open-source AI-security tools against it — Garak, Promptfoo, llm-guard, and others — to map what each one actually catches.
Our own published research — “Token-level Generalization in LoRA Adapter Backdoors: Attack Characterization and Behavioral Detection” (arXiv:2605.30189) — characterizing trigger-based backdoors in LoRA adapters shared through public hubs and the behavioral methods that detect them without knowing the trigger.
The fastest way to understand a technique is to teach it.
So we do.
TravisML creates professional-grade courses on building with modern AI — agentic systems, retrieval and memory, fine-tuning and evaluation, and shipping models to production. Each one is distilled from work we actually do on the research bench, not from theory.
Courses are in development. They will be announced through the newsletter as they ship.
Our agent work is built as reusable technology — products, tools, and frameworks we own and improve over time — not one-off deliverables. The research bench feeds it; the engineering discipline keeps it shippable.
Agents that plan, call tools, and recover from failure within bounded autonomy — packaged as reusable runtimes rather than bespoke scripts.
Skill registries and induction pipelines that let an agent's capabilities grow safely over time, with guardrails on self-modification.
Durable, queryable memory shared across agents and environments — the layer that lets multi-agent systems stay coherent.
Harnesses and developer tooling that make agent behavior measurable and reproducible before it ever reaches production.
An intent-classification and risk-assessment layer between an agent's transport and policy evaluation — turning raw tool calls into authorization requests that policies can actually reason about.
A harness for developing, debugging, and evaluating custom agents, tools, MCP servers, prompts, and memory — across Anthropic, OpenAI, or a local model.
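To make the intent-classification and risk-assessment layer described above more concrete, here is a minimal sketch of turning a raw tool call into a structured authorization request that a policy can evaluate. Everything in it is hypothetical: the tool names, the intent labels, the risk scale, and the toy policy are assumptions for illustration, not the product's actual design.

```python
from dataclasses import dataclass

# Hypothetical mapping from tool name to (intent, risk); illustrative only.
INTENT_TABLE = {
    "read_file": ("data.read", 1),
    "send_email": ("message.send", 2),
    "run_shell": ("code.execute", 3),
}


@dataclass(frozen=True)
class AuthorizationRequest:
    agent_id: str
    intent: str
    risk: int  # assumed scale: 1 = low, 3 = high
    resource: str


def classify(agent_id, tool_name, arguments):
    """Turn a raw tool call into a request a policy can reason about."""
    # Unknown tools get the highest risk rather than being silently allowed.
    intent, risk = INTENT_TABLE.get(tool_name, ("unknown", 3))
    resource = str(arguments.get("target", ""))
    return AuthorizationRequest(agent_id, intent, risk, resource)


def evaluate(request, max_risk=2):
    """Toy policy: allow any request at or below max_risk, deny the rest."""
    return "allow" if request.risk <= max_risk else "deny"


request = classify("agent-1", "read_file", {"target": "/tmp/report.txt"})
print(request.intent, evaluate(request))
```

The design point is the separation of concerns: the policy never sees tool names or raw arguments, only an intent, a risk level, and a resource, so policies can be written once and applied across tools.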
Travis is an AI engineer focused on agentic systems, applied machine learning, and the infrastructure that makes them run in production. He was a founding engineer at Deepwatch and is a Principal Engineer at GuidePoint.
He builds production AI — agents, memory, retrieval, and evaluation — with the engineering discipline to run it without surprises. Everything he builds is secure by design.
His writing on AI engineering appears on Substack and HuggingFace. Outside the work, he is an active amateur radio operator and emergency communications builder.
Alongside our own research and products, we take on a small number of consulting engagements. Most start with a short, paid R&D sprint so both sides can decide if it's a fit. If we're not the right call, we'll say so early.
A short, focused engagement to read, reproduce, and evaluate a specific technique against your problem. Output is a written brief and a working prototype.
A scoped build with a clear deliverable: discovery, design, implementation, handoff. Code, infrastructure, and evaluation harnesses are yours to keep.
Ongoing advisory for engineering leadership shaping AI strategy, evaluating vendors, or making architectural calls with long-term consequences.
Tell us what you're working on. Most replies go out within a business day. If we're not the right fit, we will say so early and, where possible, point you somewhere better.