Agentic systems & autonomous workflows
Production agents that decompose intent, call tools, recover from failure, and operate within bounded autonomy. Built on contemporary frameworks where they help and from first principles where they do not.
TravisML is an applied AI research and engineering practice. We track the field as it moves, reproduce what's new, evaluate what's real, and turn the parts that hold up into production systems for organizations across the public and private sector.
The field moves weekly. Most teams cannot read it, reproduce it, and decide what to actually use fast enough to keep up.
That's the practice.
TravisML is structured as an applied research practice. We follow the literature as it ships, reproduce the techniques that matter, and run them against the kinds of problems our clients actually face. The signal that survives that process is what we engineer into production.
The work is industry-agnostic. What we bring is the research literacy to know what's promising versus what's noise, the engineering discipline to build it correctly, and the security instinct to know what will eventually break.
If your team needs to move from "a paper came out about this" to "we ship this in production," we are useful. If you need a vendor to wrap an off-the-shelf API, we are probably not the right call.
Designing systems that plan, act, and adapt. The core of what we build.
Production agents that decompose intent, call tools, recover from failure, and operate within bounded autonomy. Built on contemporary frameworks where they help and from first principles where they do not.
Agents whose capability set grows over time. Skill registries, automatic skill induction from usage patterns, and the operational guardrails that keep self-modification from becoming a liability.
Memory that is durable, queryable, and shareable across agents and environments. Hierarchical stores, episodic and semantic layers, conflict resolution, and the access control patterns that make shared memory safe to deploy.
The model layer. Retrieval, fine-tuning, and the work of making a system actually answer correctly.
Retrieval pipelines that survive contact with real data. Chunking strategy, embedding selection, hybrid search, reranking, and the evaluation harness that tells you when retrieval is silently failing.
Supervised fine-tuning, preference tuning, and adapter-based specialization. Paired with evaluation infrastructure built to catch regression, drift, and the quiet quality decay that production models accumulate.
A decade of detection engineering, woven into every system we ship. Not a service line; an instinct.
Threat-modeled architectures, prompt and tool-call boundary controls, output validation, and the operational telemetry that makes an AI system actually auditable. The work that lets a system run unattended without becoming a problem.
For security teams: applied deep learning over telemetry pipelines, behavioral baselining, protocol classification, and the SIEM-scale infrastructure to operationalize models without burying analysts in noise.
A live cross-section of the research, models, and techniques the practice is actively reading, reproducing, or evaluating against client problems. Updated continuously. Published in long-form through the Signal & Noise newsletter.
Investigating recent work on agents that derive new reusable skills from successful task executions, including the operational guardrails needed to deploy capability evolution safely.
Benchmarking emerging patterns for persistent, queryable memory shared across multiple agents and runtime environments, with particular attention to consistency and access control under concurrent writes.
Following recent results on reasoning models trained against learned and programmatic verifiers, and the implications for production systems where verifiability matters more than raw capability.
Hands-on reproduction of recent indirect prompt injection findings against tool-calling agents, with a working harness to evaluate boundary controls in candidate architectures before they ship.
Translating an in-house insider-threat detection architecture into a deployable form: LSTM-based next-activity prediction over high-cardinality event streams, with operational tuning for analyst signal-to-noise.
Continuous benchmarking of open-weight model releases against a fixed evaluation suite for clients who cannot rely on hosted frontier APIs. Reproducibility and auditability are first-class requirements.
A short, intensive engagement to map what you have, what you need, and what is being asked of the AI layer that it cannot reasonably do. Output is a written technical brief, not a deck. Most projects worth doing become clearer here. A few stop here, on purpose.
Iterative engineering on the system itself: agents, memory, retrieval, evaluation, and the security boundaries around them. Code is yours, infrastructure is yours, evaluation harnesses are yours. We document so your team can own what we build.
Optional ongoing engagement to keep the system instrumented, evaluated, and evolving as models and infrastructure change. Or a clean handoff with the documentation, runbooks, and eval harnesses to keep it healthy in your hands.
Travis is an AI engineer with a decade of detection engineering and SIEM architecture experience across federal, SLED, financial, healthcare, and enterprise environments. He was a founding engineer at Deepwatch and currently serves as Principal Security Engineer at GuidePoint, where he leads AI/ML security research.
He works at the intersection of applied machine learning and operational engineering discipline: building agentic systems and ML infrastructure that deliver real value, with the boundary controls and observability needed to run them without surprises.
His writing on AI engineering and security appears in the Signal & Noise newsletter and on HuggingFace. Outside the work, he is an active amateur radio operator and emergency communications builder.
Most engagements start with a short, paid R&D sprint so both sides can decide if it's a fit and what's actually feasible. We take on a small number of projects at a time. If we are not the right fit, we will say so early and where possible point you somewhere better.
A short, focused engagement to read, reproduce, and evaluate a specific technique or research direction against your problem. Output is a written brief and a working prototype.
A scoped build with a clear deliverable. Discovery, design, implementation, handoff. Code, infrastructure, and evaluation harnesses are yours to keep.
Embedded with your team for a defined window. Building shoulder-to-shoulder, leaving capability behind in your engineers as much as in your code.
Ongoing technical advisory for engineering leadership shaping AI strategy, evaluating vendors, or making architectural calls with long-term consequences.
Tell us what you're working on. Most replies go out within a business day. If we're not the right fit, we will say so early and where possible point you somewhere better.