Introducing Wylde.
Wylde is a local-first AI platform that lives on your desktop. It bundles inference, retrieval, training, voice, automation, and graph storage into a single coordinated stack — about twenty native services, one desktop app, and zero requirement to phone home. Today is the first public preview.
This post is the why behind it. If you'd rather just install it, the getting-started guide takes about five minutes.
The itch
Most AI tooling today asks you to choose: a chat box that talks to someone else's GPU, or a pile of half-integrated CLIs you wire together by hand. The first option is fast to start and slow to own — your data and prompts live in someone else's logs, and the moment you want a custom workflow you're stitching extensions or plugins together. The second is the opposite — total control, total assembly cost.
We wanted a third option: a real platform that runs on the machine in front of you, with the same shape as the cloud stacks (a gateway, a registry, services, workflows, observability) but with all of it loopback by default and inspectable end-to-end. Wylde is that.
What Wylde is
Under the hood, Wylde is a small distributed system on a single machine. Each service is a native Windows process exposing a named pipe at `\\.\pipe\wylde-<name>`. Services speak msgpack with a u32 length prefix; HTTP loopback is always available as a fallback. Hot-path latency is roughly 0.1 ms over the pipe versus 1.5 ms over HTTP loopback — fifteen times faster, which adds up across multi-step orchestrator workflows.
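The framing is simple enough to sketch. In this illustration JSON stands in for msgpack so the example needs no third-party packages, and little-endian byte order for the u32 prefix is an assumption — the real wire format may differ:

```python
import json
import struct

def encode_frame(payload: dict) -> bytes:
    """Frame a message the way the pipes do: u32 length prefix, then body.
    JSON stands in for msgpack here to keep the sketch dependency-free."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("<I", len(body)) + body

def decode_frame(data: bytes) -> dict:
    """Read the u32 prefix, then decode exactly that many body bytes."""
    (length,) = struct.unpack_from("<I", data, 0)
    body = data[4 : 4 + length]
    return json.loads(body.decode("utf-8"))

frame = encode_frame({"tool": "wylde-sysmon", "op": "health"})
assert decode_frame(frame) == {"tool": "wylde-sysmon", "op": "health"}
```

The same framing works over a named pipe handle or a loopback socket; only the transport changes.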
The platform is grouped into six layers:
- Core — gateway (`security-api`), registry (`tool-registry`), launcher (`wylde-launcher`), sandbox (`tool-runner`), system metrics (`wylde-sysmon`), device gate.
- Data — hybrid RAG with HyDE and graph-enhanced retrieval; voice (NPU-accelerated wake word + STT, Kokoro TTS); a dataset captioner with Florence-2 and Qwen2.5-VL backends.
- Orchestration — a graph engine with checkpointing, SSE streaming, gates, budgets, and a model registry; the `wylde-trainer` fine-tuning service; the `wylde-improve` self-improvement loop.
- Network — n8n adapter, web crawler, WireGuard-based remote access.
- Graph — a Neo4j-compatible store fronted by a named pipe; Memgraph drops in cleanly when you want it.
- External access — opt-in nginx + device-gate flow over Tailscale, with per-device approval.
The Fletch desktop app — Tauri 2 plus Svelte 5 — drives all of it. It opens on a system overview with health for every service, then gives you tabs for Tools, Workflows, Studio (training), Devices, Voice, and Graph. None of those tabs are special; they all talk to the same registry that everything else does, so anything you can do in the GUI you can also do from a script.
Defaults are the design
The fastest way to describe Wylde is to walk through what it doesn't do by default. Out of the box:
- Every service binds to `127.0.0.1`. Nothing is on the network.
- There are no accounts, no telemetry, no download tracking.
- Voice runs on the Intel NPU at ~5 W. Wake word, STT, TTS — all local.
- Discovery is mDNS — no Consul required for the standalone install.
- Inference goes to Ollama on `127.0.0.1:11434`.
- Tool calls go through the sandboxed runner; arbitrary code does not run in the gateway.
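The loopback-by-default stance can be expressed as a one-line invariant. This is a hypothetical guard, not Wylde's actual code — it just shows the check a service would make before binding:

```python
import ipaddress

def assert_loopback(bind_addr: str) -> None:
    """Refuse any bind address that is not loopback.
    Hypothetical helper illustrating the loopback-by-default stance."""
    host = bind_addr.rsplit(":", 1)[0]
    if not ipaddress.ip_address(host).is_loopback:
        raise ValueError(f"refusing non-loopback bind: {bind_addr}")

assert_loopback("127.0.0.1:11434")    # fine: loopback only
try:
    assert_loopback("0.0.0.0:11434")  # would expose the service
except ValueError:
    pass                              # rejected, as intended
```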
None of this is anti-cloud. It just means cloud is opt-in. If you want remote access from your phone, the Devices tab will install Tailscale and bring up a small nginx + device-gate stack — and then your phone has to be approved before it can talk to anything. If you want to call OpenAI's API from a workflow, drop a tool that does it.
Agent Orchestra
One of the things we shipped this month is Agent Orchestra, a 15-stage multi-agent coding workflow built on the orchestrator. The shape is straightforward — lessons lookup, spec preview, context gather, planner, plan gate, architect, test writer, test gate, coder, debugger, docs checker, adversarial critic, critic gate, experiential logger, summariser — with a couple of bounded delegation loops (coder ↔ test-writer, debugger ↔ coder with forced reflection).
The interesting bit is the experiential learning store. The final stage writes a JSON lesson into the same memories table the RAG service reads at the start of the next run. No new database; no new service. The system gets better at the work it does most often, automatically, on your own data.
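The write path is small enough to sketch with SQLite standing in for the store. Table name and columns here are assumptions for illustration, not Wylde's actual schema:

```python
import json
import sqlite3

# The final Orchestra stage writes a JSON lesson into the same
# `memories` table the RAG service reads at the start of the next run.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, kind TEXT, body TEXT)")

lesson = {
    "task": "refactor parser",
    "worked": "write tests first",
    "avoid": "regex-based lexing",
}
db.execute(
    "INSERT INTO memories (kind, body) VALUES (?, ?)",
    ("lesson", json.dumps(lesson)),
)

# Next run: retrieval pulls the lessons back out before planning.
rows = [json.loads(b) for (b,) in db.execute(
    "SELECT body FROM memories WHERE kind = 'lesson'")]
assert rows[0]["worked"] == "write tests first"
```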
Self-improvement
The other recent piece is wylde-improve, a service that watches every interaction the platform handles, scores each one, mines the patterns, and proposes changes — better prompt variants for individual stages, new edges and node merges in the knowledge graph, and (when enough samples accumulate) full LoRA training runs against wylde-trainer. Approved proposals are applied automatically by a background daemon. Skill variants are A/B-tested with an explore_rate of 0.3; promotion happens when the candidate beats the baseline by more than 0.05 over the configured minimum.
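The routing-and-promotion logic reduces to a few lines. A sketch under assumptions — the `min_samples` value stands in for the "configured minimum" mentioned above, and the scoring details are elided:

```python
import random

EXPLORE_RATE = 0.3   # fraction of traffic routed to the candidate variant
MARGIN = 0.05        # candidate must beat baseline by more than this

def pick_variant(rng: random.Random) -> str:
    """Epsilon-greedy routing between baseline and candidate."""
    return "candidate" if rng.random() < EXPLORE_RATE else "baseline"

def should_promote(baseline_score: float, candidate_score: float,
                   n: int, min_samples: int = 50) -> bool:
    """Promote only after enough samples and a clear win over baseline."""
    return n >= min_samples and candidate_score > baseline_score + MARGIN

assert should_promote(0.70, 0.80, n=60)        # clear win, enough data
assert not should_promote(0.70, 0.74, n=60)    # within the margin
assert not should_promote(0.70, 0.90, n=10)    # too few samples
```

The margin and sample floor are what keep promotions from chasing noise: a candidate has to win decisively, repeatedly, before it replaces the baseline.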
The point isn't to build an autonomous agent that wanders off and rewrites itself. It's that you have a knob, and the knob is yours.
What's next
The roadmap is short and honest. Macros and Linux builds are next. A handful of services still need their first round of real-world testing under load. The trainer's eval lab gets a comparative-runs view in the next minor release. And the NVMe attention offload experiment — Phase 1 of which landed earlier this month — has a Phase 2 with graph integration sitting on the bench.
If you've been wanting an AI environment that lives where your work lives — your machine, your hardware, your control — give Wylde a try.