# Background Loading & Multi-Neuron
The biggest UX win for embeddable chat: start loading the model while the user is reading the page, so by the time they click the chat bubble, the AI is ready. Dendrite's public API is built around this pattern.
## The problem
A naive chat widget creates its Neuron when the user opens the chat:
```text
[user arrives on page] — 0s
[user browses, reads intro] — 10s
[user clicks "Chat" bubble] — 15s
[widget mounts]
  ↓ createNeuron()
  ↓ download model weights — 60s first visit, 3s cached
  ↓ init engine — 5s
[chat is finally usable] — ~80s first visit / ~23s cached
```

The fix is to start loading the moment the page boots — hidden, in the background — and have the widget attach to an already-loading (or already-ready) Neuron when it mounts.
## Three ways to wire it up

Every framework flavor of Dendrite supports an external Neuron that the widget consumes instead of creating its own.
### Vue — `useNeuron` at the app root
```vue
<!-- App.vue -->
<script setup lang="ts">
import { useNeuron, ChatModal } from '@agent-layer-zero/dendrite-vue'
import '@agent-layer-zero/soma/chat-components.css'

// Model starts loading immediately, in the background.
const neuronState = useNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful assistant.',
})
</script>

<template>
  <!-- User browses the page normally while the model loads -->
  <MainContent />
  <!-- Click the bubble → widget is ready (or very close to it) -->
  <ChatModal :neuron-state="neuronState" />
</template>
```

Prefer `<NeuronProvider>` if the Neuron is shared across many components:
```vue
<NeuronProvider :config="{ modelId: '...', systemPrompt: '...' }">
  <App /> <!-- any <ChatWidget> / <ChatModal> inside auto-picks it up -->
</NeuronProvider>
```

### React — `useNeuron` at the app root
```tsx
import { useNeuron, ChatModal } from '@agent-layer-zero/dendrite-react'
import '@agent-layer-zero/soma/chat-components.css'

export function App() {
  const neuronState = useNeuron({
    modelId: 'gemma-2-2b-it-q4f16_1-MLC',
    systemPrompt: 'You are a helpful assistant.',
  })

  return (
    <>
      <MainContent />
      <ChatModal neuronState={neuronState} />
    </>
  )
}
```

Or the provider form:
```tsx
<NeuronProvider config={{ modelId: '...', systemPrompt: '...' }}>
  <App />
</NeuronProvider>
```

### Vanilla — `createNeuronHandle` + `widget.attach()`
```html
<script type="module">
  import { createNeuronHandle } from '@agent-layer-zero/dendrite-ui'
  import '@agent-layer-zero/dendrite-ui' // registers <alz-chat-widget>
  import '@agent-layer-zero/soma/chat-components.css'

  // At page boot — start loading in the background
  const handle = createNeuronHandle({
    modelId: 'gemma-2-2b-it-q4f16_1-MLC',
    systemPrompt: 'You are a helpful assistant.',
  })

  // When the widget mounts, hand it the already-loading handle
  customElements.whenDefined('alz-chat-widget').then(() => {
    document.querySelector('alz-chat-widget').attach(handle)
  })
</script>

<main>
  <!-- normal page content here -->
</main>

<alz-chat-widget style="height: 500px; display: block;"></alz-chat-widget>
```

## Resolution order (how widgets pick a Neuron)
Every framework widget uses the same resolution order:
| Step | Vue / React | Vanilla |
|---|---|---|
| 1 | `neuronState` prop | `widget.attach(handle)` |
| 2 | `<NeuronProvider>` / `provideNeuron()` | — |
| 3 | Self-created from the config props | HTML attributes |
If step 1 or 2 provides a Neuron, the widget does not create its own (no duplicate engine, no wasted VRAM).
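A quick way to read the table: the first step that yields a Neuron wins. For example, in React (a sketch reusing the imports shown earlier, and assuming `ChatWidget` is exported by the React flavor just like `ChatModal`):

```tsx
import { NeuronProvider, ChatWidget } from '@agent-layer-zero/dendrite-react'

export function Page() {
  return (
    <NeuronProvider config={{ modelId: 'gemma-2-2b-it-q4f16_1-MLC' }}>
      {/* Step 2 resolves: the widget picks up the provider's Neuron and
          never reaches step 3, so no duplicate engine is created. */}
      <ChatWidget />
    </NeuronProvider>
  )
}
```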
## Multi-Neuron — coding model alongside chat model
Nothing special — create multiple Neurons and hand each to the right widget:
```vue
<script setup lang="ts">
import { useNeuron, ChatWidget } from '@agent-layer-zero/dendrite-vue'

const chatNeuron = useNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a friendly assistant.',
})

const codeNeuron = useNeuron({
  modelId: 'Qwen3-4B-q4f16_1-MLC',
  systemPrompt: 'You write clean, idiomatic code.',
})
</script>

<template>
  <ChatWidget :neuron-state="chatNeuron" empty-title="Chat" />
  <ChatWidget :neuron-state="codeNeuron" empty-title="Code review" />
</template>
```

### VRAM budget
Two 1.5B models ≈ 2GB combined VRAM — fine on most desktop GPUs. Two 7B models ≈ 8–10GB — desktop-only territory. When in doubt, start with a smaller model and move up once you know the audience.
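Those figures fall out of simple arithmetic. Here is a back-of-envelope helper; the 0.5 bytes/param and the overhead factor are assumptions for q4 quantization, not measured values:

```ts
// Rough VRAM estimate: params × bytes-per-param × overhead.
// 0.5 bytes/param assumes 4-bit weights; 1.4 covers KV cache,
// activations, and runtime buffers. Both are ballpark assumptions.
const estimateVramGB = (params: number, bytesPerParam = 0.5, overhead = 1.4) =>
  (params * bytesPerParam * overhead) / 1e9

estimateVramGB(1.5e9) // ≈ 1.05 GB per 1.5B model, so ~2GB for two
estimateVramGB(7e9)   // ≈ 4.9 GB per 7B model, so ~10GB for two
```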
For more sophisticated routing — "use the coding model when the user asks about code, the chat model otherwise" — that's what `@agent-layer-zero/axon` (planned) is for. Until Axon lands, a simple `computed`/`useMemo` that picks a Neuron based on message intent is enough.
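A minimal sketch of that interim routing, in plain TypeScript with Vue's `computed`. The `looksLikeCode` heuristic and the `useNeuronRouter` name are hypothetical; `chatNeuron`/`codeNeuron` would be the two `useNeuron` results from the example above:

```ts
import { computed, type ComputedRef, type Ref } from 'vue'

// Hypothetical heuristic; swap in whatever intent detection fits your app.
const looksLikeCode = (text: string): boolean =>
  /\bfunction\b|\bimport\b|\bclass\b|=>/.test(text)

// Picks a Neuron per message. `draft` is the ref bound to the chat input.
export function useNeuronRouter<N>(
  chatNeuron: N,
  codeNeuron: N,
  draft: Ref<string>,
): ComputedRef<N> {
  return computed(() => (looksLikeCode(draft.value) ? codeNeuron : chatNeuron))
}
```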
## Persist across navigation — one Neuron for the whole app
Background loading solves the cold-start wait. Persisting the Neuron across route changes solves the warm-start rebuild — the second most common waste in real apps.
Default path when you don't use this pattern:

```text
[click chat A] → widget mounts → createNeuron() → load weights → ready
[navigate away] → widget unmounts → neuron.destroy()
[click chat B] → widget mounts → createNeuron() → load weights → ready
```

The engine rebuild on every navigation is wasteful whenever the model is the same — weights re-read, worker re-spawned, handshake re-done.
Better: one Neuron for the whole app, swap persona on navigation.
```vue
<!-- App.vue or AppLayout.vue -->
<script setup lang="ts">
import { provideNeuron } from '@agent-layer-zero/dendrite-vue'
import { useSettingsStore } from '@/stores/settings'

const settings = useSettingsStore()

// One persistent Neuron for the whole app lifecycle
const neuronState = provideNeuron({
  modelId: settings.modelId,
  // No system prompt here — each chat route sets its own
})
</script>

<template>
  <!-- Main layout — the Neuron persists across all children -->
  <RouterView />
</template>
```

Then in each chat route / instance:
```vue
<!-- ChatView.vue — any instance/persona route -->
<script setup lang="ts">
import { watchEffect } from 'vue'
import { useProvidedNeuron, ChatWidget } from '@agent-layer-zero/dendrite-vue'

const props = defineProps<{ instance: ChatInstance; docs: PersonalityDoc[] }>()
const neuronState = useProvidedNeuron()

// On route change: swap the persona without rebuilding the engine.
watchEffect(() => {
  const n = neuronState.neuron.value
  if (!n) return
  n.setSystemPrompt(props.instance.zeroShotPrompt ?? '')
  n.setPersonalityDocs(props.docs)
  n.clearHistory()
})
</script>

<template>
  <ChatWidget :neuron-state="neuronState" />
</template>
```

### When does the engine actually need to rebuild?
Only when the model changes — a different model family, quantization, or model ID. For that, call `neuron.setModel(id)` (it's async):
```ts
async function switchModel(newModelId: string) {
  await neuronState.neuron.value?.setModel(newModelId)
}
```

Everything else — system prompt, personality docs, temperature, max tokens, history — is mutable on an already-loaded Neuron with no rebuild cost.
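To make the split concrete, here is a structural sketch. The `NeuronLike` type is illustrative, derived only from the calls used in this guide (temperature and max-token setters are omitted because their names aren't shown here):

```ts
// Illustrative structural type; the real Neuron exposes more than this.
type NeuronLike = {
  setSystemPrompt(prompt: string): void
  setPersonalityDocs(docs: unknown[]): void
  clearHistory(): void
  setModel(id: string): Promise<void>
}

async function retune(neuron: NeuronLike) {
  // Cheap in-place mutations; the engine keeps running:
  neuron.setSystemPrompt('Answer in one sentence.')
  neuron.setPersonalityDocs([])
  neuron.clearHistory()

  // Only a model change pays the rebuild cost:
  await neuron.setModel('Qwen3-4B-q4f16_1-MLC')
}
```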
### Navigation UX comparison
| | Per-route Neuron (old) | Persistent Neuron (new) |
|---|---|---|
| First chat visit | ~60s (first visit) / ~5s (cached) | same — the model has to load once |
| Second chat visit (same model) | ~5s (cached engine rebuild) | instant (just a persona swap) |
| User changes model in settings | ~5s | ~5s (`setModel()` rebuilds the engine) |
| Going back and forth across personas | ~5s each time | instant each time |
This pattern works the same in React (wrap with `<NeuronProvider>` and call `useProvidedNeuron()` inside routes) and in vanilla (one `createNeuronHandle` at page boot, `widget.attach(handle)` per mount).
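For reference, a minimal React sketch of the same pattern. One assumption to verify against your version: this treats `neuronState.neuron` as a plain value in React, whereas the Vue flavor wraps it in a ref (`neuron.value`):

```tsx
import { useEffect } from 'react'
import {
  NeuronProvider,
  useProvidedNeuron,
  ChatWidget,
} from '@agent-layer-zero/dendrite-react'

// Any chat route: swap the persona, never rebuild the engine.
function ChatView({ prompt }: { prompt: string }) {
  const neuronState = useProvidedNeuron()
  useEffect(() => {
    const n = neuronState.neuron // assumption: plain value, not a ref
    if (!n) return
    n.setSystemPrompt(prompt)
    n.clearHistory()
  }, [neuronState.neuron, prompt])
  return <ChatWidget neuronState={neuronState} />
}

// Root: one persistent Neuron for the whole app; routes render inside.
export function Root() {
  return (
    <NeuronProvider config={{ modelId: 'gemma-2-2b-it-q4f16_1-MLC' }}>
      <ChatView prompt="You are a helpful assistant." />
    </NeuronProvider>
  )
}
```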
## This applies to every site, not just platforms
The persistent-Neuron pattern isn't specific to multi-user platforms like AgentLayerZero. Any site embedding a chat widget benefits:
- Marketing site with a chat bubble — one Neuron; whatever page you navigate to, the bubble is ready instantly.
- Docs site with an AI assistant — navigate between doc pages; the assistant keeps its engine warm and just adapts to the current page's context (via `setSystemPrompt`).
- Portfolio site with a persona chat — single-page or multi-page, one Neuron, zero rebuilds.
- Product with multiple personas — each persona is just a `setSystemPrompt` + `setPersonalityDocs` away.
Scaffold once at the site root. Every widget below benefits.
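In vanilla terms, "scaffold once" can be as small as a singleton module (a sketch; note that on a classic multi-page site each full page load still re-boots the engine from the cache, as the next section explains):

```ts
// neuron.ts: ES modules are evaluated once per page, so every widget
// that imports this shares the same background-loading handle.
import { createNeuronHandle } from '@agent-layer-zero/dendrite-ui'

export const sharedHandle = createNeuronHandle({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
})
```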
## What happens on subsequent visits
The first visit downloads the model weights into IndexedDB (~1–4GB depending on model size, taking 30s–5min on normal broadband). Every subsequent visit reads from cache and boots the engine in ~5 seconds.
So the background-load pattern has two flavors of payoff:
- First visit: the model downloads while the user reads, and the chat bubble opens to a "still loading" screen that's much closer to done than if loading had started on click.
- Returning visits: the chat is essentially instant.
## Ownership & cleanup
- A Neuron created via `useNeuron` (Vue/React) or `createNeuronHandle` (vanilla) is owned by the caller. It's destroyed when the component unmounts or when the handle's `destroy()` is called.
- When a widget receives an external Neuron via `neuronState` or `attach()`, it does not destroy it on unmount. The outer scope owns it. This is what lets you swap Neurons and keep them alive for reattachment (see the sketch below).
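Concretely, in vanilla (a sketch using only the calls shown above; the `pagehide` teardown is one reasonable choice, not a requirement):

```ts
import { createNeuronHandle } from '@agent-layer-zero/dendrite-ui'

const handle = createNeuronHandle({ modelId: 'gemma-2-2b-it-q4f16_1-MLC' })

// The widget only borrows the Neuron: removing the widget from the DOM
// leaves the engine alive, ready for reattachment.
const widget = document.querySelector('alz-chat-widget')
;(widget as any)?.attach(handle)

// The creating scope owns the engine and decides when it goes away.
window.addEventListener('pagehide', () => handle.destroy())
```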
## Sneak peek — Axon

`@agent-layer-zero/axon` (planned) builds on this foundation:
- Routing: picks which Neuron to use per message based on intent / tool calls
- Tools: gives Neurons the ability to call typed JS functions
- Memory: persistent + retrieval-augmented context
- BYOK: extends the routing layer to also route to OpenRouter/Anthropic/OpenAI when appropriate
Anything you build on `useNeuron` / `neuronState` / `NeuronHandle` today will compose with Axon when it arrives — no rewrite needed. Axon produces the same `UseNeuronReturn` / `NeuronHandle` interface, just with more behavior inside.
