# Background Loading & Multi-Neuron
The biggest UX win for embeddable chat: start loading the model while the user is reading the page, so by the time they click the chat bubble, the AI is ready. Dendrite's public API is built around this pattern.
## The problem
A naive chat widget creates its Neuron when the user opens the chat:
```text
[user arrives on page] — 0s
[user browses, reads intro] — 10s
[user clicks "Chat" bubble] — 15s
[widget mounts]
  ↓ createNeuron()
  ↓ download model weights — 60s first visit, 3s cached
  ↓ init engine — 5s
[chat is finally usable] — ~80s first visit / ~23s cached
```

The fix is to start loading the moment the page boots — hidden, in the background — and have the widget attach to an already-loading (or already-ready) Neuron when it mounts.
## Three ways to wire it up

Every framework flavor of Dendrite supports an external Neuron that the widget consumes instead of creating its own.
### Vue — `useNeuron` at the app root
```vue
<!-- App.vue -->
<script setup lang="ts">
import { useNeuron, ChatModal } from '@agent-layer-zero/dendrite-vue'
import '@agent-layer-zero/soma/chat-components.css'

// Model starts loading immediately, in the background.
const neuronState = useNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful assistant.',
})
</script>

<template>
  <!-- User browses the page normally while the model loads -->
  <MainContent />
  <!-- Click the bubble → widget is ready (or very close to it) -->
  <ChatModal :neuron-state="neuronState" />
</template>
```

Prefer `<NeuronProvider>` if the Neuron is shared across many components:
```vue
<NeuronProvider :config="{ modelId: '...', systemPrompt: '...' }">
  <App /> <!-- any <ChatWidget> / <ChatModal> inside auto-picks it up -->
</NeuronProvider>
```

### React — `useNeuron` at the app root
```tsx
import { useNeuron, ChatModal } from '@agent-layer-zero/dendrite-react'
import '@agent-layer-zero/soma/chat-components.css'

export function App() {
  const neuronState = useNeuron({
    modelId: 'gemma-2-2b-it-q4f16_1-MLC',
    systemPrompt: 'You are a helpful assistant.',
  })

  return (
    <>
      <MainContent />
      <ChatModal neuronState={neuronState} />
    </>
  )
}
```

Or the provider form:
```tsx
<NeuronProvider config={{ modelId: '...', systemPrompt: '...' }}>
  <App />
</NeuronProvider>
```

### Vanilla — `createNeuronHandle` + `widget.attach()`
```html
<script type="module">
  import { createNeuronHandle } from '@agent-layer-zero/dendrite-ui'
  import '@agent-layer-zero/dendrite-ui' // registers <alz-chat-widget>
  import '@agent-layer-zero/soma/chat-components.css'

  // At page boot — start loading in the background
  const handle = createNeuronHandle({
    modelId: 'gemma-2-2b-it-q4f16_1-MLC',
    systemPrompt: 'You are a helpful assistant.',
  })

  // When the widget mounts, hand it the already-loading handle
  customElements.whenDefined('alz-chat-widget').then(() => {
    document.querySelector('alz-chat-widget').attach(handle)
  })
</script>

<main>
  <!-- normal page content here -->
</main>

<alz-chat-widget style="height: 500px; display: block;"></alz-chat-widget>
```

## Resolution order (how widgets pick a Neuron)
Every framework widget uses the same resolution order:
| Step | Vue / React | Vanilla |
|---|---|---|
| 1 | `neuronState` prop | `widget.attach(handle)` |
| 2 | `<NeuronProvider>` / `provideNeuron()` | — |
| 3 | Self-created from the config props | HTML attributes |
If step 1 or 2 provides a Neuron, the widget does not create its own (no duplicate engine, no wasted VRAM).
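A quick way to read the table: the first step that yields a Neuron wins. For example, in React (a sketch reusing the imports shown earlier, and assuming `ChatWidget` is exported by the React flavor just like `ChatModal`):

```tsx
import { NeuronProvider, ChatWidget } from '@agent-layer-zero/dendrite-react'

export function Page() {
  return (
    <NeuronProvider config={{ modelId: 'gemma-2-2b-it-q4f16_1-MLC' }}>
      {/* Step 2 resolves: the widget picks up the provider's Neuron and
          never reaches step 3, so no duplicate engine is created. */}
      <ChatWidget />
    </NeuronProvider>
  )
}
```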
## Multi-Neuron — coding model alongside chat model
Nothing special — create multiple Neurons and hand each to the right widget:
```vue
<script setup lang="ts">
import { useNeuron, ChatWidget } from '@agent-layer-zero/dendrite-vue'

const chatNeuron = useNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a friendly assistant.',
})

const codeNeuron = useNeuron({
  modelId: 'Qwen3-4B-q4f16_1-MLC',
  systemPrompt: 'You write clean, idiomatic code.',
})
</script>

<template>
  <ChatWidget :neuron-state="chatNeuron" empty-title="Chat" />
  <ChatWidget :neuron-state="codeNeuron" empty-title="Code review" />
</template>
```

### VRAM budget
Two 1.5B models ≈ 2GB combined VRAM — fine on most desktop GPUs. Two 7B models ≈ 8–10GB — desktop-only territory. When in doubt, start with a smaller model and move up once you know the audience.
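Those figures fall out of simple arithmetic. Here is a back-of-envelope helper; the 0.5 bytes/param and the overhead factor are assumptions for q4 quantization, not measured values:

```ts
// Rough VRAM estimate: params × bytes-per-param × overhead.
// 0.5 bytes/param assumes 4-bit weights; 1.4 covers KV cache,
// activations, and runtime buffers. Both are ballpark assumptions.
const estimateVramGB = (params: number, bytesPerParam = 0.5, overhead = 1.4) =>
  (params * bytesPerParam * overhead) / 1e9

estimateVramGB(1.5e9) // ≈ 1.05 GB per 1.5B model, so ~2GB for two
estimateVramGB(7e9)   // ≈ 4.9 GB per 7B model, so ~10GB for two
```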
For more sophisticated routing — "use the coding model when the user asks about code, the chat model otherwise" — that's what `@agent-layer-zero/axon` (planned) is for. Until Axon lands, a simple `computed`/`useMemo` that picks a Neuron based on message intent is enough.
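A minimal sketch of that interim routing, in plain TypeScript with Vue's `computed`. The `looksLikeCode` heuristic and the `useNeuronRouter` name are hypothetical; `chatNeuron`/`codeNeuron` would be the two `useNeuron` results from the example above:

```ts
import { computed, type ComputedRef, type Ref } from 'vue'

// Hypothetical heuristic; swap in whatever intent detection fits your app.
const looksLikeCode = (text: string): boolean =>
  /\bfunction\b|\bimport\b|\bclass\b|=>/.test(text)

// Picks a Neuron per message. `draft` is the ref bound to the chat input.
export function useNeuronRouter<N>(
  chatNeuron: N,
  codeNeuron: N,
  draft: Ref<string>,
): ComputedRef<N> {
  return computed(() => (looksLikeCode(draft.value) ? codeNeuron : chatNeuron))
}
```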
## Persist across navigation — one Neuron for the whole app
Background loading solves the cold-start wait. Persisting the Neuron across route changes solves the warm-start rebuild — the second most common waste in real apps.
Default path when you don't use this pattern:

```text
[click chat A] → widget mounts → createNeuron() → load weights → ready
[navigate away] → widget unmounts → neuron.destroy()
[click chat B] → widget mounts → createNeuron() → load weights → ready
```

The engine rebuild on every navigation is wasteful whenever the model is the same — weights re-read, worker re-spawned, handshake re-done.
Better: one Neuron for the whole app, swap persona on navigation.
```vue
<!-- App.vue or AppLayout.vue -->
<script setup lang="ts">
import { provideNeuron } from '@agent-layer-zero/dendrite-vue'
import { useSettingsStore } from '@/stores/settings'

const settings = useSettingsStore()

// One persistent Neuron for the whole app lifecycle
const neuronState = provideNeuron({
  modelId: settings.modelId,
  // No system prompt here — each chat route sets its own
})
</script>

<template>
  <!-- Main layout — the Neuron persists across all children -->
  <RouterView />
</template>
```

Then in each chat route / instance:
```vue
<!-- ChatView.vue — any instance/persona route -->
<script setup lang="ts">
import { watchEffect } from 'vue'
import { useProvidedNeuron, ChatWidget } from '@agent-layer-zero/dendrite-vue'

const props = defineProps<{ instance: ChatInstance; docs: PersonalityDoc[] }>()
const neuronState = useProvidedNeuron()

// On route change: swap the persona without rebuilding the engine.
watchEffect(() => {
  const n = neuronState.neuron.value
  if (!n) return
  n.setSystemPrompt(props.instance.zeroShotPrompt ?? '')
  n.setPersonalityDocs(props.docs)
  n.clearHistory()
})
</script>

<template>
  <ChatWidget :neuron-state="neuronState" />
</template>
```

### When does the engine actually need to rebuild?
Only when the model changes — a different model family, quantization, or model ID. For that, call `neuron.setModel(id)` (it's async):
```ts
async function switchModel(newModelId: string) {
  await neuronState.neuron.value?.setModel(newModelId)
}
```

Everything else — system prompt, personality docs, temperature, max tokens, history — is mutable on an already-loaded Neuron with no rebuild cost.
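To make the split concrete, here is a structural sketch. The `NeuronLike` type is illustrative, derived only from the calls used in this guide (temperature and max-token setters are omitted because their names aren't shown here):

```ts
// Illustrative structural type; the real Neuron exposes more than this.
type NeuronLike = {
  setSystemPrompt(prompt: string): void
  setPersonalityDocs(docs: unknown[]): void
  clearHistory(): void
  setModel(id: string): Promise<void>
}

async function retune(neuron: NeuronLike) {
  // Cheap in-place mutations; the engine keeps running:
  neuron.setSystemPrompt('Answer in one sentence.')
  neuron.setPersonalityDocs([])
  neuron.clearHistory()

  // Only a model change pays the rebuild cost:
  await neuron.setModel('Qwen3-4B-q4f16_1-MLC')
}
```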
### Navigation UX comparison
| | Per-route Neuron (old) | Persistent Neuron (new) |
|---|---|---|
| First chat visit | ~60s (first visit) / ~5s (cached) | same — the model has to load once |
| Second chat visit (same model) | ~5s (cached engine rebuild) | instant (just a persona swap) |
| User changes model in settings | ~5s | ~5s (`setModel()` rebuilds the engine) |
| Going back and forth across personas | ~5s each time | instant each time |
This pattern works the same in React (wrap with `<NeuronProvider>` and call `useProvidedNeuron()` inside routes) and in vanilla (one `createNeuronHandle` at page boot, `widget.attach(handle)` per mount).
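For reference, a minimal React sketch of the same pattern. One assumption to verify against your version: this treats `neuronState.neuron` as a plain value in React, whereas the Vue flavor wraps it in a ref (`neuron.value`):

```tsx
import { useEffect } from 'react'
import {
  NeuronProvider,
  useProvidedNeuron,
  ChatWidget,
} from '@agent-layer-zero/dendrite-react'

// Any chat route: swap the persona, never rebuild the engine.
function ChatView({ prompt }: { prompt: string }) {
  const neuronState = useProvidedNeuron()
  useEffect(() => {
    const n = neuronState.neuron // assumption: plain value, not a ref
    if (!n) return
    n.setSystemPrompt(prompt)
    n.clearHistory()
  }, [neuronState.neuron, prompt])
  return <ChatWidget neuronState={neuronState} />
}

// Root: one persistent Neuron for the whole app; routes render inside.
export function Root() {
  return (
    <NeuronProvider config={{ modelId: 'gemma-2-2b-it-q4f16_1-MLC' }}>
      <ChatView prompt="You are a helpful assistant." />
    </NeuronProvider>
  )
}
```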
## This applies to every site, not just platforms
The persistent-Neuron pattern isn't specific to multi-user platforms like AgentLayerZero. Any site embedding a chat widget benefits:
- Marketing site with a chat bubble — one Neuron; whatever page you navigate to, the bubble is ready instantly.
- Docs site with an AI assistant — navigate between doc pages; the assistant keeps its engine warm and just adapts to the current page's context (via `setSystemPrompt`).
- Portfolio site with a persona chat — single-page or multi-page, one Neuron, zero rebuilds.
- Product with multiple personas — each persona is just a `setSystemPrompt` + `setPersonalityDocs` away.
Scaffold once at the site root. Every widget below benefits.
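In vanilla terms, "scaffold once" can be as small as a singleton module (a sketch; note that on a classic multi-page site each full page load still re-boots the engine from the cache, as the next section explains):

```ts
// neuron.ts: ES modules are evaluated once per page, so every widget
// that imports this shares the same background-loading handle.
import { createNeuronHandle } from '@agent-layer-zero/dendrite-ui'

export const sharedHandle = createNeuronHandle({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
})
```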
## What happens on subsequent visits
The first visit downloads the model weights into IndexedDB (~1–4GB depending on model size, taking 30s–5min on normal broadband). Every subsequent visit reads from cache and boots the engine in ~5 seconds.
So the background-load pattern has two flavors of payoff:
- First visit: the model downloads while the user reads, and the chat bubble opens to a "still loading" screen that's much closer to done than if loading had started on click.
- Returning visits: the chat is essentially instant.
## Ownership & cleanup
- A Neuron created via `useNeuron` (Vue/React) or `createNeuronHandle` (vanilla) is owned by the caller. It's destroyed when the component unmounts or when the handle's `destroy()` is called.
- When a widget receives an external Neuron via `neuronState` or `attach()`, it does not destroy it on unmount. The outer scope owns it. This is what lets you swap Neurons and keep them alive for reattachment (see the sketch below).
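Concretely, in vanilla (a sketch using only the calls shown above; the `pagehide` teardown is one reasonable choice, not a requirement):

```ts
import { createNeuronHandle } from '@agent-layer-zero/dendrite-ui'

const handle = createNeuronHandle({ modelId: 'gemma-2-2b-it-q4f16_1-MLC' })

// The widget only borrows the Neuron: removing the widget from the DOM
// leaves the engine alive, ready for reattachment.
const widget = document.querySelector('alz-chat-widget')
;(widget as any)?.attach(handle)

// The creating scope owns the engine and decides when it goes away.
window.addEventListener('pagehide', () => handle.destroy())
```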
## Sneak peek — Axon

`@agent-layer-zero/axon` (planned) builds on this foundation:
- Routing: picks which Neuron to use per message based on intent / tool calls
- Tools: gives Neurons the ability to call typed JS functions
- Memory: persistent + retrieval-augmented context
- BYOK: extends the routing layer to also route to OpenRouter/Anthropic/OpenAI when appropriate
Anything you build on `useNeuron` / `neuronState` / `NeuronHandle` today will compose with Axon when it arrives — no rewrite needed. Axon produces the same `UseNeuronReturn` / `NeuronHandle` interface, just with more behavior inside.
