# Getting Started
## Install
```sh
npm install @agent-layer-zero/dendrite
```

Building with Vue or React? See Framework Packages for prebuilt chat widgets via `@agent-layer-zero/dendrite-vue` and `@agent-layer-zero/dendrite-react`.
## Basic Usage
Two lines to set up, one line to chat. No worker files, no extra imports.
```ts
import { createNeuron } from '@agent-layer-zero/dendrite'

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  onProgress: (percent, text) => {
    console.log(`Loading: ${percent}% - ${text}`)
  },
})

// Stream tokens as they arrive
let answer = ''
for await (const token of neuron.send('How do I make pasta?')) {
  answer += token // append each token, e.g. to render progressively in the UI
}
// Or get the full response
const reply = await neuron.complete('What about carbonara?')
console.log(reply)
```

## What Happens
- `createNeuron()` starts downloading the model immediately (~0.5–5 GB, cached after the first download)
- `onProgress` fires with the download percentage (wired to a UI in the sketch below)
- Once loaded, call `send()` or `complete()` to chat
- Tokens stream back as an async generator
- Conversation history is tracked automatically
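Here is a minimal loading-indicator sketch using only the calls shown above. The `#load-progress` and `#answer` elements are assumptions for illustration; wire `onProgress` into whatever UI you have.

```ts
import { createNeuron } from '@agent-layer-zero/dendrite'

// Hypothetical DOM hooks, for illustration only.
const bar = document.querySelector('#load-progress') as HTMLProgressElement
const out = document.querySelector('#answer') as HTMLElement

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  systemPrompt: 'You are a helpful cooking assistant.',
  onProgress: (percent, text) => {
    bar.value = percent // download percentage (0-100, per the example above)
    bar.title = text    // human-readable status line
  },
})

// History is tracked automatically, so repeated calls stay in context.
export async function ask(question: string) {
  out.textContent = ''
  for await (const token of neuron.send(question)) {
    out.textContent += token
  }
}
```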
## Available Models
A curated subset of WebLLM models, grouped by tier. Each model ships in two variants: q4f16_1 (the default; smaller and faster) and q4f32_1 (higher precision, for GPUs without strong f16 support).
| Model | VRAM | Tier |
|---|---|---|
| Qwen3 0.6B | ~0.5 GB | Tiny |
| SmolLM2 360M | ~0.3 GB | Tiny |
| Llama 3.2 1B | ~0.9 GB | Tiny |
| Qwen3 1.7B | ~1.5 GB | Light |
| SmolLM2 1.7B | ~1.5 GB | Light |
| Gemma 2 2B (default) | ~1.7 GB | Light |
| Qwen3 4B | ~3.2 GB | Standard |
| Llama 3.2 3B | ~2.3 GB | Standard |
| Phi 3.5 Mini | ~2.3 GB | Standard |
| Qwen3 8B | ~5.5 GB | Heavy |
| Llama 3.1 8B | ~5.0 GB | Heavy |
| Gemma 2 9B | ~6.5 GB | Heavy |
| DeepSeek R1 1.5B / 7B / 8B | ~1.4–5.0 GB | Reasoning |
The default is Gemma 2 2B IT: the best balance of size, instruction following, and reproduction of structured data (contact info, lists). Smaller models load faster but follow instructions less precisely. Reasoning-tier models emit chain-of-thought tokens; pick from Tiny/Light/Standard/Heavy for typical chat use.
Full programmatic list: `import { MODEL_OPTIONS } from '@agent-layer-zero/dendrite'` (see the sketch below).
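A hedged sketch of picking a model programmatically. The exact shape of `MODEL_OPTIONS` entries isn't documented in this guide, so inspect the export rather than assuming fields; the q4f32_1 model ID below follows the MLC naming shown in Basic Usage and is an assumption.

```ts
import { createNeuron, MODEL_OPTIONS } from '@agent-layer-zero/dendrite'

// Inspect the curated list at runtime; the entry shape is not
// documented here, so log it rather than assume fields.
console.log(MODEL_OPTIONS)

// Choosing the higher-precision f32 variant of the default model,
// assuming the usual MLC naming convention for the ID:
const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f32_1-MLC',
})
```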
## Performance: Web Worker (Optional)
By default, Dendrite runs on the main thread; this works great and requires zero setup.

For better performance, you can optionally pass a Web Worker, which offloads model computation to a separate thread and keeps the page responsive during inference. See the Worker Setup guide; a rough sketch follows.
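The sketch below is an assumption about the wiring, not the documented API: the `worker` option name and the worker module path are placeholders, and the Worker Setup guide has the real details.

```ts
import { createNeuron } from '@agent-layer-zero/dendrite'

// Hypothetical worker module path; create the worker file per the
// Worker Setup guide.
const worker = new Worker(new URL('./dendrite.worker.ts', import.meta.url), {
  type: 'module',
})

const neuron = createNeuron({
  modelId: 'gemma-2-2b-it-q4f16_1-MLC',
  worker, // hypothetical option name; see the Worker Setup guide
})
```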
