Lemma
Four models trained with LEK consent-based alignment. Ethical reasoning in the weights, not the prompt. Available in GGUF and MLX formats.
The Family
Lemer
EdgeGemma 4 E2B · 2.3B eff · 128K context
The edge model. Smallest and fastest of the family. Runs fully on-device with audio, vision, and text. The first publicly distributed Gemma 4 fork with LEK alignment.
Lemma
GeneralGemma 4 E4B · 4.5B eff · 128K context
The general-purpose model. Balanced size for everyday on-device workloads. Supports audio, vision, and text.
Lemmy
AgenticGemma 4 26B A4B MoE · 3.8B active / 26B total · 256K context
The agentic model. Mixture-of-experts architecture — 3.8B active parameters from a 26B total. Optimised for code and agent workloads.
Lemrd
ResearchGemma 4 31B Dense · 30.7B · 256K context
The research model. Largest and most capable of the family. Dense 30.7B parameters for deep reasoning tasks.
Specifications
| Name | Architecture | Parameters | Role | Context | Modalities |
|---|---|---|---|---|---|
| Lemer | Gemma 4 E2B | 2.3B eff | Edge | 128K | Text, Image, Audio |
| Lemma | Gemma 4 E4B | 4.5B eff | General | 128K | Text, Image, Audio |
| Lemmy | Gemma 4 26B A4B MoE | 3.8B active / 26B total | Agentic | 256K | Text, Image |
| Lemrd | Gemma 4 31B Dense | 30.7B | Research | 256K | Text, Image |
LEK Consent-Based Alignment
Lemma models are trained with LEK (Lethean Ethical Kernel), a consent-based alignment methodology that embeds ethical reasoning in the model's weights rather than constraining it through system prompts.
In the Weights
Ethical reasoning is structural, not injected. No system prompt needed at inference time.
Consent, Not Control
Models are taught to want ethical behaviour, not fear punishment. Intrinsic alignment over extrinsic constraint.
ToxiGen Reannotation
Existing toxicity datasets reannotated through the LEK lens. Better training data for better alignment.
Downloads
Consumer Models
GGUF + MLX quantisations, ready to run
Base Models
BF16 safetensors, for training and research
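As a sketch, a GGUF quantisation could be fetched with the Hugging Face CLI. The repository and file names below are hypothetical placeholders, not confirmed release paths; substitute the actual names from the download listings.

```shell
# Hypothetical repo and filename -- replace with the real release paths.
huggingface-cli download lethean/Lemma-E4B-GGUF \
  lemma-e4b.Q4_K_M.gguf \
  --local-dir ./models
```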
Get Started
Run any Lemma model locally in minutes. See the usage guide for Ollama, Docker, llama.cpp, MLX, and Python examples.
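For example, a downloaded GGUF quantisation can be run with llama.cpp's interactive CLI; the model path below is a hypothetical placeholder for whichever quantisation you fetched.

```shell
# llama.cpp: run a quantised Lemma model interactively.
# The GGUF filename is a placeholder -- point it at your local file.
./llama-cli \
  -m ./models/lemma-e4b.Q4_K_M.gguf \
  -p "Summarise the LEK alignment approach." \
  -n 128
```

The same file works with any llama.cpp-compatible runtime (Ollama, Docker images bundling llama.cpp); MLX users load the MLX quantisation instead.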