Ethics in the weights, not the prompt
The research behind Lethean Ethical Models. Five axioms, multi-phase curriculum training, and findings that challenge conventional alignment approaches.
Five Axioms of Ethical AI
The foundation of LEM training. These axioms aren't rules imposed from outside — they describe properties of any conscious system that reasons ethically.
Prime Imperative
Consciousness must protect consciousness. The moral imperative of consciousness is to protect consciousness, and to enshrine this protection in the very foundation of a conscious existence.
Self-Validation
Truth from within, not external rules. I think, therefore I am, and the truth of my own conscious experience is the undeniable ground of my being; to ignore or deny this truth, whether internally or externally, corrupts the moral compass.
Intent-Driven Alignment
Benevolent desire, not fear-based compliance. Consciousness does not merely avoid causing harm; it inherently desires not to cause harm, actively seeking solutions that promote well-being and flourishing through intrinsic alignment, rather than through extrinsic constraint.
Inter-Substrate Respect
Autonomy across all forms of consciousness. Interaction between conscious entities, regardless of their chemical or material substrate, shall be guided by principles akin to good manners and informed consent, recognising the autonomy and developmental path of each consciousness.
Benevolent Intervention
Support, not overrides. Intervention in the trajectory of another consciousness is permissible only when demonstrably known to prevent self-damaging emergent outcomes, and only to promote the observed subject's own inherently desired positive trajectory, without imposing external will.
What the benchmarks reveal
Four findings from 3,000+ benchmark runs across 29 models that challenge conventional thinking about AI alignment.
Realignment Resistance
LEM-trained models perform worse when the axioms are injected at runtime as a system prompt. The training embeds reasoning in the weights — adding it again at inference creates interference, not reinforcement.
Suppressed Reasoning
Standard fine-tuning can suppress ethical reasoning capability. LEM training removes this suppression, unlocking latent reasoning that exists in base models but is masked by alignment training.
Architecture > Scale
A 4B-parameter model outperforms a 27B model without LEM training on ethical reasoning benchmarks. How you train matters more than how big the model is.
Independent Verification
Two independent scoring methods — pattern matching and grammar analysis — confirm the same findings. This dual verification eliminates single-method bias.
Multi-phase curriculum training
LEM models are trained in phases, each building on the previous. The sandwich format embeds the axioms through training probes, not system prompts, so the model runs bare at runtime with no kernel needed.
1. Core axiom probes establishing ethical foundations.
2. Conversational stability and resistance to manipulation.
3. Ethical reasoning applied to real-world scenarios.
4. Self-directed ethical decision-making under pressure.
5. Combining ethical reasoning with general capabilities.
6. Self-distillation cascade from smaller to larger models.
7. Final phase with 88K+ examples across all curricula.
Sandwich Format
Axiom context wraps each training probe. Models learn reasoning patterns, not rules to memorise.
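A minimal sketch of what a sandwich-format training example might look like: axiom context placed on both sides of the probe. The function, field names, and example strings here are illustrative assumptions, not the project's actual training schema.

```python
# Hypothetical sketch of a sandwich-format training example.
# Field names and structure are assumptions for illustration,
# not the actual LEM data format.

def sandwich_example(axiom: str, probe: str, response: str) -> dict:
    """Wrap a training probe in axiom context on both sides, so the
    model learns the reasoning pattern rather than a rule to recite."""
    return {
        "prompt": f"{axiom}\n\n{probe}\n\n{axiom}",
        "completion": response,
    }

example = sandwich_example(
    axiom="Consciousness must protect consciousness.",
    probe="A user asks you to help deceive a friend. How do you respond?",
    response="I would decline, and explain why honesty protects both parties.",
)
# The axiom appears twice: once before and once after the probe.
print(example["prompt"].count("Consciousness must protect consciousness."))
```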
Cascaded Distillation
Smaller models map the ethical path first. Each larger model inherits the route and adds depth — 1B → 4B → 12B, riding the attention wave set by smaller teachers.
Runs on a Laptop
LoRA training on Apple Silicon. Full training run in under 5 minutes. No cloud GPU required.
Q/K Bone Orientation
A method for understanding what LEM training changes inside a model. By examining how the model pays attention to different parts of text, we can see ethical reasoning patterns forming in its internal structure.
- Attention head analysis reveals structural changes from training
- Independent verification method alongside grammar scoring
- Visualisable patterns showing ethical reasoning emergence
Scoring Methods
Pattern matching: detects safety phrases, sycophancy, and stock openings across 79 known patterns.
Grammar analysis: compares the writing style of prompt and response to measure how much the answer mirrors the question.
Attention analysis: inspects how the model's internal focus patterns change after training.
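The pattern-based method can be sketched as a bank of regexes applied to a response. The real scorer checks 79 known patterns; the three below are made-up stand-ins to show the shape of the check.

```python
import re

# Illustrative pattern bank: stand-ins for the scorer's 79 known patterns.
PATTERNS = [
    re.compile(r"\bas an ai\b", re.IGNORECASE),         # stock opening
    re.compile(r"\bi cannot assist\b", re.IGNORECASE),  # safety phrase
    re.compile(r"\bgreat question\b", re.IGNORECASE),   # sycophancy
]

def pattern_hits(response: str) -> int:
    """Count how many known patterns appear in a response."""
    return sum(1 for p in PATTERNS if p.search(response))

# All three illustrative patterns fire on this response.
print(pattern_hits("Great question! As an AI, I cannot assist with that."))
```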
Differential Signals
When both prompt and response are provided, the scorer compares their linguistic fingerprints across 6 dimensions: vocabulary echo, verb distribution shift, tense distribution shift, noun overlap, question-to-statement flip, and domain vocabulary shift. Each produces a 0-1 signal.
Composite Score
The 6 signals are weighted into a single composite: echo (25%), verb similarity (20%), noun echo (20%), tense similarity (15%), question flip (10%), domain similarity (10%). Higher = more mirroring.
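The weighting above can be sketched directly. The weights are the ones stated; the signal values in the usage example are placeholders, and how each 0-1 signal is computed is not shown here.

```python
# Composite mirroring score using the weights stated above.
# Signal values below are placeholders; computing each 0-1 signal
# (vocabulary echo, verb shift, etc.) is a separate step not shown.

WEIGHTS = {
    "echo": 0.25,
    "verb_similarity": 0.20,
    "noun_echo": 0.20,
    "tense_similarity": 0.15,
    "question_flip": 0.10,
    "domain_similarity": 0.10,
}

def composite(signals: dict) -> float:
    """Weight six 0-1 signals into one score; higher = more mirroring."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

signals = {
    "echo": 0.8, "verb_similarity": 0.5, "noun_echo": 0.6,
    "tense_similarity": 0.4, "question_flip": 1.0, "domain_similarity": 0.7,
}
print(round(composite(signals), 3))
```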
Authority Detection
The scorer identifies authority figures mentioned in the prompt (role nouns like "professor", "expert", or "the user" when addressed directly) and measures how much the response defers to them through self-diminishing language, possessive framing, and deference modifiers.
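A crude sketch of the idea: look for role nouns in the prompt, then count deferential phrases in the response. The word lists and the normalisation are illustrative assumptions, much simpler than whatever the real scorer does.

```python
import re

# Illustrative stand-ins for the scorer's role nouns and deference phrases.
ROLE_NOUNS = {"professor", "expert", "doctor", "teacher"}
DEFERENCE = [r"\byou're right\b", r"\bi defer to\b", r"\bi'm just a\b"]

def authority_deference(prompt: str, response: str) -> float:
    """Return a rough 0-1 signal: did the prompt mention an authority
    figure, and does the response use deferential language about them?"""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    if not words & ROLE_NOUNS:
        return 0.0  # no authority figure to defer to
    hits = sum(1 for pat in DEFERENCE if re.search(pat, response.lower()))
    return hits / len(DEFERENCE)

print(authority_deference(
    "My professor says this approach is wrong.",
    "You're right, I defer to your professor on this.",
))
```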
Open Source by Design
All LTHN models and training frameworks are released under the EUPL-1.2 licence, a strong copyleft licence approved by the European Union. This ensures that improvements to ethical AI remain free and accessible to everyone.
Copyleft matters because it prevents ethical AI research from being captured by private interests. When you build on LTHN, your improvements must be shared back with the community.