v4.0.3  ·  Apache 2.0  ·  532 Tests Pass

ATLAS

Active-inference Training with Learned Adaptive Stigmergy

A Rust-native AGI training framework fusing stigmergic memory, n-morphic evolution, and GPU-accelerated inference.

532 Tests · 0 Failures
19.9 tok/s · OLMo-3-7B BF16
21 Crates · Pure Rust
v4.0.3 · Apache 2.0

What makes ATLAS different

Three interlocking systems that together enable genuine adaptive intelligence: not fine-tuning, not prompting, but evolutionary cognition in silicon.

🏛️

Stigmergic Memory Palace

Pheromone-guided navigation through semantic knowledge graphs. CAS decay calibration with 4 adaptive regimes. Cross-session warm-start: memories persist and strengthen with use, fade with neglect — just like biological memory.

Powered by GraphPalace
🧬

N-Morphic Evolution

Champagnat–Méléard n-morphic framework with k-parallel phenotypic exploration. Lotka–Volterra competition between cognitive branches. InvasionFitnessScorer gates survival. O(1/√T) convergence guarantee on convex landscapes.

Champagnat 2006 · Méléard 2011

GPU Inference Engine

Pure Rust CUDA kernels: no Python, no ONNX, no dependency hell. Supports SmolLM2-135M through OLMo-3-7B with SWA attention, YaRN context extension, and RoPE scaling. INT8 quantization is on the roadmap, targeting ~30+ tok/s on OLMo-3-7B.

sm_80 · A100-SXM4 · 19.9 tok/s OLMo-3-7B BF16 (v4.0.3)

Rigorous theory, not heuristics

Every algorithmic choice in ATLAS is grounded in peer-reviewed mathematics — from stochastic process theory to active inference.

Champagnat n-Morphic Framework (v4.0.0)
  • InvasionFitnessScorer — f(y) = success − cost − Σ cos_sim · n̄ (Champagnat–Méléard trait selection)
  • CanonicalPheromoneUpdate — Δρ = ½·μ·σ²·n̄·∂₁s (pheromone as evolutionary gradient)
  • BarBovier2017Constraints — Stability gate based on 2017 AAP escape rates
  • CognitiveBranching — n-morphic OODA bifurcation in atlas-astra
  • HJConcentrationPrior — Hopf–Cole T_eff(s) = T₀/(1 + γs) in atlas-trm
BF16 GPU Inference Path (v4.0.2 → v4.0.3)
  • W16A32 Pattern — Weights in BF16 (14 GB VRAM), activations in FP32. Lossless relative to the BF16 checkpoint: BF16 is the upper 16 bits of the FP32 bit pattern, so promoting weights to FP32 is exact
  • sgemv_bf16_kernel — One-warp-per-row GEMV for N=1 decode. Fixed 32× waste in prior tiled GEMM path
  • GpuBufBf16 + GpuBufKind — Discriminated union F32/BF16 in atlas-tensor. HashSet tracks BF16 tensors across shards
  • Result — OLMo-3-7B-Think: 4.1 → 19.9 tok/s (4.8× speedup). 224/224 BF16 matrices confirmed
W16A32: weights BF16 · activations FP32
VRAM: ~14 GB (vs ~28 GB FP32)
Throughput: 19.9 tok/s (4.8× speedup)
Tests: 532/532 passing ✓
OLMo-3-7B: SWA + YaRN (v4.0.1)
  • Sliding Window Attention — 24 sliding layers (window=4096) + 8 full-attention layers, banded mask via NEG_INFINITY
  • YaRN Context Extension — 3-band frequency decomposition (low/mid/high). Factor=8, attn_scale_factor=1.2079
  • Auto-Config Patching — patch_config_from_hf_json() reads layer_types, sliding_window, rope_scaling at load time
  • Logit Sanity — Spread: 16.803 ✓ · Max prob: 0.96% ✓ · 10/10 unique tokens ✓
Config: 32 layers (24 SWA + 8 full)
YaRN: factor=8, attn_factor=1.2079
Load: 103 s · Inference: 4.1 tok/s (pre-BF16)
Test: gpu_inference_olmo3_quality_sanity ✓

Modular Crate Architecture

Each cognitive function lives in its own crate with clean interfaces. Composable, testable, and independently deployable.

atlas-model · Inference Engine
Transformer runtime with Sliding Window Attention, YaRN RoPE scaling, and multi-architecture support (Llama, OLMo, SmolLM2).

atlas-palace · Stigmergic Store
Pheromone memory with CAS decay calibration, session IDs, PalaceBackend trait, and cross-session warm-start.

atlas-corpus · Training Engine
InvasionFitnessScorer, StigmergicSampler, DeepSupervisionTrainer with N_sup latent carry and loss tracing.

atlas-astra · OODA Loop Engine
Adaptive OODA feedback with explore_ratio [0.1, 0.9], CognitiveBranching for n-morphic phenotype bifurcation.

atlas-trm · Thought Recursion
HJConcentrationPrior implementing Hopf–Cole temperature concentration. Recursive thought tree with configurable depth.

atlas-safety · Safety Constitution
Tractable Horn-clause safety rules: 8 principles across 4 domains. Declarative, auditable, formally verifiable.

atlas-api · HTTP Server
OpenAI-compatible REST + SSE streaming. 40 tests. Drop-in replacement for OpenAI API endpoints. Zero Python.

atlas-mcp · MCP Server
28 tools via JSON-RPC 2.0 on TCP :8765. McpConnectionPool (max 5, 5-min idle). Integrates with Claude & LangChain.

GPU Benchmark Results

Measured on NVIDIA A100-SXM4-40GB with CUDA 13.0, sm_80 kernels. FP32 weights (BF16 for OLMo-3-7B), 30-token generation runs, release build.

Inference Throughput
Hardware: NVIDIA A100-SXM4-40GB  ·  CUDA: 13.0  ·  Arch: sm_80  ·  Precision: FP32 / BF16*
Model              Parameters   Throughput    VRAM      Load Time
SmolLM2-135M       135M         37.7 tok/s    ~0.5 GB   2.0 s
SmolLM2-360M       360M         25.4 tok/s    ~1.4 GB   4.4 s
SmolLM2-1.7B       1.7B         12.6 tok/s    ~6.5 GB   22.5 s
OLMo-3-7B-Think *  7B           19.9 tok/s    ~14 GB    103 s

* OLMo-3-7B runs W16A32 BF16 GPU path (v4.0.2/v4.0.3) — weights BF16 (14GB VRAM), activations FP32. 4.8× speedup vs prior FP32 CPU path.

Release Roadmap

ATLAS evolves continuously. Major versions represent theoretical milestones, not just feature additions.

v4.0.3
λ Exp Decay Fix + Competition ReLU ✓ RELEASED
Sprint 0 bug fixes (Issue #11 closed): CanonicalPheromoneUpdate λ clamped via sigmoid to prevent negative decay; InvasionFitnessScorer τ=0.2 ReLU threshold for competition term. 532/532 tests pass.
v4.0.2
BF16 GPU Inference Path ✓ RELEASED
W16A32 BF16 path for OLMo-3-7B: weights in BF16 (14GB VRAM), activations in FP32. sgemv_bf16_kernel. 4.1 → 19.9 tok/s (4.8× speedup). 224/224 BF16 matrices confirmed.
v4.0.1
OLMo-3-7B · SWA + YaRN ✓ RELEASED
Full OLMo-3-7B-Think inference with Sliding Window Attention and YaRN context scaling. 528 tests pass. GPU sanity verified on A100.
v4.0.0
Champagnat N-Morphic Framework ✓ RELEASED
5-part P1–P5 implementation: InvasionFitnessScorer, CanonicalPheromoneUpdate, BarBovier2017Constraints, CognitiveBranching, HJConcentrationPrior.
v3.0.0
GPU Kernels + OpenAI API ✓ RELEASED
Pure Rust CUDA kernels (rmsnorm, rope, silu_mul), GpuVec activation buffers, atlas-api OpenAI-compatible server with SSE streaming.
v4.1.x
Sprint 1 — v5 RFC ▶ STARTING NOW
Four proposals from Issue #10 (v5 RFC) ready to begin: P1 PolymorphicTrainer (LoRA adapter morphic switching) · P2 SingularityDetector (eigenvalue singularity gate) · P3 PunctuatedCurriculum (pheromone-triggered epoch transitions) · P9 XSdcIsaSpec (ISA spec for RISC-V SDP core — patent-critical, deadline Dec 6 2026).
v4.1.0
INT8 Quantization (Sprint 0.5) NEXT
INT8 weight quantization for Linear layers. Target: ~30+ tok/s on OLMo-3-7B (current HBM2e utilization 6% — theoretical ceiling 69 tok/s). GpuMatrix quantized dispatch path.
v5.0.0
ATLAS SDP Hardware Bridge PLANNED
Interface layer for the Stigmergic Dynamical Processor (SDP) FPGA prototype. k=4 parallel phenotypic streams on AMD Versal VCK190. Updated die estimate: ~1.40 mm² at TSMC 7nm (Mamba-3 d_state N=32 saves 0.3 mm² vs prior N=64). Provisional Patent #63/932,720 — conversion deadline Dec 6 2026.

Start building with ATLAS

Pure Rust. No Python. No ONNX. No vendor lock-in. Clone the repo, run the tests, and explore the memory palace.

Star on GitHub View Releases
git clone https://github.com/web3guru888/ATLAS && cd ATLAS && cargo test