Edge AI · measured, not claimed

Sub-millisecond inference at the edge.

23 benchmarked ONNX models running on-device. Biosignal, audio, vision, text, industrial. No cloud call, no data leak — real numbers on NVIDIA GB10.

Try it live · See the numbers

0.114 ms

p50 · hero model

1.45 MB

model size

83.14%

SST-2 accuracy

23

benchmarked models

Run it in your browser.

The actual 1.45 MB sentiment model, classifying your text. ONNX Runtime loads it client-side — no server, no API, no data leaves your machine. Latency measured on your hardware.

Click Classify to load the 1.45 MB model and run it locally (the first run takes ≈ 2-4 s while the model downloads).

The hero model, measured.

Paper-replication target: NanoCNN+Linear INT8 student distilled from BERT. The size claim is verified, and measured latency beats the published spec.

p50 latency · GB10 CPU

0.114 ms

size	1.45 MB
p50 / p95 / p99	0.114 / 0.121 / 0.141 ms
σ (jitter)	0.023 ms
throughput (single core)	8,600 req/s
accuracy · SST-2 dev	83.14% (725 / 872)
teacher → student	438 MB / 49.0 ms → 1.45 MB / 0.114 ms
compression / speedup	302× / 430×
hardware	NVIDIA GB10 (Grace-Blackwell ARM), 5,000 iterations, 200 warmup
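
The derived rows in the table above (compression, speedup, accuracy) follow from the raw figures by simple division. A short check, using only numbers copied from the table; the constant and function names are illustrative, not part of any published tooling:

```python
# Raw figures copied from the hero-model table.
TEACHER_SIZE_MB = 438.0
TEACHER_P50_MS = 49.0
STUDENT_SIZE_MB = 1.45
STUDENT_P50_MS = 0.114
CORRECT, TOTAL = 725, 872  # SST-2 dev predictions


def derived_figures() -> dict:
    """Recompute the ratios and accuracy reported in the table."""
    return {
        "compression_x": TEACHER_SIZE_MB / STUDENT_SIZE_MB,
        "speedup_x": TEACHER_P50_MS / STUDENT_P50_MS,
        "accuracy_pct": 100.0 * CORRECT / TOTAL,
    }


if __name__ == "__main__":
    d = derived_figures()
    print(f"compression: {d['compression_x']:.0f}x")  # 302x
    print(f"speedup:     {d['speedup_x']:.0f}x")      # 430x
    print(f"accuracy:    {d['accuracy_pct']:.2f}%")   # 83.14%
```

The speedup here compares p50 latencies only, which is what the teacher → student row reports.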

23 models. One gateway.

Biosignal, audio, vision, text, industrial. All running on a commodity ARM CPU. No GPU required.

Model	Domain	Size	p50	p95	Throughput

Built for sovereignty.

Every inference on-device. Nothing leaves the machine. Standard ONNX — drop into your stack, point at your hardware, measure on your terms.

On-device

ARM / x86 / NVIDIA — ONNX runs anywhere. No cloud dependency. Airplane-mode ready.

Regulated-ready

Medical, financial, legal, industrial. Data never leaves the perimeter. Audit-trail compatible.

Measured

Every number on this page came from a live run. No marketing math. JSON on request.
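
For readers who want to take comparable measurements on their own hardware, here is a minimal sketch of a latency harness. It uses the 200 warmup and 5,000 timed iterations quoted in the hero-model table, but it is not the harness behind the published numbers. The `benchmark` function and the stand-in workload are hypothetical, and deriving throughput from mean latency is an assumption, since the page does not say how throughput was computed.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], iters: int = 5000, warmup: int = 200) -> Dict[str, float]:
    """Time fn() repeatedly and summarize per-call latency in milliseconds."""
    # Warmup calls run but are not recorded, so caches and lazy init settle first.
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter_ns()
        fn()
        samples_ms.append((time.perf_counter_ns() - start) / 1e6)

    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(samples_ms)
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "sigma_ms": statistics.stdev(samples_ms),
        "mean_ms": mean_ms,
        # Assumption: single-threaded throughput implied by mean latency.
        "req_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def stand_in_inference() -> int:
    # Placeholder workload (an assumption); replace with a real model call.
    return sum(i * i for i in range(200))


if __name__ == "__main__":
    for name, value in benchmark(stand_in_inference).items():
        print(f"{name}: {value:.4f}")
```

To measure a real model, pass a zero-argument callable that runs one inference on a fixed input, so the timed region contains only the inference call.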

Want the gateway on your hardware?

ConstantSense deploys on your servers, your edge devices, your racks. We bring the models — you keep your data.

admin@constantqj.com
