constantsense
Edge AI · measured, not claimed

Sub-millisecond inference at the edge.

23 ONNX models, benchmarked on-device: biosignal, audio, vision, text, industrial. No cloud calls, no data leaks. Real numbers, measured on NVIDIA GB10.

0.114 ms · p50 latency, hero model
1.45 MB · model size
83.14% · SST-2 accuracy
23 · benchmarked models

Run it in your browser.

The actual 1.45 MB sentiment model, classifying your text. ONNX Runtime loads it client-side: no server, no API call, and no data leaving your machine. Latency is measured on your own hardware.

Click Classify to load the 1.45 MB model and run it locally. The first run takes about 2 to 4 s while the model downloads.

The hero model, measured.

Paper-replication target: a NanoCNN+Linear INT8 student distilled from BERT. The published size claim is verified, and measured latency beats the published spec.

p50 latency (GB10 CPU): 0.114 ms
size: 1.45 MB
p50 / p95 / p99: 0.114 / 0.121 / 0.141 ms
σ (jitter): 0.023 ms
throughput (single core): 8,600 req/s
accuracy (SST-2 dev): 83.14% (725 / 872)
teacher → student: 438 MB / 49.0 ms → 1.45 MB / 0.114 ms
compression / speedup: 302× / 430×
hardware: NVIDIA GB10 (Grace-Blackwell ARM), 5,000 iterations, 200 warmup
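The derived rows follow directly from the raw figures. A quick check in Python, with illustrative variable names; reading single-core throughput as 1 / mean latency is an assumption, since the page does not say how throughput was computed:

```python
# Raw figures from the hero-model table.
teacher_size_mb, teacher_p50_ms = 438.0, 49.0
student_size_mb, student_p50_ms = 1.45, 0.114
correct, total = 725, 872
throughput_rps = 8_600

compression = teacher_size_mb / student_size_mb  # size ratio, teacher / student
speedup = teacher_p50_ms / student_p50_ms        # p50 latency ratio, teacher / student
accuracy = correct / total                       # SST-2 dev accuracy
# Assumption: one core serving requests back to back, so throughput = 1 / mean latency.
implied_mean_ms = 1000.0 / throughput_rps

print(f"compression {compression:.0f}x, speedup {speedup:.0f}x")
print(f"accuracy {accuracy:.2%}")
print(f"implied mean latency {implied_mean_ms:.4f} ms")
```

Under that assumption, 8,600 req/s implies a mean of about 0.116 ms per request, slightly above the 0.114 ms p50. That is consistent with a right-skewed latency distribution (p95 0.121 ms, p99 0.141 ms).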

23 models. One gateway.

Biosignal, audio, vision, text, industrial, all running on a commodity ARM CPU. No GPU required.

Model · Domain · Size · p50 · p95 · Throughput

Built for sovereignty.

Every inference runs on-device, and nothing leaves the machine. The models are standard ONNX: drop them into your stack, point them at your hardware, and measure on your terms.

On-device

ARM, x86, or NVIDIA: ONNX Runtime supports all three. No cloud dependency. Airplane-mode ready.

Regulated-ready

Medical, financial, legal, industrial. Data never leaves the perimeter. Audit-trail compatible.

Measured

Every number on this page came from a live run. No marketing math. Raw benchmark JSON is available on request.
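To reproduce numbers like these on your own hardware, a harness along the following lines would work. This is a minimal sketch, not ConstantSense's actual tooling. It assumes serial timing on one thread with `time.perf_counter`, nearest-rank percentiles, σ taken as the population standard deviation, and throughput estimated as 1 / mean latency. The `benchmark` and `percentile` helpers and the JSON keys are hypothetical. The defaults mirror the 5,000 iterations and 200 warmup quoted for the hero model.

```python
import json
import math
import statistics
import time
from typing import Callable


def percentile(sorted_ms: list[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) of an already-sorted sample."""
    rank = max(1, math.ceil(q / 100.0 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def benchmark(infer: Callable[[], object], iters: int = 5000, warmup: int = 200) -> dict:
    """Time `infer` serially on the calling thread and summarise latency in ms."""
    for _ in range(warmup):  # warm caches and lazy initialisation; not timed
        infer()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    mean_ms = statistics.fmean(samples)
    return {
        "iters": iters,
        "warmup": warmup,
        "p50_ms": percentile(samples, 50),
        "p95_ms": percentile(samples, 95),
        "p99_ms": percentile(samples, 99),
        "jitter_sigma_ms": statistics.pstdev(samples),
        # Hypothetical estimate: back-to-back requests on one core.
        "throughput_rps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; swap in a real model call to benchmark a model.
    report = benchmark(lambda: sum(range(1000)), iters=500, warmup=50)
    print(json.dumps(report, indent=2))
```

For an ONNX model, one option is to build an `onnxruntime.InferenceSession` with `SessionOptions.intra_op_num_threads = 1` to approximate a single-core figure, then pass `lambda: session.run(None, feeds)` as `infer`. The model path and `feeds` would be your own.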

Want the gateway on your hardware?

ConstantSense deploys on your servers, your edge devices, your racks. We bring the models — you keep your data.

admin@constantqj.com