Sub-millisecond inference at the edge.
23 benchmarked ONNX models running on-device. Biosignal, audio, vision, text, industrial. No cloud call, no data leak — real numbers on NVIDIA GB10.
Run it in your browser.
The actual 1.45 MB sentiment model, classifying your text. ONNX Runtime loads it client-side — no server, no API, no data leaves your machine. Latency measured on your hardware.
The hero model, measured.
Paper-replication target: NanoCNN+Linear INT8 student distilled from BERT. Size claim verified; latency beats the published spec.
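The page does not say which distillation objective or quantization scheme produced this student. For illustration only, here is a sketch of the textbook versions of both. The first is a Hinton-style loss that mixes temperature-softened KL divergence against the teacher's logits with ordinary cross-entropy on the hard label. The second is symmetric per-tensor INT8 quantization. The function names, `temperature=2.0`, and `alpha=0.5` are hypothetical choices, not the settings used for this model.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (max-subtracted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Hypothetical Hinton-style KD loss:
    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    hard = -math.log(softmax(student_logits)[label])  # plain CE at T = 1
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


def quantize_int8(weights):
    """Symmetric per-tensor INT8: scale = max|w| / 127, q = clamp(round(w / scale))."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    teacher = [2.5, -1.0]  # hypothetical SST-2 logits for (positive, negative)
    student = [1.8, -0.6]
    print(distillation_loss(student, teacher, label=0))
    q, scale = quantize_int8([0.5, -1.27, 0.003, 1.0])
    print(q, scale)
```

Assuming an FP32 teacher, INT8 storage by itself shrinks weights by at most 4× (4 bytes to 1 byte per weight). The rest of the 302× compression therefore has to come from the much smaller NanoCNN+Linear architecture.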
| metric | value |
|---|---|
| size | 1.45 MB |
| p50 / p95 / p99 | 0.114 / 0.121 / 0.141 ms |
| σ (jitter) | 0.023 ms |
| throughput (single core) | 8,600 req/s |
| accuracy · SST-2 dev | 83.14% (725 / 872) |
| teacher → student | 438 MB / 49.0 ms → 1.45 MB / 0.114 ms |
| compression / speedup | 302× / 430× |
| hardware | NVIDIA GB10 (Grace-Blackwell ARM), 5,000 timed iterations after 200 warm-up |
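The hardware row describes the measurement protocol: 200 untimed warm-up calls followed by 5,000 timed ones. The sketch below shows one way to derive p50/p95/p99, σ, and single-core throughput from such a run. It assumes sequential calls on one thread. `benchmark` and `ratios` are hypothetical helpers, not the harness that produced these numbers. `ratios` recomputes the compression and speedup rows from the teacher and student figures in the table.

```python
import statistics
import time


def benchmark(fn, iters=5000, warmup=200):
    """Time fn() sequentially on one thread; latencies are reported in ms."""
    for _ in range(warmup):  # warm caches and lazy initialization; not recorded
        fn()
    samples_ms = []
    start = time.perf_counter()
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - t0) * 1e3)
    total_s = time.perf_counter() - start
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "jitter_ms": statistics.stdev(samples_ms),  # σ of per-call latency
        "throughput_rps": iters / total_s,  # sequential calls on a single thread
    }


def ratios(teacher_mb, teacher_ms, student_mb, student_ms):
    """Compression (size) and speedup (latency) ratios, teacher over student."""
    return teacher_mb / student_mb, teacher_ms / student_ms


if __name__ == "__main__":
    compression, speedup = ratios(438, 49.0, 1.45, 0.114)
    print(f"compression {compression:.0f}x, speedup {speedup:.0f}x")  # compression 302x, speedup 430x
    print(f"SST-2 dev accuracy {725 / 872:.2%}")  # SST-2 dev accuracy 83.14%
    print(benchmark(lambda: sum(range(1_000))))  # stand-in workload, not the ONNX model
```

With sequential calls, throughput is roughly 1 / mean latency. If the table's 8,600 req/s was measured that way, it works out to about 0.116 ms per call. That is slightly above the 0.114 ms p50, which is consistent with a small right tail in the latency distribution.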
23 models. One gateway.
Biosignal, audio, vision, text, industrial. All running on a commodity ARM CPU. No GPU required.
| Model | Domain | Size | p50 | p95 | Throughput |
|---|---|---|---|---|---|
Built for sovereignty.
Every inference on-device. Nothing leaves the machine. Standard ONNX — drop into your stack, point at your hardware, measure on your terms.
On-device
ARM / x86 / NVIDIA — ONNX runs anywhere. No cloud dependency. Airplane-mode ready.
Regulated-ready
Medical, financial, legal, industrial. Data never leaves the perimeter. Audit-trail compatible.
Measured
Every number on this page came from a live run. No marketing math. JSON on request.
Want the gateway on your hardware?
ConstantSense deploys on your servers, your edge devices, your racks. We bring the models — you keep your data.
admin@constantqj.com