optimization of the whole software stack with architecture/engine/kernel co-design
The startup claims a throughput of 3,000 tokens/s per request, a significant speed advantage in LLM inference compared to competitors like NVIDIA and Google Cloud.