Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
The startup claims a throughput of 3,000 tokens/s per request, a significant speed advantage in LLM inference over competitors such as NVIDIA and Google Cloud.