Costs of Running AI: Tokens, GPUs and Inference
AI is not free to run, and the costs work differently from ordinary software. Understanding the main cost drivers helps you budget and avoid surprises.
This article demystifies tokens, GPUs and inference.
The Main Cost Drivers
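- Tokens: cloud models charge per token, a small chunk of text (roughly a word or part of a word). Both the text you send in and the text the model sends back are counted, so longer prompts and longer answers cost more.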
- GPUs: the specialised chips that run models are expensive to rent or buy.
- Inference: running the model to produce an answer. Every answer has a cost, and that cost adds up with volume.
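To make the token driver concrete, here is a minimal sketch of a per-request cost estimate. It assumes input and output tokens are priced separately per million tokens, which is a common arrangement but not universal, so check your provider's price list. The `TokenPrice` class, the `request_cost` function and the prices themselves are illustrative placeholders, not real quotes.

```python
from dataclasses import dataclass


@dataclass
class TokenPrice:
    """Hypothetical per-million-token prices; replace with your provider's rates."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Return the cost of one request, in the same currency as the prices."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


# Placeholder prices, chosen only to make the arithmetic easy to follow.
price = TokenPrice(input_per_million=2.0, output_per_million=8.0)

# Same answer length, but the second prompt is ten times longer.
short = request_cost(input_tokens=500, output_tokens=200, price=price)
long = request_cost(input_tokens=5_000, output_tokens=200, price=price)
print(f"short prompt: {short:.4f}, long prompt: {long:.4f}")
```

With these placeholder prices the longer prompt costs several times more per request even though the answer is the same length, which is why trimming prompts is usually the first saving to look for.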
Keeping Costs Down
Three habits reduce spend with little loss of quality: send only the text the model needs, cache answers to repeated questions, and use a smaller model where it is good enough.
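Caching repeated answers can be sketched as follows. This is a simplified in-memory example, assuming that two prompts differing only in letter case or spacing should share an answer; `AnswerCache` and `fake_model` are hypothetical names, and `fake_model` stands in for whatever paid model call you actually make.

```python
import hashlib
from typing import Callable, Dict


class AnswerCache:
    """In-memory cache keyed by a hash of the normalised prompt."""

    def __init__(self, ask_model: Callable[[str], str]) -> None:
        self._ask_model = ask_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Lower-case and collapse whitespace so trivial variations share a key.
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1  # served from cache, no model cost
            return self._store[key]
        self.misses += 1  # only a miss triggers a paid model call
        answer = self._ask_model(prompt)
        self._store[key] = answer
        return answer


def fake_model(prompt: str) -> str:
    """Stand-in for a paid model call (hypothetical)."""
    return f"answer to: {prompt}"


cache = AnswerCache(fake_model)
cache.ask("What are your opening hours?")
cache.ask("what are your  opening hours?")  # same question, different case and spacing
print(f"hits={cache.hits}, misses={cache.misses}")
```

A production cache would normally also expire entries and live in shared storage rather than process memory. The point here is only that the second, equivalent question never reaches the model.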
Budgeting Sensibly
- Estimate request volume per day or month.
- Pilot and measure real cost per task.
- Set usage limits and alerts to avoid bill shock.
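The three budgeting steps above can be turned into simple arithmetic. The sketch below assumes a 30-day month and an alert at 80% of the monthly limit. Both are arbitrary choices, and the pilot figures are made up for illustration. The function names are hypothetical.

```python
def projected_monthly_cost(requests_per_day: float, cost_per_request: float, days: int = 30) -> float:
    """Project monthly spend from daily volume and measured cost per request."""
    return requests_per_day * cost_per_request * days


def budget_status(spend_to_date: float, monthly_limit: float, alert_fraction: float = 0.8) -> str:
    """Classify current spend against a limit and an early-warning threshold."""
    if spend_to_date >= monthly_limit:
        return "limit_reached"
    if spend_to_date >= alert_fraction * monthly_limit:
        return "alert"
    return "ok"


# Step 2: measure real cost per task from a pilot (illustrative figures).
pilot_total_cost = 12.50
pilot_tasks = 1_000
cost_per_task = pilot_total_cost / pilot_tasks

# Step 1: combine with the estimated daily volume.
forecast = projected_monthly_cost(requests_per_day=2_000, cost_per_request=cost_per_task)
print(f"forecast monthly cost: {forecast:.2f}")

# Step 3: check spend so far against the limit and the alert threshold.
status = budget_status(spend_to_date=650.0, monthly_limit=800.0)
print(f"budget status: {status}")
```

Most providers and cloud platforms offer their own spending limits and alerts. A calculation like this is mainly useful for choosing sensible values to enter there.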
In summary, the three cost drivers compare as follows:

| Driver | Charged on | How to control |
|---|---|---|
| Tokens | Text volume | Shorter prompts |
| Inference | Each request | Cache, smaller model |
| GPU time | Hours used | Right-size hardware |
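The "right-size hardware" entry is easier to see with numbers. GPU time billed by the hour costs the same whether or not requests arrive, so an oversized or mostly idle GPU raises the effective cost of each request. The sketch below assumes a flat hourly rate and a known request throughput at full load. The function name and all figures are hypothetical.

```python
def gpu_cost_per_request(
    hourly_rate: float,
    requests_per_hour_at_full_load: float,
    utilisation: float,
) -> float:
    """Effective cost per request for a GPU billed by the hour.

    utilisation is the fraction of full-load throughput actually used (0 to 1].
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be greater than 0 and at most 1")
    served_per_hour = requests_per_hour_at_full_load * utilisation
    return hourly_rate / served_per_hour


# Same GPU and hourly rate, but very different levels of use (illustrative figures).
busy = gpu_cost_per_request(hourly_rate=3.0, requests_per_hour_at_full_load=6_000, utilisation=0.9)
idle = gpu_cost_per_request(hourly_rate=3.0, requests_per_hour_at_full_load=6_000, utilisation=0.1)
print(f"busy GPU: {busy:.5f} per request, mostly idle GPU: {idle:.5f} per request")
```

In this example the mostly idle GPU costs nine times as much per request, which is the waste that right-sizing aims to remove.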
If you need a hand with any of this, your Progressive Robot delivery team is ready to help. Raise a ticket from the Support area of your client portal or speak to your account manager and we will guide you through the next steps.