Costs of Running AI: Tokens, GPUs and Inference

AI is not free to run, and the costs work differently from ordinary software. Understanding the main cost drivers helps you budget and avoid surprises.

This article explains what tokens, GPUs and inference are, and how each one affects your bill.

The Main Cost Drivers

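Tokens: cloud-hosted models charge per token (a chunk of text, roughly a word or part of a word) for both the text you send and the text you get back, so longer prompts and longer answers cost more.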
GPUs: the specialised chips that run models are expensive to rent or buy.
Inference: each time the model produces an answer it uses compute, so every request has a cost, and that cost adds up with volume.
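
To make the token charge concrete, here is a minimal Python sketch of how a per-request cost might be estimated. It assumes input and output tokens are priced separately per 1,000 tokens; the function name and the prices are hypothetical placeholders, not any provider's real rates, so substitute the figures from your own price list.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Hypothetical prices in your billing currency (assumed for illustration only).
INPUT_PRICE_PER_1K = 0.50
OUTPUT_PRICE_PER_1K = 1.50

# Same answer length, but the second prompt carries far more text.
short_prompt = token_cost(200, 300, INPUT_PRICE_PER_1K, OUTPUT_PRICE_PER_1K)
long_prompt = token_cost(4000, 300, INPUT_PRICE_PER_1K, OUTPUT_PRICE_PER_1K)

print(f"Short prompt: {short_prompt:.2f}")
print(f"Long prompt:  {long_prompt:.2f}")
```

With these assumed prices, the longer prompt costs several times more for the same answer, which is why prompt length is worth watching.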

Keeping Costs Down

Sending only the necessary text, caching repeated answers, and using a smaller model where it is good enough all reduce spend without hurting quality much.
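
As an illustration of caching repeated answers, the sketch below wraps a model call in a simple in-memory cache. The class `CachedModel` and the stand-in function `fake_model` are hypothetical names invented for this example; a real deployment would call your provider's API in place of `fake_model` and would probably use a shared store with an expiry time.

```python
import hashlib


class CachedModel:
    """Wraps a model call so identical questions are only paid for once."""

    def __init__(self, call_model):
        self._call_model = call_model  # function that takes a prompt, returns text
        self._cache = {}
        self.paid_calls = 0            # how many times the model was actually called

    def ask(self, prompt: str) -> str:
        # Normalise case and surrounding whitespace so trivial differences still match.
        normalised = prompt.strip().lower()
        key = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._call_model(prompt)
            self.paid_calls += 1
        return self._cache[key]


def fake_model(prompt: str) -> str:
    """Stand-in for a real, billable model call (assumption for this sketch)."""
    return f"Answer to: {prompt}"


model = CachedModel(fake_model)
first = model.ask("What are your opening hours?")
second = model.ask("what are your opening hours?  ")  # served from the cache

print(model.paid_calls)
```

Matching on normalised text is a deliberately simple choice. It only catches near-identical questions, so it suits FAQs and other repetitive requests better than free-form conversation.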

Budgeting Sensibly

Estimate volume per day or month.
Pilot and measure real cost per task.
Set usage limits and alerts to avoid bill shock.
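
The three budgeting steps above can be sketched in a few lines of Python. The function names, the 30-day month, the 80% alert threshold and all figures are assumptions chosen for illustration; use the cost per task you measured in your own pilot.

```python
def monthly_cost(requests_per_day: int, cost_per_request: float, days: int = 30) -> float:
    """Project monthly spend from daily volume and measured cost per request."""
    return requests_per_day * cost_per_request * days


def budget_status(spend_so_far: float, monthly_limit: float, alert_at: float = 0.8) -> str:
    """Classify current spend against a monthly limit and an alert threshold."""
    if spend_so_far >= monthly_limit:
        return "limit reached"
    if spend_so_far >= alert_at * monthly_limit:
        return "alert"
    return "ok"


# Hypothetical pilot figures: 500 requests a day at 0.02 per request.
projected = monthly_cost(500, 0.02)
print(f"Projected monthly cost: {projected:.2f}")

# Check mid-month spend against a hypothetical limit of 500.
print(budget_status(spend_so_far=100, monthly_limit=500))
print(budget_status(spend_so_far=400, monthly_limit=500))
print(budget_status(spend_so_far=500, monthly_limit=500))
```

Most cloud providers offer built-in spending limits and alerts, so in practice a check like `budget_status` is often a setting you configure, not code you write.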

Driver	Charged on	How to control
Tokens	Text volume	Shorter prompts
Inference	Each request	Cache, smaller model
GPU time	Hours used	Right-size hardware
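
For the GPU row, cost is hours used multiplied by the hourly rate, so right-sizing usually comes down to paying for fewer hours or fewer chips. The short sketch below compares two usage patterns; the function name, the hourly rate and the schedules are hypothetical.

```python
def gpu_rental_cost(hours_used: float, hourly_rate: float, gpu_count: int = 1) -> float:
    """Cost of renting GPUs: hours used x hourly rate x number of GPUs."""
    return hours_used * hourly_rate * gpu_count


HOURLY_RATE = 2.00  # assumed rate per GPU hour, for illustration only

always_on = gpu_rental_cost(24 * 30, HOURLY_RATE)     # running all month
office_hours = gpu_rental_cost(8 * 22, HOURLY_RATE)   # 8 hours on 22 working days

print(f"Always on:    {always_on:.2f}")
print(f"Office hours: {office_hours:.2f}")
```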

If you need a hand with any of this, your Progressive Robot delivery team is ready to help. Raise a ticket from the Support area of your client portal or speak to your account manager and we will guide you through the next steps.
