How to Calculate the Real Cost of Cloud Infrastructure: VMs, Traffic, Disks, Backups, IPs, and Support

Martin Klein

A team budgets for two VMs, selects a region in a calculator, and gets a reasonable monthly estimate. After the first billing period, the invoice includes disks, snapshots, outbound traffic, public IPs, a load balancer, logs, and support. The problem is not the calculator: it calculates based on the assumptions entered. The problem is that the infrastructure was estimated as a set of virtual machines rather than a system of connected resources.

The real cost of cloud infrastructure is the monthly TCO (Total Cost of Ownership): compute, storage, backups, network, monitoring, support, and a management buffer. For a business, the full cost model is more useful than the price of a single VM: first list all cost items, then test them against small and mid-sized scenarios, and after launch monitor deviations from the plan.

The key calculation path:

  • identify all cost items around the VM;
  • combine them into a monthly TCO formula;
  • check hidden costs and overspending scenarios.

If you calculate only VMs, the model will be incomplete from the start. That is why the first step is to separate the price of a virtual machine from the cost of the entire infrastructure.

What Makes Up the Cost of Cloud Infrastructure

A VM creates a paid environment around itself: disks, network, public addresses, backups, logs, access rules, and management services. In this context, a VM is more of a node in an infrastructure diagram than a standalone server.

To avoid missing important items, it is useful to group TCO components by how they appear in the infrastructure. The first group includes resources directly attached to the VM and its data: compute, disks, backups, outbound traffic, and public IPs:

Component | What it includes | What the price depends on | Common mistake
VM / compute resources | VM type, CPU, RAM, region, OS, operating hours | Configuration, 24/7 or scheduled operation, region | Calculating only the monthly VM price without considering operating mode and region
Disks / storage | System disks and data disks | Capacity, disk type, performance tier if billed separately | Including only the system disk
Snapshots and backups | Backups of VMs and disks | Backup frequency, retention period, volume of changes | Ignoring the cumulative growth of backups
Outbound traffic | Data transfer out of the cloud | Volume, direction, destination zone or region | Treating all traffic as free
Public IP | Dedicated external IP addresses | Number of addresses, attachment or idle state | Forgetting IPs for test and backup environments

These items usually appear early in the calculation because they are close to the VM itself. A team can see the number of servers, disk capacity, backup policy, public access, and expected traffic in the architecture diagram. However, this does not mean the estimate is complete.

The second group includes operational and shared services. They may not look like part of a single virtual machine, but they still affect the monthly invoice once the infrastructure is running. These costs often appear later: when access becomes more complex, logging is expanded, support requirements become formal, or several projects start using the same shared services.

This is why infrastructure cost should be checked not only as a technical configuration, but also as an operating model. The question is not only “what resources are deployed?”, but also “how will they be accessed, monitored, supported, and allocated across teams or projects?”

Component | What it includes | What the price depends on | Common mistake
Load balancer / NAT / VPN / Bastion | Network access and routing services | Number of services, operating hours, data processing volume | Treating the network only as “connectivity”
Monitoring and logs | Metrics, logs, alerts, event storage | Log volume, retention period, level of detail | Treating observability as a free feature
Support | Support plan, SLA, support channels, full-time employees | SLA level, system criticality, contract terms | Not including support in the B2B budget, including full-time employees engaged in support (SysOps, DevOps)
Buffer | Reserve for load growth and unexpected volumes | Traffic volatility, seasonality, product changes | Planning the budget without a buffer
Shared costs | Security, network, monitoring, shared services | Number of projects, cost allocation model | Leaving shared services “outside the calculation”

The takeaway: a significant share of the invoice may come not from VMs, but from variable, cumulative, and operational items — outbound traffic, backups, logs, public IPs, network access services, support, and shared services. Once both groups are mapped, they can be turned into a monthly calculation model.

TCO Formula: How to Calculate Monthly Cost Step by Step

The TCO formula is the sum of cost items, but its accuracy depends on the initial assumptions: operating mode, region, data volume, traffic, backup retention period, and support level.

Basic monthly model:

TCO/month = compute resources + disks/storage + backups + traffic + network services + public IPs + monitoring/logs + support + buffer.
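
The formula can be expressed as a small function. The sketch below is a minimal illustration, assuming that the buffer is applied as a percentage of the subtotal; the class name, field names, and all figures are hypothetical placeholders rather than provider prices.

```python
from dataclasses import dataclass, fields


@dataclass
class MonthlyCosts:
    """Monthly cost items in one currency. Every value is an estimate you supply."""
    compute: float = 0.0
    storage: float = 0.0
    backups: float = 0.0
    traffic: float = 0.0
    network_services: float = 0.0
    public_ips: float = 0.0
    monitoring_logs: float = 0.0
    support: float = 0.0

    def subtotal(self) -> float:
        # Sum of all items before the buffer is applied.
        return sum(getattr(self, f.name) for f in fields(self))


def monthly_tco(costs: MonthlyCosts, buffer_rate: float) -> float:
    """Return the subtotal plus a buffer expressed as a share of the subtotal."""
    return costs.subtotal() * (1 + buffer_rate)


# Placeholder figures for illustration only, not provider prices.
estimate = MonthlyCosts(compute=400, storage=60, backups=40, traffic=50,
                        network_services=30, public_ips=10,
                        monitoring_logs=20, support=90)
print(f"Subtotal: {estimate.subtotal():.2f}")
print(f"TCO with 15% buffer: {monthly_tco(estimate, 0.15):.2f}")
```

Keeping each item as a named field makes a missing line visible: an item left at zero is a deliberate assumption, not an omission.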

Before inserting prices, define the calculation period, usually one month, and describe how the infrastructure will operate: 24/7, during business hours, during seasonal peaks, or only for temporary test environments. You also need to choose the region and payment model, estimate expected volumes of data, traffic, logs, and backups, and define SLA and support requirements.

After that, the calculation can be built step by step:

  1. Define the operating scenario: which environments run continuously, which are switched on by schedule, and where peaks are possible.
  2. Calculate VMs by number, type, operating hours, region, and operating system.
  3. Add system disks and data disks: capacity, type, performance requirements.
  4. Calculate backups based on backup frequency, retention period, and expected volume of changes.
  5. Estimate outbound, cross-zone, and cross-region traffic.
  6. Add load balancers, NAT, VPN, Bastion, and public IPs.
  7. Include monitoring, logs, collection detail, and retention period.
  8. Add support, SLA, and shared costs if they are allocated across projects.
  9. Add a 10–20% buffer as a management reserve for uncertainty, not as a universal rule.
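
Steps 1 and 2 depend heavily on operating hours. The sketch below compares a 24/7 environment with a scheduled one, assuming per-hour billing, an average month of 730 hours, and an invented hourly rate; none of these values come from a specific provider.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours per year / 12


def scheduled_hours(hours_per_day: float, days_per_week: float) -> float:
    """Approximate monthly running hours for an environment on a schedule."""
    weeks_per_month = 52 / 12
    return hours_per_day * days_per_week * weeks_per_month


def vm_monthly_cost(count: int, hourly_rate: float,
                    hours: float = HOURS_PER_MONTH) -> float:
    """Cost of identical VMs billed per running hour."""
    return count * hourly_rate * hours


HYPOTHETICAL_RATE = 0.10  # price per VM-hour, invented for the example

always_on = vm_monthly_cost(3, HYPOTHETICAL_RATE)
business_hours = vm_monthly_cost(3, HYPOTHETICAL_RATE, scheduled_hours(10, 5))
print(f"24/7: {always_on:.2f}, scheduled 10h x 5d: {business_hours:.2f}")
```

The comparison covers compute only. Disks attached to a stopped VM usually continue to be billed, so the storage item from step 3 should not be reduced by the same schedule.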

Data for the model comes from the architecture diagram, metrics from the current system during migration, or a product forecast for a new service. Cloud calculators from AWS, Azure, Google Cloud, and other providers help create a preliminary estimate, but the actual invoice is based on real consumption.

For example, the model assumes 200 GB of outbound traffic per month, but after launch users download 800 GB, either because demand was underestimated or because caching was not configured. The number of VMs has not changed, but the final cost increases because of the network item.
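
A deviation check of this kind can be automated. The sketch below is one possible approach, assuming that planned and actual values are available per item in the same units; the function name, item names, and the 20% threshold are hypothetical choices.

```python
def deviations(planned: dict[str, float], actual: dict[str, float],
               threshold: float = 0.20) -> dict[str, float]:
    """Return items whose actual value exceeds the plan by more than threshold.

    The result maps the item name to its relative overrun (3.0 means +300%).
    Items that are missing from the plan are reported with float('inf').
    """
    flagged = {}
    for item, actual_value in actual.items():
        planned_value = planned.get(item, 0.0)
        if planned_value == 0:
            if actual_value > 0:
                flagged[item] = float("inf")
            continue
        overrun = (actual_value - planned_value) / planned_value
        if overrun > threshold:
            flagged[item] = overrun
    return flagged


planned = {"vm_count": 2, "outbound_traffic_gb": 200}
actual = {"vm_count": 2, "outbound_traffic_gb": 800}
print(deviations(planned, actual))
```

With the figures from the example, only the traffic item is flagged, which matches the point that the overrun sits in the network line while the VM count stays the same.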

That is why the formula provides the structure of the calculation, but does not prove that the budget is realistic. The next step is to insert sample numbers and separately check a small and a mid-sized project.

Calculation Examples: Small Project and Mid-Sized Project

The figures below are hypothetical and are used to demonstrate the method; they are not the current pricing of any specific provider. Before calculating, define the scenario: how many environments are required, which services run 24/7, how much data is stored, how much traffic leaves the cloud, and whether an SLA is needed.

A small scenario may be an internal corporate service or a small product: 1–2 VMs, one public IP, moderate traffic, basic backups, and minimal support. A mid-sized scenario may be an online store, fintech, or SaaS project with production and test environments, several VMs, network services, logs, regular backups, and a reserve for growth.

The table below does not show universal cloud pricing. It shows how the same TCO structure becomes a monthly estimate in two different scenarios: a small project with basic infrastructure and a mid-sized project with multiple environments, more traffic, operational services, and SLA requirements:

Cost item | Small project: assumption | Cost | Mid-sized project: assumption | Cost | Comment
VM | 2 VMs 24/7 | $160 | 5 VMs in production + 3 in testing | $900 | VM price is only the calculation base
Disks | 300 GB of system and data disks | $30 | 1.5 TB of disks | $180 | Data grows faster than the number of servers
Backups / snapshots | Backups for 300 GB | $20 | Regular backups up to 2 TB | $140 | Retention period and volume of changes matter
Outbound traffic | 300 GB | $25 | 2.5 TB | $220 | Depends on user behavior
Public IPs | 1 IP | $5 | 4 IPs | $20 | Addresses appear in different environments
Load balancer / NAT / VPN / Bastion | Not used | $0 | Load balancer, NAT, VPN/Bastion | $180 | Appears as access becomes more complex
Monitoring and logs | Basic metrics, 20 GB of logs | $10 | Advanced logs, 300 GB | $90 | Cost grows with detail and retention period
Service provider support | Minimal plan | $50 | SLA plan for critical services | $250 | In a B2B budget, it is better to calculate separately
Internal support | N/A | $0 | 2 engineers plus back office, including a corporate laptop for each and taxes | $10,000 | Common mistake: leaving out the engineers and back-office staff engaged in support
Buffer | 15% for fluctuations | $45 | 15% for growth and peaks | $300 | Needed as a managed reserve
Total per month | Final monthly estimate | $345 | Final monthly estimate | $12,280 | Before taxes, discounts, and contract terms

The takeaway: in a small project, VMs often make up the main share, while the other items look like small add-ons. In a mid-sized scenario, the picture changes: network services, backups, logs, support, and access services grow significantly. These figures should be used not as a universal price, but as a template for checking your own estimate.

The important point is not the exact dollar amount, but the cost structure. As the project grows, the invoice usually grows not only because there are more VMs, but because the operating model becomes more complex: more environments, more traffic, longer retention periods, stronger support requirements, and more shared services.
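
The change in cost structure can be checked directly from the hypothetical figures in the table. The sketch below computes each item's share of the monthly total for both scenarios; the shortened item names are only labels for this example.

```python
small = {
    "VM": 160, "Disks": 30, "Backups": 20, "Outbound traffic": 25,
    "Public IPs": 5, "Network services": 0, "Monitoring and logs": 10,
    "Provider support": 50, "Internal support": 0, "Buffer": 45,
}
mid = {
    "VM": 900, "Disks": 180, "Backups": 140, "Outbound traffic": 220,
    "Public IPs": 20, "Network services": 180, "Monitoring and logs": 90,
    "Provider support": 250, "Internal support": 10000, "Buffer": 300,
}


def shares(items: dict[str, float]) -> dict[str, float]:
    """Return each item's share of the total, largest item first."""
    total = sum(items.values())
    ranked = sorted(items.items(), key=lambda pair: pair[1], reverse=True)
    return {name: cost / total for name, cost in ranked}


for label, project in (("small", small), ("mid-sized", mid)):
    print(label, "total:", sum(project.values()))
    # Show the three largest items and their share of the monthly total.
    for name, share in list(shares(project).items())[:3]:
        print(f"  {name}: {share:.0%}")
```

In the small scenario the VM line is the largest single item, while in the mid-sized scenario internal support dominates and VMs become a minor share. The mid-sized buffer of $300 appears to be roughly 15% of the cloud items without internal support.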

Hidden Costs: What Is Most Often Missing from the Preliminary Estimate

In a preliminary estimate, the cloud looks static: there is a set of resources, assumptions, and a final monthly amount. In operation, resources appear temporarily, data accumulates, test environments are forgotten and left running, and some services continue to be billed after the main VM is deleted.

A typical example: a test VM is deleted after a release, but its disk, public IP, and snapshots remain. In the estimate, the resource looks closed, but in the actual invoice it continues to exist as a separate line item.
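
Leftovers of this kind can be found with a simple inventory check. The sketch below assumes the inventory has already been exported as a list of dictionaries; the field names (`vm_id`, `tags`, `owner`) and resource IDs are hypothetical and do not correspond to any provider's API.

```python
def find_leftovers(resources: list[dict], live_vm_ids: set[str]) -> dict[str, list[str]]:
    """Return {resource_id: [reasons]} for resources that deserve a review."""
    findings = {}
    for res in resources:
        reasons = []
        # In this sketch, disks, IPs, and snapshots reference a VM through 'vm_id'.
        if res.get("vm_id") not in live_vm_ids:
            reasons.append("VM no longer exists or nothing is attached")
        if not res.get("tags", {}).get("owner"):
            reasons.append("no owner tag")
        if reasons:
            findings[res["id"]] = reasons
    return findings


live_vms = {"vm-prod-1"}
inventory = [
    {"id": "disk-prod-1", "type": "disk", "vm_id": "vm-prod-1",
     "tags": {"owner": "shop-team"}},
    {"id": "disk-test-7", "type": "disk", "vm_id": None, "tags": {}},
    {"id": "ip-test-7", "type": "public_ip", "vm_id": "vm-test-7",
     "tags": {"owner": "shop-team"}},
    {"id": "snap-test-7", "type": "snapshot", "vm_id": "vm-test-7", "tags": {}},
]
for resource_id, reasons in find_leftovers(inventory, live_vms).items():
    print(resource_id, "->", "; ".join(reasons))
```

The check only produces a review list. Whether a flagged disk or snapshot can be deleted remains a decision for its owner.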

The table below shows where a preliminary estimate most often differs from the actual invoice: forgotten resources, accumulated data, variable consumption, and operational services added after launch:

Hidden costWhy it is forgottenHow it appears in the invoiceHow to control it
Outbound trafficThe model only calculates VM operationCost increases from downloads, API responses, integrationsTrack traffic directions and limits
Snapshots and backup retention periodBackup creation is included, but accumulation is notMonthly growth in storage volumeDefine retention and deletion policies
Undeleted disksThe VM was deleted, but the disk was left for “checking later”Separate storage line itemAssign an owner and lifecycle to the resource
Public IPsThe address is treated as a technical detailCharges for dedicated or unused IPsRegularly check IP attachments
Load balancer / NAT / VPN / BastionTreated as access infrastructure, not a separate cost itemOngoing charges for network servicesInclude in the architecture estimate
Logs and monitoringDetailed logging is enabled after launchGrowth in log and metric storageLimit retention period and level of detail
Support planPostponed until operationFixed surcharge for SLA and support channelsAgree on support level before launch
Test and development environmentsTreated as temporaryVMs and disks run 24/7 without loadUse schedules and environment tags
Cross-region trafficNot visible in a simple project diagramCharges for data exchange between regionsCheck service placement and replication
Shared services without allocationThey are not assigned to a specific projectCosts remain in the shared cloud accountAllocate by tags, projects, and cost centers

The takeaway: TCO is most often increased by variable, cumulative, and ownerless resources. Tags, resource owners, retention policies, and regular cost reviews help turn the cloud bill into a manageable financial model.

This table shows what is usually forgotten at the estimate stage. The next step is to see how these line items turn into real overspending scenarios after launch.

Scenarios Where the Cloud Becomes More Expensive Than Expected

Hidden costs rarely appear as one large mistake on the first day of operation. More often, they accumulate gradually: the project grows, users behave differently, the team adds temporary resources, logging becomes more detailed, and shared services remain outside the original estimate.

Consider a simple example: an online store that sells custom mugs. At launch, the team estimates virtual machines, disks, basic backups, and moderate traffic. The first version works as expected. Then marketing launches a campaign, engineers prepare releases, test environments appear, logging expands after an incident, and later the business asks for a backup region. The architecture still looks familiar, but the operating scenario has changed — and the monthly bill changes with it.

Scenario 1. The Viral Mug Campaign That Sent Traffic Through the Roof

At launch, the store behaves predictably: users open product pages, add mugs to the cart, pay for orders, and occasionally download previews. The estimate includes moderate outbound traffic because the team models normal customer behavior.

Then marketing launches a campaign: users can create a mug with a meme, upload their own image, download a preview, receive a PDF mockup, and share a link with friends. The campaign works. Traffic grows not only because there are more visitors, but also because each user receives more data: images, previews, API responses, files, and integration payloads.

The virtual machines may still remain within the original plan. CPU does not have to become the main problem. Overspending appears in another line item: outbound traffic. Before such campaigns, it is not enough to ask whether the servers can handle the load. The team also needs to calculate which data leaves the cloud, which files can be served through a CDN, where caching should be enabled, and where API responses or downloadable files need size limits.
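
A rough pre-campaign estimate of outbound traffic can be sketched as visits multiplied by the average payload per visit. The function below is a simplified model under stated assumptions: 1 GB is treated as 1000 MB, `cdn_offload` is a hypothetical share of bytes served from a CDN cache, and the visit counts and payload sizes are invented.

```python
def monthly_egress_gb(visits: int, mb_per_visit: float,
                      cdn_offload: float = 0.0) -> float:
    """Outbound traffic served by the cloud origin, in GB (1 GB = 1000 MB).

    cdn_offload is the share of bytes served from a CDN cache
    instead of the origin, between 0.0 and 1.0.
    """
    if not 0.0 <= cdn_offload <= 1.0:
        raise ValueError("cdn_offload must be between 0.0 and 1.0")
    return visits * mb_per_visit * (1 - cdn_offload) / 1000


# Invented figures: normal month, campaign month, campaign month with caching.
normal = monthly_egress_gb(visits=100_000, mb_per_visit=3)
campaign = monthly_egress_gb(visits=250_000, mb_per_visit=10)
campaign_cached = monthly_egress_gb(visits=250_000, mb_per_visit=10, cdn_offload=0.6)
print(f"normal: {normal:.0f} GB, campaign: {campaign:.0f} GB, "
      f"campaign with CDN: {campaign_cached:.0f} GB")
```

In this example visits grow 2.5 times, but origin traffic grows more than 8 times because the payload per visit also grows. A CDN normally has its own price, so the offloaded volume moves to a different line item rather than disappearing.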

Scenario 2. Lukas Made a Snapshot and Never Came Back

Before a Friday release, an engineer named Lukas creates a snapshot of the database and disks. This is a normal practice: if the update breaks something, the team can roll back quickly. The release goes well, everyone relaxes, Lukas moves on to the next task, and the snapshot stays.

A week later, another snapshot is created before a catalog migration. Then another one appears before an update to the order builder. Each snapshot was created “just in case” and was supposed to live for a few days, but none of them has an owner, a deletion date, or a lifecycle rule.

At the same time, regular backups continue to run on schedule. The database grows, the volume of changes increases, and retention is extended “just in case.” As a result, the team may add no new virtual machines, but the storage line in the invoice still grows. In some systems, frequent or long-lived snapshots may also affect performance during heavy write activity or backup operations. To avoid this, manual snapshots need an owner, a reason, and a deletion date, while regular backups should be separated into short-term retention, long-term archive, and automatic deletion of old copies.
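
The rule that every manual snapshot needs an owner, a reason, and a deletion date can be enforced with a periodic check. The sketch below assumes snapshot metadata is available as dictionaries; the field names, snapshot IDs, and dates are hypothetical.

```python
from datetime import date


def snapshots_to_review(snapshots: list[dict], today: date) -> list[tuple[str, str]]:
    """Return (snapshot_id, problem) pairs for snapshots that break the rules."""
    findings = []
    for snap in snapshots:
        if not snap.get("owner"):
            findings.append((snap["id"], "no owner"))
        delete_after = snap.get("delete_after")
        if delete_after is None:
            findings.append((snap["id"], "no deletion date"))
        elif delete_after < today:
            findings.append((snap["id"], "past deletion date"))
    return findings


# Invented metadata for illustration.
snapshots = [
    {"id": "snap-release", "owner": "lukas", "reason": "pre-release rollback",
     "delete_after": date(2024, 3, 8)},
    {"id": "snap-catalog", "owner": None, "reason": "catalog migration",
     "delete_after": None},
    {"id": "snap-builder", "owner": "lukas", "reason": "order builder update",
     "delete_after": date(2024, 4, 1)},
]
for snap_id, problem in snapshots_to_review(snapshots, today=date(2024, 3, 15)):
    print(snap_id, "->", problem)
```

Passing `today` as a parameter keeps the check deterministic and easy to test; a scheduled job would pass `date.today()`.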

Scenario 3. The Test Environment That Became Permanent

Before a major update, the team creates a separate test environment: several VMs, a test database, a public IP, NAT, logs, and basic backups. It is needed for a couple of days to check the release and show the new version to a manager or a client.

The release is completed, the task is closed, and the team moves into the next sprint. Nobody deletes the environment because it “may still be useful.” Then a second environment appears for a demo, a third one for load testing, and a fourth one for integration checks. Each of them was created as temporary, but together they start behaving like permanent infrastructure.

This type of overspending is dangerous because production may fully match the estimate while the invoice still grows. The money is not spent on the main service, but on the history of forgotten tasks. Temporary infrastructure should therefore be controlled by technical rules, not verbal agreements: shutdown schedules, owner tags, lifecycle tags, and reports on inactive resources should be part of the process for creating test and development environments.

Scenario 4. The Backup Region That Wanted Its Own Budget

When the store starts receiving more orders, the business asks a reasonable question: what happens if the primary region becomes unavailable? The team adds a backup region, database replication, file copies, and separate backups. Technically, this is a sensible step: resilience improves, and recovery after a failure becomes faster.

The problem starts when the backup region is treated as a checkbox in the architecture diagram rather than a separate financial scenario. The invoice now includes cross-region traffic, additional data copies, duplicated backups, network services, and sometimes separate monitoring settings.

At first, these costs may be almost invisible. But as the catalog grows, the number of orders increases, and new images, documents, and archives appear, replication becomes a constant source of spending. A backup region should therefore be calculated separately: which data is copied, how often, what volume it may reach after several months, which RPO/RTO targets the business actually needs, and how much normal growth will cost — not only the initial setup.
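
The cost of normal growth can be projected before the backup region is approved. The sketch below is a deliberately simplified model: it assumes the replicated dataset grows by a fixed percentage each month, that a fixed share of it changes and is transferred each month, and that storage and cross-region transfer are priced per GB. All parameter names and prices are hypothetical.

```python
def replication_costs(initial_gb: float, monthly_growth: float,
                      monthly_change_share: float, storage_price_gb: float,
                      transfer_price_gb: float, months: int) -> list[float]:
    """Return the projected monthly cost of a replicated copy, month by month."""
    costs = []
    size_gb = initial_gb
    for _ in range(months):
        # Second copy of the data stored in the backup region.
        storage = size_gb * storage_price_gb
        # Changed data that has to cross regions during the month.
        transfer = size_gb * monthly_change_share * transfer_price_gb
        costs.append(storage + transfer)
        size_gb *= 1 + monthly_growth
    return costs


# Invented figures: 500 GB, 10% monthly growth, 30% of data changes per month.
projection = replication_costs(initial_gb=500, monthly_growth=0.10,
                               monthly_change_share=0.30,
                               storage_price_gb=0.03, transfer_price_gb=0.02,
                               months=6)
for month, cost in enumerate(projection, start=1):
    print(f"month {month}: {cost:.2f}")
```

The model leaves out duplicated backups, network services, and monitoring in the second region, so it is a lower bound for the scenario rather than a full estimate.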

Scenario 5. The Logs Got Loud After One Bad Night

At the start of a project, logging is usually moderate: basic metrics, application errors, and a few alerts. Then one bad night happens: during a sale, the checkout or mockup generator breaks, and the team spends several hours trying to understand the cause.

After that, the decision looks obvious: enable more logs, more tracing, more events, and longer retention. This helps with incident investigations, but such settings often remain enabled forever. Detailed application logs, network events, request tracing, and 90-day retention start to become a serious line item in the invoice.

The problem is not logging itself. Without logs, operations become blind. The problem is that emergency mode becomes the default mode. After an investigation, logging rules should be reviewed: critical events can be stored longer, diagnostic logs can have shorter retention, and detailed tracing should be enabled temporarily instead of becoming the default for the whole project.
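
The effect of retention on stored log volume can be estimated with simple arithmetic: in a steady state, stored volume is roughly daily ingest multiplied by retention days. The sketch below compares an "emergency mode" configuration with a tiered one; the stream names, daily volumes, retention periods, and the price per GB-month are invented, and ingestion charges are not modeled.

```python
def stored_gb(streams: list[tuple[str, float, int]]) -> float:
    """Steady-state stored volume: daily ingest multiplied by retention days."""
    return sum(daily_gb * retention_days for _, daily_gb, retention_days in streams)


# (stream name, GB ingested per day, retention in days), all invented.
emergency_mode = [
    ("critical events", 1, 90),
    ("diagnostic logs", 10, 90),
    ("request tracing", 20, 90),
]
tiered = [
    ("critical events", 1, 90),
    ("diagnostic logs", 10, 14),
    ("request tracing", 20, 3),
]

HYPOTHETICAL_PRICE_PER_GB_MONTH = 0.03

for label, streams in (("emergency mode", emergency_mode), ("tiered", tiered)):
    volume = stored_gb(streams)
    cost = volume * HYPOTHETICAL_PRICE_PER_GB_MONTH
    print(f"{label}: {volume} GB stored, about {cost:.2f} per month")
```

The daily ingest is identical in both configurations; only the retention differs, which is why reviewing retention after an incident is often the cheapest fix.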

Scenario 6. The VPN Everyone Used and Nobody Paid For

When the project becomes part of a larger B2B infrastructure, shared services appear around it: VPN, security gateway, centralized logs, monitoring, a support plan, shared accounts, and management tools. Each individual project may use only a part of this platform, but the platform itself costs money continuously.

While there are only a few projects, shared costs often remain in one cloud account. Then more services appear. One uses the shared VPN, another uses centralized logs, and a third relies on monitoring and security tools. In each project estimate, everything looks clean: VMs, disks, traffic, and backups. But the total cloud bill is somehow higher than the sum of individual models.

This happens when shared services are not allocated to cost centers. Projects look cheaper than they really are, while the total invoice grows without a clear owner. To make TCO honest, shared costs need allocation rules: by project, environment, consumption volume, or an internal cost model. Otherwise, the financial model may look clean, but it will remain incomplete.
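
One common allocation rule is a proportional split by consumption. The sketch below assumes a single shared cost and a usage metric per project (for example VPN hours or GB of centralized logs); the function name, project names, and figures are hypothetical, and the even split for zero usage is only one possible fallback.

```python
def allocate_shared_cost(shared_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Split a shared cost across projects in proportion to their usage."""
    if not usage:
        return {}
    total_usage = sum(usage.values())
    if total_usage == 0:
        # Nobody recorded usage: fall back to an even split.
        even = shared_cost / len(usage)
        return {project: even for project in usage}
    return {project: shared_cost * amount / total_usage
            for project, amount in usage.items()}


# Invented figures: a shared platform costing 600 per month.
allocation = allocate_shared_cost(600, {"shop": 50, "crm": 30, "analytics": 20})
for project, cost in allocation.items():
    print(f"{project}: {cost:.2f}")
```

Whatever rule is chosen, the allocated amounts should add up to the shared invoice line, so that the sum of project models matches the total cloud bill.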

These scenarios show that overspending is usually not caused by one incorrect price. It is caused by a change in the operating scenario. A successful campaign, a release, a test environment, a backup region, expanded logging, or a shared platform may all be technically correct decisions. But if these changes are not reflected in the TCO model, the preliminary estimate quickly stops matching the actual invoice.

Conclusion

A preliminary estimate should include not only running VMs, but also everything that continues to exist around them during operation: disks, IPs, snapshots, logs, network services, support, and shared platform costs. These items may look small separately, but together they define the real monthly TCO.

The main risk is to treat cloud cost estimation as a one-time exercise before launch. In practice, the bill changes whenever the operating scenario changes: traffic grows, backups accumulate, test environments remain active, logging expands, or shared services appear outside the project budget.

A practical calculation starts with the architecture diagram, the load scenario, and assumptions about data, traffic, retention periods, and support requirements. A provider calculator helps build the first estimate, but it does not replace post-launch control. To keep the invoice manageable, resources need owners, tags, lifecycles, deletion policies, and regular checks against the plan.

The better the TCO model reflects how the infrastructure actually works, the easier it is to plan budgets, explain cloud costs to the business, and avoid unpleasant surprises after launch.

FAQ

Why can’t cloud cost be calculated only by the VM price?

A VM is only a compute resource. Production infrastructure almost always needs disks, IP addresses, backups, network services, monitoring, logs, and support. These items can make up a significant share of the invoice.

Which costs are most often forgotten during calculation?

The most commonly missed items are outbound traffic, accumulation of snapshots and backups, unused disks, public IPs, logs, test environments, and shared services that are not allocated across projects.

Why does the calculator estimate differ from the actual invoice?

The calculator works with the parameters entered: operating hours, disk volume, traffic, region, and retention periods. The actual invoice depends on real consumption and what the team does after launch.

What buffer should be included in a cloud infrastructure budget?

For a preliminary estimate, teams often use a 10–20% buffer, but this is not a universal rule. The buffer size depends on load predictability, seasonality, traffic volume, and cost control maturity.

When do committed-use discounts make sense?

They are suitable for stable and predictable workloads that will run for a long time. For experimental, seasonal, or rapidly changing projects, such commitments can create unnecessary costs.

How can the risk of overspending after launch be reduced?

Assign resource owners, use tags and cost centers, limit retention periods for logs and backups, shut down development and test environments on a schedule, and regularly compare the actual invoice with the calculation model.

Sources

1. Microsoft Learn — Plan to manage costs for Azure virtual machines

2. Google Cloud — Cost estimation documentation

3. AWS Pricing Calculator

4. FinOps Foundation — Identifying Shared Costs
