Moving applications and infrastructure from on-premises data centers to the cloud changes the economics of how you pay for IT.
Indeed, one of the principal benefits of migrating to the cloud is that almost everything becomes an operational expense (OpEx) rather than a capital expense (CapEx).
This article outlines key cloud cost considerations as you move applications and infrastructure from on-premises to the cloud. The OpEx savings and computing flexibility of cloud-based systems are well documented. However, as we’ve seen, there are also some important factors to consider to ensure your cloud costs remain predictable month to month and don’t rise unexpectedly.
Before we explore the cost “risks” of cloud-based applications and infrastructure, let’s quickly outline some of the cost benefits of moving to the cloud:
Other cost benefits include having someone else patch and update the underlying infrastructure, much easier integration between services, and infrastructure as code, which enables far more granular on-demand resource scaling (up and down) as needed.
Along with substantial cost benefits, there are some important cloud cost “hazards” to be aware of as you consider cloud-based applications and infrastructure.
Cloud is a very powerful and flexible tool. As with any powerful tool, it’s easy to hurt yourself using it, so care and planning are essential for large deployments. If you’re coming from a traditional, CapEx-dominated cost model, the cloud cost model turns everything upside down.
Here are some new cloud application and infrastructure cost categories to address: unpredictable costs, runaway costs caused by mistakes in code, garbage creep, and cost complexity.
Let’s look at each of these in some detail and propose potential remedies you might explore.
When everything is OpEx, costs can swing significantly from one period to another, and the budgeting process needs to account for that. We can distinguish two types of unpredictability: legitimate spikes in usage, and malicious traffic such as DDoS attacks.
The first type of unpredictability is usually a good one. You just want to make sure that the revenue curve stays ahead of the cost curve, so that a spike in usage also brings a spike in revenue to pay for it.
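To make “revenue ahead of cost” concrete, the sketch below compares revenue and cloud cost per period and computes the margin per unit of usage. The `Period` record, its fields, and all of the figures are hypothetical illustrations, not data from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Period:
    label: str
    usage_units: float  # e.g. requests served in the period
    revenue: float      # revenue earned in the period, in dollars
    cost: float         # cloud bill for the period, in dollars


def periods_where_cost_leads(periods):
    """Return labels of periods where the cloud bill met or exceeded revenue."""
    return [p.label for p in periods if p.cost >= p.revenue]


def unit_margin(period):
    """Revenue minus cost per unit of usage for one period."""
    return (period.revenue - period.cost) / period.usage_units


# Hypothetical history: the 4x usage spike in February is fine because
# revenue grows with it; March is the problem period.
history = [
    Period("Jan", 1_000_000, 5_000, 3_000),
    Period("Feb", 4_000_000, 20_000, 12_500),
    Period("Mar", 4_500_000, 18_000, 19_000),
]
print(periods_where_cost_leads(history))  # ['Mar']
for p in history:
    print(p.label, round(unit_margin(p), 5))
```

Watching unit margin, not just the total bill, is what tells you whether a spike is the good kind.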
The second type of unpredictability is one you need to protect against. The remedy? All major cloud vendors offer DDoS protection services, as do plenty of third-party vendors. However, DDoS defense is not always straightforward, and costs can rise disproportionately with the volume of the attack. Either way, a fast reaction time is critical.
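Reacting fast depends on noticing a spike quickly. A minimal sketch, assuming you can export daily cost totals from your billing data into a list, flags any day whose cost exceeds the trailing average by several standard deviations. The function name, the 7-day window, the 3-sigma threshold, and the 5%-of-mean floor on the standard deviation are illustrative assumptions to tune, not vendor defaults.

```python
import statistics


def cost_spikes(daily_costs, window=7, threshold_sigmas=3.0, min_relative_stdev=0.05):
    """Return indices of days whose cost exceeds the mean of the previous
    `window` days by more than `threshold_sigmas` standard deviations.

    The standard deviation is floored at `min_relative_stdev` times the mean so
    that a very flat history does not turn every small increase into an alert.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        trailing = daily_costs[i - window:i]
        mean = statistics.mean(trailing)
        stdev = max(statistics.pstdev(trailing), min_relative_stdev * mean)
        if daily_costs[i] > mean + threshold_sigmas * stdev:
            spikes.append(i)
    return spikes


# Hypothetical daily totals in dollars; day 8 is a sudden spike.
costs = [100, 102, 98, 101, 99, 100, 103, 100, 450, 101]
print(cost_spikes(costs))  # [8]
```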
When all your infrastructure is defined in code, one small mistake can lead to hundreds of thousands of dollars lost in a matter of hours, or even minutes.
An example is a script that creates resources in an infinite loop, or one that cleans up resources incorrectly after a job is done. The remedy is similar to the one for wasteful unexpected costs: the DevOps team needs to establish robust, responsive alerts on cloud costs to catch runaway processes. Unfortunately, cost data usually lags behind actual API calls by a few hours, so alerts can only help so much. As with any mission-critical software, cloud deployments need a good software development lifecycle (SDLC): software needs to be tested and monitored.
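One way to limit the damage a runaway script can do is to give every job a hard cap on how many resources it may create and to clean up in a `finally` block. In this sketch, `create_resource`, `delete_resource`, and `work` are hypothetical callables standing in for real cloud SDK calls, and the in-memory fakes exist only to demonstrate the behavior.

```python
import itertools


class ResourceLimitExceeded(RuntimeError):
    """Raised when a job tries to create more resources than it is allowed."""


def run_job(create_resource, delete_resource, work, max_resources=10):
    """Run `work` with a capped provisioning function and always clean up.

    `create_resource() -> id`, `delete_resource(id)` and `work(provision)` are
    hypothetical callables standing in for real cloud SDK calls.
    """
    created = []

    def provision():
        # Refuse to go past the cap, so an accidental infinite loop stops early.
        if len(created) >= max_resources:
            raise ResourceLimitExceeded(
                f"refusing to create more than {max_resources} resources"
            )
        resource_id = create_resource()
        created.append(resource_id)
        return resource_id

    try:
        return work(provision)
    finally:
        # Delete everything this job created, whether work() succeeded or failed.
        for resource_id in reversed(created):
            delete_resource(resource_id)


# In-memory fakes used only to demonstrate the behavior.
live = set()
ids = itertools.count()


def fake_create():
    resource_id = f"vm-{next(ids)}"
    live.add(resource_id)
    return resource_id


def fake_delete(resource_id):
    live.discard(resource_id)


def buggy_work(provision):
    while True:  # bug: never stops provisioning
        provision()


try:
    run_job(fake_create, fake_delete, buggy_work, max_resources=5)
except ResourceLimitExceeded as err:
    print("stopped:", err)
print("resources still running:", len(live))  # resources still running: 0
```

A production version would also need to handle deletion failures (here, one failed delete stops the remaining ones). A cap like this complements cost alerts rather than replacing them, since it acts immediately instead of waiting for lagging billing data.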
Over time, idle unterminated resources can accumulate, and in a constantly changing production environment it can be hard to figure out which resources are still critical and which are now just garbage.
Engineers naturally tend to stay on the safe side, so they keep resources in place rather than terminating them. This leads to an accumulation of idle resources and, consequently, of the costs associated with them.
Examples of such resources include unterminated instances, unused disk volumes, or stale data in object storage. The remedy? Garbage creep can be addressed by establishing a regular review and cleanup procedure that goes through all resources and identifies which of them can be terminated. Cloud vendors (AWS, Google Cloud, Microsoft Azure) and third-party vendors offer many useful tools for tagging, reporting on, and monitoring resource usage. But for the most part, the cleanup still needs to be supervised by a human, especially if tagging is not thorough.
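A regular review can start from a resource inventory. The sketch below assumes you have already exported such an inventory (how you collect it depends on your vendor’s tools) into hypothetical `Resource` records with a last-activity date, a monthly cost, and tags. It flags resources that have been idle too long or lack an `owner` tag, most expensive first, for a human to review rather than for automatic deletion.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Resource:
    resource_id: str
    kind: str            # e.g. "instance", "volume", "bucket"
    last_used: date      # last observed activity
    monthly_cost: float  # dollars per month
    tags: dict = field(default_factory=dict)


def cleanup_candidates(resources, today, idle_days=30, required_tags=("owner",)):
    """Flag resources idle for more than `idle_days` or missing required tags.

    Returns (resource, reasons) pairs, most expensive first, so a human
    reviewer can start with the items that cost the most.
    """
    flagged = []
    for r in resources:
        reasons = []
        idle = (today - r.last_used).days
        if idle > idle_days:
            reasons.append(f"idle for {idle} days")
        missing = [t for t in required_tags if t not in r.tags]
        if missing:
            reasons.append("missing tags: " + ", ".join(missing))
        if reasons:
            flagged.append((r, reasons))
    return sorted(flagged, key=lambda pair: pair[0].monthly_cost, reverse=True)


# Hypothetical inventory snapshot.
inventory = [
    Resource("i-1", "instance", date(2024, 5, 30), 300.0, {"owner": "web"}),
    Resource("vol-7", "volume", date(2024, 1, 15), 40.0, {"owner": "batch"}),
    Resource("i-9", "instance", date(2024, 5, 31), 150.0),
]
for r, reasons in cleanup_candidates(inventory, today=date(2024, 6, 1)):
    print(r.resource_id, r.kind, r.monthly_cost, "; ".join(reasons))
```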
Cloud costs collapse complex engineering and financial concepts into a single dollar figure, which makes them very hard to untangle. For customers, cloud costs become a black box, since we don’t really know what’s driving them.
Different application and infrastructure architectures can lead to very different costs, with differences of up to 10x. So it’s vital to design and model several architectures, even if you have previous cloud experience, to determine which is the most cost-effective BEFORE you commit to any development.
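Modeling architectures before development can be as simple as writing each option’s monthly cost as a function of expected traffic. The two models below, a fixed fleet of virtual machines and a per-request serverless design, and every price in them are hypothetical placeholders; substitute your vendor’s current pricing and your own traffic estimates.

```python
HOURS_PER_MONTH = 730  # common approximation: 24 * 365 / 12


def vm_fleet_monthly_cost(instances, hourly_rate):
    """Fixed fleet of always-on virtual machines: cost does not depend on traffic."""
    return instances * hourly_rate * HOURS_PER_MONTH


def serverless_monthly_cost(requests, price_per_million_requests,
                            gb_seconds_per_request, price_per_gb_second):
    """Pay-per-request design: a request fee plus a compute-duration fee."""
    request_fee = requests / 1_000_000 * price_per_million_requests
    compute_fee = requests * gb_seconds_per_request * price_per_gb_second
    return request_fee + compute_fee


# Hypothetical prices and sizing, for illustration only.
architectures = {
    "vm_fleet": lambda requests: vm_fleet_monthly_cost(instances=4, hourly_rate=0.10),
    "serverless": lambda requests: serverless_monthly_cost(
        requests, price_per_million_requests=0.20,
        gb_seconds_per_request=0.25, price_per_gb_second=0.00002),
}


def cheapest(architectures, monthly_requests):
    """Return the name of the cheapest architecture and all modeled costs."""
    costs = {name: model(monthly_requests) for name, model in architectures.items()}
    return min(costs, key=costs.get), costs


for monthly_requests in (10_000_000, 100_000_000):
    best, costs = cheapest(architectures, monthly_requests)
    summary = ", ".join(f"{name}=${cost:,.2f}" for name, cost in costs.items())
    print(f"{monthly_requests:,} requests/month: {summary} -> {best}")
```

With these made-up numbers, the serverless design is cheaper at 10 million requests per month and the VM fleet is cheaper at 100 million (assuming four instances can handle that load). Finding that kind of crossover on paper is far cheaper than discovering it after the system is built.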
There are many tools that help with cloud cost management, but even these tools can struggle to keep up with the complexity and breadth of cloud offerings. This means there’s no replacement for in-house human expertise dedicated to cloud cost management.
All cloud vendors offer tools to analyze costs, and depending on the scale of your cloud environment, they might be sufficient. But for large deployments, you will very likely need more insight and more ways to dissect the data. Because cost data is structured, there are many general-purpose tools you can use to better understand your cloud costs (Excel, SQL databases, BI tools, etc.).
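As an example of treating cost data as structured data, the sketch below loads a few cost line items into an in-memory SQLite database and totals spend by team and service. The `line_items` table, its columns, and the rows are hypothetical; each vendor’s real billing export has its own schema.

```python
import sqlite3

# Hypothetical cost line items: (month, service, team tag, cost in dollars).
rows = [
    ("2024-05", "compute", "team-a", 1200.0),
    ("2024-05", "storage", "team-a", 300.0),
    ("2024-05", "compute", "team-b", 800.0),
    ("2024-05", "network", None, 150.0),  # untagged spend
    ("2024-04", "compute", "team-a", 1000.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (month TEXT, service TEXT, team TEXT, cost REAL)")
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?, ?)", rows)

# Total spend per team and service for one month, largest first;
# untagged rows are grouped under 'untagged' so they stay visible.
query = """
    SELECT COALESCE(team, 'untagged') AS team_name, service, SUM(cost) AS total
    FROM line_items
    WHERE month = ?
    GROUP BY COALESCE(team, 'untagged'), service
    ORDER BY total DESC
"""
for team_name, service, total in conn.execute(query, ("2024-05",)):
    print(f"{team_name:10} {service:8} {total:10.2f}")
```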
Both pre-planned cost analysis and ad-hoc cost analysis are very helpful to find reasons for increased spend and to catch spending anomalies, such as a potential waste of cloud resources.
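For ad-hoc analysis of why spend went up, a month-over-month comparison by service is often a good first step. This sketch assumes costs have already been aggregated into hypothetical `{service: cost}` dictionaries for two months, and it ranks services by the size of their change.

```python
def spend_change_by_service(previous, current, min_change=0.0):
    """Compare two {service: cost} mappings and return (service, change) pairs,
    largest absolute change first. A service missing from a month counts as 0."""
    services = set(previous) | set(current)
    changes = [(s, current.get(s, 0.0) - previous.get(s, 0.0)) for s in services]
    significant = [(s, d) for s, d in changes if abs(d) > min_change]
    return sorted(significant, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical monthly totals by service, in dollars.
april = {"compute": 2000.0, "storage": 300.0, "network": 150.0}
may = {"compute": 2100.0, "storage": 900.0, "gpu": 400.0}

for service, change in spend_change_by_service(april, may):
    print(f"{service:8} {change:+10.2f}")
```

Here the ranking points first at storage and at a brand-new GPU line item, which is where a human would start asking questions.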
The person or team responsible for analyzing cloud costs needs to be proficient with data analysis techniques, needs to understand the cloud services in use and their cost structures, and needs to understand the application and infrastructure itself. This is a very important cross-functional role that can be challenging to hire for, but the right people in this position can be a lifesaver for the company.