Google Cloud Spot VMs
As public cloud infrastructure grew to meet demand, providers looked for a way to make use of spare Virtual Machine (VM) capacity. In 2009, Amazon introduced Amazon EC2 Spot Instances, where customers bid on unused or underutilized capacity. Google Cloud Platform (GCP) and Microsoft Azure followed suit with the same feature, albeit under different names.
In 2015, GCP introduced Preemptible VMs, which later matured into what we now know as Spot VMs. Spot VMs can provide up to 91% cost savings compared to on-demand VMs, so they should interest anyone deploying workloads on GCP.
This article will explain the benefits, considerations, and use cases for GCP Spot VMs. By the end, you’ll be able to determine whether they are the right choice for you.
Compute Engine is GCP's Infrastructure as a Service (IaaS) offering for creating and running virtual machines (VMs). VMs are available under different pricing options, each suited to different use cases. The following table provides a quick summary for your reference.
|Option|Pricing model|Use cases and considerations|
|---|---|---|
|On-demand|Standard pricing model, pay-as-you-go for VM instances||
|Spot|Reduced pricing model based on spare capacity|Fault-tolerant workloads|
|Preemptible|Reduced pricing model based on spare capacity|Fault-tolerant workloads; can only run for up to 24 hours; GCP recommends Spot instead|
|Sole-tenant|Premium pricing model for exclusive access to a sole-tenant node; fixed 10% sole-tenancy premium|Compute performance; security and compliance; licensing requirements|
This article covers the Spot pricing option to help you determine whether it's likely to benefit your workload. Note that Preemptible VMs can no longer be created from the GCP console, where Spot VMs have replaced them. Unlike Preemptible VMs, Spot VMs have no 24-hour maximum runtime.
Type of tenancy
An early consideration when using VMs is their tenancy type. GCP offers multi-tenant VMs and sole-tenant VMs.
Multi-tenant VMs share resources with other GCP users, whereas sole-tenant VMs enjoy exclusive access to a physical server. GCP recommends sole-tenant VMs for the following use cases:
- specific computing performance requirements, e.g., for gaming workloads or machine learning
- security and compliance requirements, such as healthcare or finance
- licensing requirements, e.g., Windows workloads
A 10% premium is applied to all sole-tenancy vCPU and memory resources, which should be factored into your budget. Generally, it's best to use multi-tenant VMs to maximize cost savings unless your workload falls into one of the categories above.
A virtual machine (VM) comprises virtualized CPU, memory, storage, and other software-defined hardware that can boot an operating system and run applications. Multiple VMs (even hundreds) can run on a single physical host, where a hypervisor divides the host's underlying resources among them.
Creating a VM on GCP involves selecting from machine families, machine series, and machine types:
- family: the set of processor and hardware configurations optimized for specific workflows, e.g., general-purpose machine family
- series: divides a machine family into versions. Newer generations usually have a higher number, e.g., the N1 series is older than the N2 series within the general-purpose machine family
- type: each machine series offers predefined machine types that determine the vCPU and memory resources of your VM. You can also create custom machine types
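To make the series/type distinction concrete, here is a small Python sketch that splits a machine type name into its parts. It assumes the naming convention Compute Engine documents at the time of writing: `<series>-<class>-<vCPUs>` for predefined types such as `n2-standard-4`, and `<series>-custom-<vCPUs>-<memory in MB>` for custom types. The `MachineType` class and `parse_machine_type` function are hypothetical helpers, and the parser does not attempt to cover every naming variant; note that the machine family is not encoded in the name at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineType:
    series: str                      # e.g. "n2" or "e2"
    kind: str                        # e.g. "standard", "highmem", "custom", or "medium"
    vcpus: Optional[int]             # None when the name does not encode a vCPU count
    memory_mb: Optional[int] = None  # only encoded in custom machine type names


def parse_machine_type(name: str) -> MachineType:
    """Split a machine type name into series, class, and (if present) size."""
    parts = name.split("-")
    series = parts[0]
    if len(parts) >= 4 and parts[1] == "custom":
        # Assumed custom pattern: <series>-custom-<vCPUs>-<memory in MB>
        return MachineType(series, "custom", int(parts[2]), int(parts[3]))
    if len(parts) == 3 and parts[2].isdigit():
        # Assumed predefined pattern: <series>-<class>-<vCPUs>
        return MachineType(series, parts[1], int(parts[2]))
    # Shared-core and other names (e.g. "e2-medium") carry no vCPU count
    return MachineType(series, "-".join(parts[1:]), None)


print(parse_machine_type("n2-standard-4"))
print(parse_machine_type("n2-custom-4-8192"))
print(parse_machine_type("e2-medium"))
```

Shared-core types such as `e2-medium` don't encode a vCPU count in their name, so the sketch returns `None` for them rather than guessing.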
GCP provides machine and series recommendations based on workload types. Spot VMs offer the same machine types as on-demand instances. Graphics Processing Units (GPUs) may also be attached to your Spot VMs at lower spot prices. For some workloads, like rendering high-resolution images and video, a GPU provides faster processing than a Central Processing Unit (CPU).
From 1st October 2018, GCP began billing machine types as individual vCPU and memory SKUs rather than billing machine types as a single unit. You will now see separate usage for vCPU and memory on your invoice rather than a single charge based on the machine type.
VMs are charged for a minimum of one minute of usage. Therefore, even if you use a VM for thirty seconds, you are billed for one minute. After one minute, your instance is billed on a per-second basis.
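The one-minute minimum followed by per-second billing can be expressed in a few lines of Python. This is a sketch of the rule as described above; `billed_seconds` and `run_cost` are hypothetical helpers, and the hourly rate passed to `run_cost` is an illustrative number, not a real GCP price.

```python
MINIMUM_BILLED_SECONDS = 60  # one-minute minimum charge per VM run


def billed_seconds(runtime_seconds: int) -> int:
    """Return the seconds charged for one VM run: at least 60, then per second."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    return max(runtime_seconds, MINIMUM_BILLED_SECONDS)


def run_cost(runtime_seconds: int, hourly_rate: float) -> float:
    """Cost of one run, given a hypothetical hourly rate in USD."""
    return billed_seconds(runtime_seconds) * hourly_rate / 3600


print(billed_seconds(30))  # 60: a 30-second run is billed as one minute
print(billed_seconds(95))  # 95: after the first minute, billing is per second
```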
Type of discount
Resource-based pricing enables Compute Engine to apply discounts to your collective vCPU and memory usage, independent of the machine type. Compute Engine offers discounts for:
- sustained use: if a vCPU or a GB of memory is used for more than 25% of a month, Compute Engine applies a discount to each additional second of use
- committed use: Compute Engine offers committed use contracts that provide heavily discounted prices for VM usage. A contract is a one- or three-year commitment
- spot use: Spot VMs provide access to GCP's spare capacity at a discount of 60% to 91% compared to on-demand pricing. The Spot price changes at most once a month, but, as shown below, it varies between GCP regions for a given machine type
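The sustained use discount is applied incrementally: each further slice of the month is billed at a lower rate. The sketch below assumes the incremental rates Google has published for the N1 series (100%, 80%, 60%, and 40% of the base rate for each successive quarter of the month); other series use different tiers, so treat `N1_TIERS` as an assumption to check against current documentation. `billed_fraction` and `effective_discount` are hypothetical helpers.

```python
# Assumed N1 incremental rates: (start of tier, end of tier, fraction of base rate)
N1_TIERS = [
    (0.00, 0.25, 1.0),
    (0.25, 0.50, 0.8),
    (0.50, 0.75, 0.6),
    (0.75, 1.00, 0.4),
]


def billed_fraction(usage_fraction: float, tiers=N1_TIERS) -> float:
    """Fraction of a full month's base price billed for the given share of the month."""
    if not 0.0 <= usage_fraction <= 1.0:
        raise ValueError("usage_fraction must be between 0 and 1")
    billed = 0.0
    for start, end, rate in tiers:
        # Portion of usage that falls inside this tier
        used_in_tier = min(max(usage_fraction - start, 0.0), end - start)
        billed += used_in_tier * rate
    return billed


def effective_discount(usage_fraction: float, tiers=N1_TIERS) -> float:
    """Overall discount versus paying the base rate for every second used."""
    if usage_fraction == 0:
        return 0.0
    return 1 - billed_fraction(usage_fraction, tiers) / usage_fraction


for usage in (0.25, 0.5, 1.0):
    print(f"{usage:.0%} of the month -> {effective_discount(usage):.0%} discount")
```

Under these assumed tiers, usage up to 25% of the month earns no discount, and a resource running all month ends up roughly 30% cheaper than the base rate.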
Spot VMs suit fault-tolerant workloads because Compute Engine can preempt them (stop or delete them, depending on the termination action you choose) at any time. As a result, they have no availability guarantee and may be reclaimed in favor of higher-priority requests, e.g., when another GCP user requests an on-demand VM. When your Spot VM is preempted, Compute Engine provides a 30-second grace period during which a shutdown script can run.
Fault-tolerant workloads can withstand random interruptions and include:
- batch processing
- DevTest environments
- stateless applications
- stateful applications where checkpointing is conducted
- GCP Managed Services that support Spot VMs, e.g., Google Kubernetes Engine (GKE)
It may be necessary to adapt your code before taking full advantage of Spot VMs. For example, you may need to save progress after each iteration to make your code resumable. You can write a shutdown script to help resume an application after it has been preempted, and we will delve deeper into resumable workloads and shutdown scripts later.
Spot VM pricing
Spot VM pricing is dynamic because it is based on supply and demand. GCP states that Spot prices always provide a 60-91% reduction compared to on-demand prices for machine types and GPUs. The Spot price can change only once every 30 days.
GCP pricing is regional, and a VM's on-demand and Spot prices typically differ between regions. For example, the on-demand monthly price for an e2-medium machine type in the GCP region us-west4 is $27.54, whereas in us-west3 it is $29.38. The Spot monthly price in us-west4 is $2.75, compared to $8.81 in us-west3.
In both GCP regions, Spot pricing offers a considerable discount over on-demand pricing. However, the Spot price differs significantly between the two regions: Spot in us-west3 costs more than three times as much as in us-west4. Our example prices were taken on 25 August 2022 from Google's "VM instance pricing" page for Compute Engine.
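The discount and the regional gap quoted above can be checked with a quick calculation. This sketch simply hard-codes the 25 August 2022 e2-medium monthly prices from the example; `PRICES` and `spot_discount` are illustrative names, and real prices should be refreshed from Google's pricing page.

```python
# Monthly e2-medium prices (USD) from the example above, taken on 25 August 2022
PRICES = {
    "us-west4": {"on_demand": 27.54, "spot": 2.75},
    "us-west3": {"on_demand": 29.38, "spot": 8.81},
}


def spot_discount(on_demand: float, spot: float) -> float:
    """Fractional saving of the Spot price relative to the on-demand price."""
    return 1 - spot / on_demand


for region, price in PRICES.items():
    print(f"{region}: {spot_discount(price['on_demand'], price['spot']):.0%} Spot discount")

ratio = PRICES["us-west3"]["spot"] / PRICES["us-west4"]["spot"]
print(f"us-west3 Spot costs {ratio:.1f}x the us-west4 Spot price")
# us-west4: 90% Spot discount
# us-west3: 70% Spot discount
# us-west3 Spot costs 3.2x the us-west4 Spot price
```

Both discounts fall inside the 60% to 91% range GCP advertises, yet the same machine type is about 3.2 times more expensive on Spot in us-west3.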
Creating a Spot VM
In the Google Cloud Console, visit Compute Engine from your project and carry out the following steps:
- Select VM instances
- Create instance
- Specify the Region and Zone
- Specify the Machine Configuration
- Expand the Advanced options
- Expand Management > Availability policies
- Change VM provisioning model from Standard to Spot
- Set On VM termination to either Stop or Delete:
  - Stop: the VM can be restarted later; memory contents are not preserved
  - Delete: the VM is permanently deleted; memory contents are not preserved
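If you create VMs through the Compute Engine REST API instead of the console, the same choices map onto the `scheduling` section of an `instances.insert` request body, where `provisioningModel` selects Spot and `instanceTerminationAction` selects Stop or Delete. The sketch below only builds that JSON body in Python; the helper name `spot_instance_body`, the VM name `spot-demo`, and the zone `us-west4-a` are illustrative assumptions, and a real request also needs disk and network settings that are omitted here.

```python
import json

TERMINATION_ACTIONS = {"STOP", "DELETE"}  # the console's Stop and Delete options


def spot_instance_body(name: str, zone: str, machine_type: str,
                       termination_action: str = "STOP") -> dict:
    """Build the Spot-specific parts of an instances.insert request body (sketch)."""
    if termination_action not in TERMINATION_ACTIONS:
        raise ValueError(f"termination_action must be one of {sorted(TERMINATION_ACTIONS)}")
    return {
        "name": name,
        # instances.insert takes a zonal machine type path
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "scheduling": {
            "provisioningModel": "SPOT",                      # the Spot provisioning model
            "instanceTerminationAction": termination_action,  # what happens on preemption
        },
        # A real request also needs "disks" and "networkInterfaces"; omitted here.
    }


print(json.dumps(spot_instance_body("spot-demo", "us-west4-a", "e2-micro"), indent=2))
```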
You can configure a Spot VM to execute a shutdown script that will run if your Spot VM is preempted. A shutdown script is supplied as metadata to the Spot VM.
During VM creation, you can provide metadata as key-value pairs. The metadata is stored on a metadata server, which the VM can query automatically. The table below shows the metadata keys used for shutdown scripts.
|Metadata key|Value|
|---|---|
|shutdown-script|Contents of the script, limited to 256KB|
|shutdown-script-url|URL of the shutdown script on Cloud Storage; the script can exceed 256KB. See Google's "Use shutdown script from Cloud Storage" documentation for more information|
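The two options in the table translate into a metadata block with a single key-value item. The Python sketch below builds that block as it might appear in a Compute Engine API request; the helper `shutdown_metadata`, the requirement to pass exactly one of the two options, and the bucket name `example-bucket` are all illustrative assumptions. It also treats 256KB as 256 × 1024 bytes, which is our reading of the limit.

```python
from typing import Optional

MAX_INLINE_BYTES = 256 * 1024  # assumed byte value of the 256KB inline limit


def shutdown_metadata(script: Optional[str] = None,
                      script_url: Optional[str] = None) -> dict:
    """Build metadata for either an inline shutdown script or a Cloud Storage URL."""
    if (script is None) == (script_url is None):
        raise ValueError("provide exactly one of script or script_url")
    if script is not None:
        if len(script.encode("utf-8")) > MAX_INLINE_BYTES:
            raise ValueError("script exceeds 256KB; store it in Cloud Storage instead")
        item = {"key": "shutdown-script", "value": script}
    else:
        item = {"key": "shutdown-script-url", "value": script_url}
    return {"items": [item]}


inline = shutdown_metadata(script="#!/bin/bash\necho bye > /var/tmp/bye.txt\n")
remote = shutdown_metadata(script_url="gs://example-bucket/shutdown.py")  # hypothetical bucket
print(inline)
print(remote)
```

Checking the size up front surfaces an oversized inline script before the request is sent, pointing you towards the Cloud Storage option instead.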
Compute Engine runs shutdown scripts on a best-effort basis, so they are not guaranteed to run. In addition, on a Spot VM the script must complete within 30 seconds of Compute Engine preempting the VM, whereas an on-demand VM allows 90 seconds. For details, see Google's "Running shutdown scripts" documentation for Compute Engine.
A shutdown script can be of any file type. For example, we could write the shutdown script in Python by adding a shebang line such as #!/usr/bin/python3 at the top.
Example shutdown script
Manually stopping a Spot or on-demand VM triggers its shutdown script, which makes this a convenient way to test one. For example, we could create a new Spot VM with the following shutdown script.
```python
#!/usr/bin/python3
# Write a marker line so we can confirm the shutdown script ran
with open('/var/tmp/shutdown.txt', 'w') as file1:
    file1.write('*** Shutdown script ***\n')
```
Our script uses Python to create a file called shutdown.txt containing a single line of text. The script is added under the Custom metadata Key shutdown-script, and the Value is the script contents. We could even adapt the shutdown script to save the application's state to Google Cloud Storage, BigQuery, or another service. The data you must store to resume the application will be application-specific.
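As a slightly richer variant, a shutdown script can record whether the shutdown was caused by a preemption. The sketch below queries the Compute Engine metadata server's `instance/preempted` endpoint (which must be called with the `Metadata-Flavor: Google` header) and writes a small JSON record to the boot disk. The file path `/var/tmp/shutdown.json` and all helper names are hypothetical, and `build_record` is where your own application-specific state would go. When you test it by stopping the VM manually, we would expect the record to show `"preempted": false`, since no real preemption occurred.

```python
#!/usr/bin/python3
# Sketch of a shutdown script that records whether the shutdown was a Spot preemption.
import json
import time
import urllib.request

# Metadata endpoint that reports TRUE once the VM has been preempted
PREEMPTED_URL = "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
STATE_FILE = "/var/tmp/shutdown.json"  # hypothetical location on the boot disk


def build_preempted_request() -> urllib.request.Request:
    # The metadata server only answers requests that carry this header
    return urllib.request.Request(PREEMPTED_URL, headers={"Metadata-Flavor": "Google"})


def parse_preempted(body: str) -> bool:
    # The endpoint returns the text TRUE or FALSE
    return body.strip().upper() == "TRUE"


def was_preempted(timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(build_preempted_request(), timeout=timeout) as resp:
            return parse_preempted(resp.read().decode())
    except OSError:
        # Metadata server unreachable (e.g. not on Compute Engine): assume no preemption
        return False


def build_record(preempted: bool) -> dict:
    # Extend this with the application-specific state you need to resume
    return {"shutdown_at": time.time(), "preempted": preempted}


def main() -> None:
    with open(STATE_FILE, "w") as f:
        json.dump(build_record(was_preempted()), f)


if __name__ == "__main__":
    main()
```

Keeping the network call short (a two-second timeout here) matters because the whole script has only 30 seconds on a Spot VM.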
To test the shutdown script, start a Spot VM with machine type e2-micro in the region us-west4. Simulate a preemption event by manually stopping the running VM. The GCP console will warn you when you try to stop the VM; ignore the warning and select STOP.
After stopping the VM, restart it and connect via SSH using a browser window.
From the Spot VM's terminal, you can view the output of the shutdown script, e.g., with cat /var/tmp/shutdown.txt.
In this example, we configured the Spot VM to stop when it is preempted. Because the script wrote its output to the boot disk, we can view the file once the VM is restarted. If the Spot VM were instead configured to be deleted on preemption, the VM instance would no longer be available. To preserve the boot disk in that case, set its deletion rule to “Keep disk”, which lets you create a new VM from the persisted boot disk.
Please note that both stopped VMs and unused persistent disks still incur costs.
There are a few different approaches to making your workload resumable. In the above example, the shutdown script created a file on the VM’s attached disk. The shutdown script stores data in this file to enable the workflow to restart.
Your application might process files (blobs) from Google Cloud Storage (GCS) for batch processing tasks. Your code could delete the input blob from GCS after successfully processing it. If the Spot VM is preempted, there is nothing to save. When a new Spot VM is available, your application can start processing any remaining files from GCS.
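The delete-after-processing pattern can be sketched in Python with a plain dictionary standing in for a GCS bucket. The dictionary store and the helper `process_pending` are assumptions made so the example runs without cloud credentials; a real version would list, download, and delete blobs with the Cloud Storage client library instead.

```python
from typing import Callable, Dict


def process_pending(store: Dict[str, bytes], handle: Callable[[str, bytes], None]) -> int:
    """Process every remaining input, deleting each one only after it succeeds."""
    processed = 0
    for name in sorted(store):      # snapshot of names, in a stable order
        handle(name, store[name])   # if the VM is preempted here, the input survives
        del store[name]             # delete only after successful processing
        processed += 1
    return processed


bucket = {"input-001.csv": b"a,b", "input-002.csv": b"c,d"}  # stand-in for a GCS bucket
done = []
print(process_pending(bucket, lambda name, data: done.append(name)))  # 2
print(bucket)                                                        # {}
```

A new Spot VM simply calls `process_pending` again and picks up whatever is left. Because a preemption can land between processing an input and deleting it, `handle` should be idempotent so that reprocessing one file is harmless.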
If you cannot delete the input files, your application code could save progress to a local file, external database, GCS, or another service. When a new Spot VM starts, your code will read the saved progress to determine from where to restart.
For other workloads, there might be something in memory that needs to be saved to resume your workloads. For example, a batch processing iteration might take a long time, making it inefficient to resume from the start of an iteration. In that case, you must adapt the application code to perform checkpointing within the iteration. Again, a shutdown script may help with saving any application state.
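Saving progress and in-memory state together might look like the following Python sketch, which checkpoints a running total and the next record index to a local JSON file every `checkpoint_every` records. The file name, the state layout, and the helper names are hypothetical; writing to a temporary file and then calling `os.replace` keeps a preemption mid-write from leaving a half-written checkpoint behind.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_index": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temporary file, then atomically replace the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(records: list, path: str, checkpoint_every: int = 1000) -> int:
    """Sum records, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path)
    for i in range(state["next_index"], len(records)):
        state["total"] += records[i]  # in-memory progress that would otherwise be lost
        state["next_index"] = i + 1
        if state["next_index"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

A local checkpoint only survives if the boot disk does (the Stop termination action, or a kept disk); for Spot VMs that are deleted on preemption, the same state could instead be written to GCS or a database.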
Spot VM pricing varies by GCP region. Review the Spot prices on Google's Compute Engine pricing page each month to maximize cost savings.
Nights and weekends are likely to have better Spot VM availability, so using a region in a time zone opposite to your working hours may improve availability. For example, if your time zone is Greenwich Mean Time (GMT) and you request a Spot VM at 10:00, it is 02:00 in Las Vegas, home of the GCP region us-west4, while Pacific Standard Time is in effect (03:00 during Pacific Daylight Time).
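Once you have this month's prices, choosing a region can be as simple as taking the cheapest one your data residency rules allow. In this Python sketch, `SPOT_PRICES` reuses the August 2022 e2-medium monthly Spot prices quoted earlier purely as sample data, and `cheapest_region` is a hypothetical helper.

```python
from typing import Dict, Optional, Set

# Sample data: e2-medium monthly Spot prices (USD) quoted earlier, August 2022
SPOT_PRICES = {"us-west4": 2.75, "us-west3": 8.81}


def cheapest_region(spot_prices: Dict[str, float],
                    allowed_regions: Optional[Set[str]] = None) -> str:
    """Return the region with the lowest Spot price among the allowed regions."""
    candidates = {
        region: price
        for region, price in spot_prices.items()
        if allowed_regions is None or region in allowed_regions
    }
    if not candidates:
        raise ValueError("no allowed region has a Spot price")
    return min(candidates, key=candidates.get)


print(cheapest_region(SPOT_PRICES))                              # us-west4
print(cheapest_region(SPOT_PRICES, allowed_regions={"us-west3"}))  # us-west3
```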
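Because US Pacific time observes daylight saving while GMT does not, the offset between 10:00 GMT and Las Vegas changes during the year. The Python sketch below uses the standard `zoneinfo` module to convert a UTC time to the `America/Los_Angeles` zone, which Las Vegas follows. The mapping from us-west4 to that time zone is our assumption based on the region's location, and the dates are arbitrary examples.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed time zone for us-west4 (Las Vegas follows US Pacific time)
US_WEST4_TZ = ZoneInfo("America/Los_Angeles")


def local_time_in_region(utc_time: datetime, region_tz: ZoneInfo = US_WEST4_TZ) -> datetime:
    """Convert an aware UTC datetime to the region's local time."""
    return utc_time.astimezone(region_tz)


winter = datetime(2022, 1, 10, 10, 0, tzinfo=timezone.utc)
summer = datetime(2022, 8, 25, 10, 0, tzinfo=timezone.utc)
print(local_time_in_region(winter).strftime("%H:%M %Z"))  # 02:00 PST
print(local_time_in_region(summer).strftime("%H:%M %Z"))  # 03:00 PDT
```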
GCP recommends using an instance template to create your Virtual Machines. A template lets you specify your VM configuration once and use it to create multiple identical VMs. A template makes it easier to re-create multiple Spot VMs with consistent settings.
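For API users, an instance template carries the same Spot settings under its `properties` field. Unlike an individual VM, a template is not tied to a zone, so its `machineType` is the bare machine type name (for example `e2-micro`) rather than a zonal path. The helper `spot_template_body` and the template name are hypothetical, and disk and network settings that a real template needs are omitted.

```python
import json


def spot_template_body(name: str, machine_type: str, termination_action: str = "STOP") -> dict:
    """Spot-specific parts of an instanceTemplates.insert request body (sketch)."""
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be STOP or DELETE")
    return {
        "name": name,
        "properties": {
            # Templates are not zonal, so this is the bare machine type name
            "machineType": machine_type,
            "scheduling": {
                "provisioningModel": "SPOT",
                "instanceTerminationAction": termination_action,
            },
            # A real template also needs "disks" and "networkInterfaces"; omitted here.
        },
    }


print(json.dumps(spot_template_body("spot-template", "e2-micro"), indent=2))
```

Every VM created from this template, whether by hand or by a managed instance group, inherits the same Spot provisioning model and termination action.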
Managed Instance Groups
Managed Instance Groups (MIGs) can help to make your workloads more flexible and resilient, especially if you require multiple Spot VMs. A MIG creates its VMs from an instance template and maintains a target number of instances; if you enable autoscaling, you also set minimum and maximum instance counts.
A MIG attempts to maintain its target size. So if Compute Engine stops one or more Spot VMs in a MIG, the group will try to recreate those VMs using the specified instance template. If the resources are available, the MIG automatically starts new Spot VMs. See Google's Spot VMs documentation for Compute Engine for further information.
We can configure a MIG as regional, meaning it spreads instances across multiple zones within a region. If one zone had high demand or went offline, the MIG would try to re-create your Spot VMs in a different zone.
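A regional MIG's zones are listed in its `distributionPolicy`. The Python sketch below builds an illustrative request body for a regional instance group manager; the helper, the group and template names, and the use of partial resource paths (`global/instanceTemplates/...`, `zones/...`) are our assumptions, so check the Compute Engine API reference before relying on the exact format.

```python
import json
from typing import List


def regional_mig_body(name: str, region: str, zones: List[str],
                      template_name: str, target_size: int) -> dict:
    """Sketch of a regional instance group manager spread across several zones."""
    if target_size < 0:
        raise ValueError("target_size must be non-negative")
    for zone in zones:
        # Zones must belong to the MIG's region, e.g. us-west4-a for us-west4
        if not zone.startswith(region + "-"):
            raise ValueError(f"{zone} is not in {region}")
    return {
        "name": name,
        "baseInstanceName": name,
        "instanceTemplate": f"global/instanceTemplates/{template_name}",
        "targetSize": target_size,  # the MIG tries to keep this many VMs running
        "distributionPolicy": {"zones": [{"zone": f"zones/{z}"} for z in zones]},
    }


body = regional_mig_body("spot-mig", "us-west4",
                         ["us-west4-a", "us-west4-b", "us-west4-c"], "spot-template", 3)
print(json.dumps(body, indent=2))
```

Listing several zones gives the group somewhere else to recreate preempted Spot VMs when one zone runs short of spare capacity.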
Limitations of Spot VMs
The main limitations of Spot VMs are:
- Compute Engine could preempt your Spot VMs at any time
- Prices vary between regions and can change every 30 days
- There are no guarantees of availability and no Service Level Agreement
- The Google Cloud Free Tier for Compute Engine does not apply to Spot VMs, although Free Trial credits do
Spot VMs can provide considerable cost savings for your Compute Engine workloads. You may need to adapt your workflows to make them fault-tolerant, but that is good practice anyway. Use shutdown scripts to save your workload's state and make it resumable; bear in mind that they must complete within 30 seconds of the preemption trigger and are not guaranteed to run. You can simulate preemption to test your shutdown scripts by manually stopping the VM.
Spot VM pricing is region- and availability-dependent. The most cost-effective region may not be your local GCP region, and availability may be better outside regular working hours. If your data residency requirements permit, a GCP region in a time zone opposite to your working hours may provide better availability.
Review the pricing of your Spot VMs every month and design your workloads so they can move to the GCP region with the best Spot price. GCP recommends using smaller machine types to reduce preemption rates; Google's Spot VMs documentation lists further best practices.