Google Cloud Spot VMs

by: CloudBolt / January 26, 2024

As the scale of public cloud infrastructure increased to meet overall demands, providers sought a method to utilize spare Virtual Machine (VM) resources. In 2009, Amazon introduced Amazon EC2 Spot Instances, where customers bid on unused or underutilized capacity. Google Cloud Platform (GCP) and Microsoft Azure followed suit and offered the same feature, albeit advertised under different names.

In 2015, GCP introduced Preemptible Virtual Machines (VMs), which later matured into what we now know as Spot VMs. Spot VMs can provide up to 91% cost savings compared to on-demand VMs, so they should interest anyone deploying workloads into GCP.

This article will explain the benefits, considerations, and use cases for GCP Spot VMs. By the end, you’ll be able to determine whether they are the right choice for you.

Executive Summary

Compute Engine is GCP's Infrastructure as a Service (IaaS) offering, which enables you to create and run virtual machines (VMs). VMs are offered at different prices that lend themselves to various use cases. The following table provides a quick summary for your reference.

Pricing type	Pricing model	Use case
On-demand	Standard pricing model, pay-as-you-go for VM instances	Any workload
Preemptible	Reduced pricing model based on spare capacity	Fault-tolerant workloads; can only run for up to 24 hours; GCP recommends Spot instead
Spot	Reduced pricing model based on spare capacity	Fault-tolerant workloads
Sole-tenant	Premium pricing model for exclusive access to a sole-tenant node, with a fixed 10% sole-tenancy premium	Compute performance; security and compliance; licensing requirements

This article will cover the Spot pricing option to determine whether it’s likely to benefit your workload. Please note that Preemptible VMs can no longer be created via the GCP console and are now replaced by Spot VMs. Spot VMs do not have a 24-hour maximum runtime limitation.

Type of tenancy

An early consideration when using VMs is their tenancy type. GCP offers multi-tenant VMs and sole-tenant VMs.

Multi-tenant VMs share resources with other GCP users, whereas sole-tenant VMs enjoy exclusive access to a physical server. GCP recommends sole-tenant VMs for the following use cases:

specific computing performance requirements, e.g., for gaming workloads or machine learning
security and compliance requirements, such as healthcare or finance
licensing requirements, e.g., Windows workloads

A 10% premium is applied to all sole-tenancy vCPU and memory resources, which should be factored into your budget. Generally, it’s best to use multi-tenant VMs to maximize cost savings unless your workload falls into one of the categories listed above.

Virtual machines

A virtual machine (VM) comprises virtualized CPU, memory, storage, and other software-defined hardware that can boot an operating system and run applications. Multiple VMs (even hundreds) can occupy a single physical host, where software called a hypervisor shares the host's underlying resources among them.

Creating a VM on GCP involves selecting from machine families, machine series, and machine types:

family: the set of processor and hardware configurations optimized for specific workflows, e.g., general-purpose machine family
series: classifies the machine families into generations. Usually, newer generations of a machine series use a higher number, e.g., the N1 series is older than the N2 series within the general-purpose machine family
type: a machine series has a predefined machine type that provides resources to your VM. You can also create custom machine types

GCP provides machine and series recommendations based on workload types. Spot VMs offer the same machine types as on-demand instances. Graphics Processing Units (GPUs) may also be attached to your Spot VMs at lower spot prices. For some workloads, like rendering high-resolution images and video, a GPU provides faster processing than a Central Processing Unit (CPU).

Resource-based pricing

From 1st October 2018, GCP began billing machine types as individual vCPU and memory SKUs rather than billing machine types as a single unit. You will now see separate usage for vCPU and memory on your invoice rather than a single charge based on the machine type.

The image shows an example of separated billing of RAM and vCPU (core) of Spot VMs (Source)

VMs are charged for a minimum of one minute of usage. Therefore, even if you use a VM for thirty seconds, you are billed for one minute. After one minute, your instance is billed on a per-second basis.
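As a rough illustration of the rule above, the following sketch computes the billed duration for a single VM run. The function name is hypothetical, and rounding partial seconds up is an assumption made for the example rather than something stated by GCP here.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # one-minute minimum, as described above


def billable_seconds(runtime_seconds: float) -> int:
    """Return the number of seconds billed for a single VM run."""
    # Runs shorter than a minute are billed as one minute; longer runs
    # are billed per second (partial seconds are rounded up here, which
    # is an assumption of this sketch).
    return max(MINIMUM_BILLED_SECONDS, math.ceil(runtime_seconds))


if __name__ == "__main__":
    for runtime in (30, 60, 61, 3600):
        print(f"{runtime}s used -> {billable_seconds(runtime)}s billed")
```

A thirty-second run is billed as sixty seconds, while a 61-second run is billed as 61 seconds.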

Type of discount

Resource-based pricing enables Compute Engine, the Infrastructure as a Service component of GCP, to apply discounts to your collective vCPU and memory usage, independent of the machine type. Compute Engine offers discounts for:

sustained use: if a vCPU or a GB of memory is used for more than 25% of a month, Compute Engine applies a discount for every additional incremental second
committed use: Compute Engine offers committed use contracts that provide heavily discounted prices for VM usage. A contract is a one or three-year commitment
spot use: Spot VMs provide access to GCP spare capacity at a discount of 60% to 91% compared to on-demand pricing. The Spot price changes at most once a month, but as shown later in this article, it varies between GCP regions for a given machine type

Spot VMs

Spot VMs suit fault-tolerant workloads because they have the potential to be preempted (stopped) by Compute Engine at any time. As a result, they have no availability guarantee and may be halted in preference for higher priority requests, e.g., where another GCP user requests an on-demand instance. When your Spot VM has been preempted, Compute Engine provides a 30-second grace period during which you can run a shutdown script.

Fault-Tolerant workloads

Fault-tolerant workloads can withstand random interruptions and include:

batch processing
DevTest environments
stateless applications
stateful applications where checkpointing is conducted
GCP Managed Services that support Spot VMs, e.g., Google Kubernetes Engine (GKE)

It may be necessary to adapt your code before taking full advantage of Spot VMs. For example, you may need to save progress after each iteration to make your code resumable. You can write a shutdown script to help resume an application after it has been preempted, and we will delve deeper into resumable workloads and shutdown scripts later.

Spot VM pricing

Spot VM pricing is dynamic because it is based on supply and demand. GCP states that Spot prices always provide a 60-91% reduction compared to on-demand prices for machine types and GPUs. The Spot price can change only once every 30 days.

GCP pricing is regional, and a VM’s on-demand and Spot prices typically differ between regions. For example, the on-demand monthly pricing for an e2-medium machine type in GCP region us-west4 is $27.54, whereas, in us-west3, it is $29.38. The Spot monthly pricing in us-west4 is $2.75 compared to $8.81 in us-west3.

The image shows the E2 shared-core machine prices in *us-west4* (Source)

The image shows the E2 shared-core machine prices in *us-west3* (Source)

In both GCP regions, Spot pricing offers a considerable discount over on-demand pricing. However, the Spot price differs significantly between the two regions: the us-west4 Spot price is less than a third of the us-west3 Spot price. Our example prices were taken on the 25th of August 2022 and can be found here: VM instance pricing | Compute Engine: Virtual Machines (VMs) | Google Cloud.
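The comparison can be worked through with a few lines of arithmetic. This sketch uses only the e2-medium monthly prices quoted above; the function and variable names are illustrative, and the prices are a snapshot from August 2022, not current values.

```python
# Monthly e2-medium prices quoted above (USD, 25 August 2022).
PRICES = {
    "us-west4": {"on_demand": 27.54, "spot": 2.75},
    "us-west3": {"on_demand": 29.38, "spot": 8.81},
}


def spot_discount_percent(on_demand: float, spot: float) -> float:
    """Percentage saved by paying the Spot price instead of on-demand."""
    return (1 - spot / on_demand) * 100


def cheapest_spot_region(prices: dict) -> str:
    """Return the region with the lowest Spot price."""
    return min(prices, key=lambda region: prices[region]["spot"])


if __name__ == "__main__":
    for region, price in PRICES.items():
        discount = spot_discount_percent(price["on_demand"], price["spot"])
        print(f"{region}: {discount:.1f}% below on-demand")
    print("cheapest Spot region:", cheapest_spot_region(PRICES))
```

With these figures the discount is roughly 90% in us-west4 and roughly 70% in us-west3, both inside the 60-91% range that GCP states.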

Creating a Spot VM

In the Google Cloud Console, visit Compute Engine from your project and carry out the following steps:

Select VM instances
Create instance
Specify the Region and Zone
Specify the Machine Configuration
Expand the Advanced options
Expand Management > Availability policies
- The image shows where to set the VM provisioning model in the GCP cloud console (Source)
Change VM provisioning model from Standard to Spot
- The image shows where to change the VM provisioning model from Standard to Spot (Source)
Specify on VM termination: either Stop or Delete
Stop: Can be restarted, and memory is not saved
Delete: Permanently deleted, and memory is not saved
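The console steps above have a command-line counterpart. The sketch below only builds the gcloud argument list, so it can be inspected before anything is created. It assumes the gcloud CLI is installed and authenticated; the instance name and zone are placeholder values, and the helper function is hypothetical.

```python
import shlex


def spot_vm_create_command(name, zone, machine_type, termination_action="STOP"):
    """Build the gcloud argument list for creating a Spot VM."""
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be STOP or DELETE")
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        "--provisioning-model=SPOT",  # Standard is the default model
        f"--instance-termination-action={termination_action}",
    ]


if __name__ == "__main__":
    # "spot-demo" and "us-west4-a" are placeholder values.
    command = spot_vm_create_command("spot-demo", "us-west4-a", "e2-micro")
    print(shlex.join(command))
    # To actually create the VM, pass the list to
    # subprocess.run(command, check=True).
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to subprocess.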

Shutdown scripts

You can configure a Spot VM to execute a shutdown script that will run if your Spot VM is preempted. A shutdown script is supplied as metadata to the Spot VM.

The image shows adding a shutdown script as metadata to the Spot VM in the cloud console (Source)

During VM creation, you can provide metadata as key-value pairs. The metadata is stored on a metadata server, to which the VM has automatic access. See the table below for examples.

Key	Value
shutdown-script	Contents of the script, limited to 256KB
shutdown-script-url	Provide the URL to the shutdown script on Cloud Storage, which can exceed 256KB. See Use shutdown script from Cloud Storage for more information
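To illustrate how a VM reads its own metadata, here is a minimal sketch that queries the metadata server for a custom key such as shutdown-script. The helper names are hypothetical, and the lookup itself only works from inside a Compute Engine VM, so the demo at the bottom just prints the URL it would request.

```python
import urllib.request

# Metadata server endpoint for custom instance attributes (key-value pairs).
METADATA_ATTRIBUTES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/attributes/"
)


def metadata_request(key: str) -> urllib.request.Request:
    """Build a request for one custom metadata key, e.g. 'shutdown-script'."""
    return urllib.request.Request(
        METADATA_ATTRIBUTES_URL + key,
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )


def read_metadata(key: str, timeout: float = 2.0) -> str:
    """Fetch the value for a key. Only works from inside a Compute Engine VM."""
    with urllib.request.urlopen(metadata_request(key), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Safe to run anywhere: shows the URL without contacting the server.
    print(metadata_request("shutdown-script").full_url)
```

On a VM, calling read_metadata("shutdown-script") should return the script contents that were supplied as the Value.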

Compute Engine runs your shutdown scripts on a best-effort basis, so they are not guaranteed to run. Another limitation is that the script must complete within 30 seconds after Compute Engine preempts the VM. An on-demand VM, on the other hand, has 90 seconds. Please refer to Running shutdown scripts | Compute Engine Documentation | Google Cloud.

A shutdown script can be of any file type. For example, we could write the shutdown script in Python by providing the following shebang line at the top.

#!/usr/bin/python3

Example shutdown script

We can manually stop a Spot or on-demand VM to test a shutdown script, which will trigger its execution. For example, we could create a new Spot VM and use the following shutdown script.

#!/usr/bin/python3
# Write a marker file so we can confirm that the shutdown script ran.
with open('/var/tmp/shutdown.txt', 'w') as marker:
    marker.write('*** Shutdown script ***\n')

Our script uses Python to create a file called shutdown.txt with a single line of text. The script is added under the Custom metadata Key shutdown-script, and the Value is the script contents. We could even adapt the shutdown script to save the application’s state to Google Cloud Storage, BigQuery, or another service. The data you must store to resume the application will be app specific.

The image shows an example of a *key-value* pair for a Python shutdown script

To test the shutdown script, start a Spot VM with machine type e2-micro in the region us-west4. A preemption event is simulated by starting and manually stopping the VM. The GCP console will warn you when you try to stop the VM. Proceed and ignore the warning by selecting STOP.

The image shows the GCP console displaying a warning (Source)

After stopping the VM, restart it and connect via SSH using a browser window.

The image shows connecting to a VM via SSH from the GCP cloud console (Source)

From the Spot VM’s terminal, you can view the output of the shutdown script.

The image shows that our Python shutdown script executed successfully

In this example, we configured the Spot VM to stop when preempted. The script output was written to the boot disk, so we can view its contents once the VM is restarted. Had we configured the Spot VM to be deleted on preemption, the VM instance would no longer be available. To preserve the boot disk in that case, set the deletion rule to “Keep disk”, which lets us create a new VM from the persisted boot disk.

The image shows setting a deletion rule to keep the boot disk if a VM instance is deleted (Source)

Please note that both stopped VMs and unused persistent disks still incur costs.

Recommendations

Resumable workloads

There are a few different approaches to making your workload resumable. In the above example, the shutdown script created a file on the VM’s attached disk. A real shutdown script could store in such a file whatever data the workload needs in order to restart.

Your application might process files (blobs) from Google Cloud Storage (GCS) for batch processing tasks. Your code could delete the input blob from GCS after successfully processing it. If the Spot VM is preempted, there is nothing to save. When a new Spot VM is available, your application can start processing any remaining files from GCS.
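The delete-after-processing pattern can be sketched as follows. To keep the example self-contained, a local directory stands in for the GCS bucket, which is an assumption of this sketch. A real implementation would list and delete blobs through a Cloud Storage client instead. The function name and handler are hypothetical.

```python
import tempfile
from pathlib import Path


def process_pending(input_dir: Path, handle) -> int:
    """Process every remaining input file, deleting each one on success."""
    processed = 0
    for path in sorted(input_dir.iterdir()):
        if not path.is_file():
            continue
        handle(path.read_bytes())  # if this raises, the file stays for a retry
        path.unlink()              # delete only after successful processing
        processed += 1
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inputs = Path(tmp)
        for number in range(3):
            (inputs / f"blob-{number}.txt").write_bytes(b"payload")
        count = process_pending(inputs, lambda data: None)
        print("processed:", count, "remaining:", len(list(inputs.iterdir())))
```

Because an input is removed only after its handler returns, whatever is still present when a replacement Spot VM starts is exactly the work that remains.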

If you cannot delete the input files, your application code could save progress to a local file, external database, GCS, or another service. When a new Spot VM starts, your code will read the saved progress to determine from where to restart.

For other workloads, there might be something in memory that needs to be saved to resume your workloads. For example, a batch processing iteration might take a long time, making it inefficient to resume at the start of an iteration. So, you must adapt the application code to perform check-pointing within the iteration. Again, a shutdown script may help with saving any application state.
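Checkpointing within a long iteration might look like the sketch below. It is one possible approach under stated assumptions: the state is a small JSON document on local disk, the workload is a simple running sum, and the stop_after parameter exists only to simulate a preemption. All names are hypothetical.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_index": 0, "running_total": 0}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temporary file first, then swap it into place, so a
    preemption mid-write cannot leave a half-written checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def sum_with_checkpoints(values, path: Path, every: int = 100, stop_after=None) -> int:
    """Sum values, saving a checkpoint every `every` items.

    `stop_after` simulates a preemption after that many items in this run.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    for index in range(state["next_index"], len(values)):
        if stop_after is not None and done_this_run >= stop_after:
            raise InterruptedError("simulated preemption")
        state["running_total"] += values[index]
        state["next_index"] = index + 1
        done_this_run += 1
        if state["next_index"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["running_total"]
```

After an interruption, calling sum_with_checkpoints again with the same path resumes from the last saved index, so only the work done since the most recent checkpoint is repeated. The checkpoint interval trades write overhead against the amount of repeated work.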

Region

Spot VM pricing varies by GCP region. Review the Spot prices from Pricing | Compute Engine: Virtual Machines (VMs) | Google Cloud each month to maximize cost savings.

Nights and weekends are likely to have better Spot VM availability, so using a region in a time zone opposite your working hours may provide better availability. For example, if your time zone is Greenwich Mean Time (GMT) and you request a Spot VM at 10:00, this would be 02:00 in the GCP region us-west4 in Las Vegas.
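The time zone reasoning above can be checked with a short sketch. It assumes a fixed UTC-8 offset for Las Vegas (Pacific Standard Time, ignoring daylight saving), and the off-peak window of 20:00 to 06:00 is a hypothetical definition of "night" chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
# Assumption: fixed UTC-8 offset (Pacific Standard Time); daylight saving ignored.
LAS_VEGAS = timezone(timedelta(hours=-8), "PST")


def region_local_time(request_time: datetime, region_tz: timezone) -> datetime:
    """Convert the request time to the local time of the GCP region."""
    return request_time.astimezone(region_tz)


def is_off_peak(local: datetime, start_hour: int = 20, end_hour: int = 6) -> bool:
    """Hypothetical 'night' window; weekends also count as off-peak."""
    return local.weekday() >= 5 or local.hour >= start_hour or local.hour < end_hour


if __name__ == "__main__":
    request = datetime(2024, 1, 26, 10, 0, tzinfo=GMT)
    local = region_local_time(request, LAS_VEGAS)
    print(local.strftime("%H:%M"), "off-peak:", is_off_peak(local))
```

A 10:00 GMT request lands at 02:00 local time in the region, which falls inside the assumed night window.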

Instance templates

GCP recommends using an instance template to create your Virtual Machines. A template lets you specify your VM configuration once and use it to create multiple identical VMs. A template makes it easier to re-create multiple Spot VMs with consistent settings.

The image shows a VM instance template configured for Spot provisioning (Source)

Managed Instance Groups

Managed Instance Groups (MIGs) can help to make your workloads more flexible and resilient, especially if you require multiple Spot VMs. A MIG is created from an instance template with minimum and maximum instance numbers.

The image shows MIG auto-scaling and configuring minimum and maximum instances (Source)

A MIG attempts to maintain the target size of VMs. So if Compute Engine stops one or more Spot VMs in a MIG, the group will try to recreate those VMs using the specified instance template. If the resources are available, the MIG auto-starts new Spot VMs. See Spot VMs | Compute Engine Documentation | Google Cloud for further information.

We can configure a MIG as regional, meaning it spreads instances across multiple zones within a region. If one zone had high demand or went offline, the MIG would try to re-create your Spot VMs in a different zone.
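A regional MIG with minimum and maximum instance numbers can also be set up from the command line. The sketch below builds the two gcloud commands involved without running them. It assumes an instance template configured for Spot provisioning already exists; the group name, template name, and region in the demo are placeholders, and the helper function is hypothetical.

```python
import shlex


def regional_mig_commands(group, template, region, min_size, max_size):
    """Build gcloud argument lists for a regional, autoscaled MIG."""
    if not 0 <= min_size <= max_size:
        raise ValueError("expected 0 <= min_size <= max_size")
    base = ["gcloud", "compute", "instance-groups", "managed"]
    create = base + [
        "create", group,
        f"--template={template}",
        f"--size={min_size}",
        f"--region={region}",  # a region rather than a zone makes the MIG regional
    ]
    autoscale = base + [
        "set-autoscaling", group,
        f"--region={region}",
        f"--min-num-replicas={min_size}",
        f"--max-num-replicas={max_size}",
    ]
    return [create, autoscale]


if __name__ == "__main__":
    # Placeholder names; replace with your own group, template, and region.
    for command in regional_mig_commands("spot-mig", "spot-template", "us-west4", 1, 5):
        print(shlex.join(command))
```

Passing a region instead of a zone is what lets the group spread its Spot VMs across zones.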

The image shows configuring a MIG to spread VMs across multiple zones within a region (Source)

Limitations of Spot VMs

The main limitations of Spot VMs are:

Compute Engine could preempt your Spot VMs at any time
Prices vary between regions and can change every 30 days
There are no guarantees of availability and no Service Level Agreement
The Google Cloud Free Tier credits for Compute Engine do not apply to Spot VMs. Free Trial Credit does, however

Conclusion

Spot VMs can provide considerable cost savings for your Compute Engine workloads. You may need to adapt your workloads to make them fault-tolerant, but this is good practice in any case. Use shutdown scripts to save your workload state and make it resumable. Shutdown scripts must, however, complete within 30 seconds of the preemption notice. You can simulate preemption to test your shutdown scripts by manually stopping the VM.

Spot VM pricing is region and availability-dependent. The most cost-effective region may not be your local GCP region, and availability may be better outside regular working hours. If your data residency requirements permit, a GCP region within an opposite time zone to your working hours may provide better availability.

Review the pricing of your Spot VMs every month and develop your workloads so they can move to the GCP region with the best Spot price. GCP recommends using smaller machine types to reduce your preemption rates. Further GCP recommendations are available from Google here.
