BigQuery ML (BQ ML) is a BigQuery feature that allows you to create, test, and tune Machine Learning (ML) models using standard SQL queries. It supports widely used models such as linear regression for forecasting, K-means clustering for data segmentation, and Deep Neural Networks (DNNs). Google Cloud’s AutoML and Vertex AI can create the same models. However, one would typically choose BQ ML when the training data already resides in BigQuery or when Data Analysts prefer a self-service approach to building models.

BQ ML democratizes ML by allowing Data Analysts without extensive training in Data Science to create and use models based on business data. Development time is also significantly reduced because the data used for model creation already exists within the same data warehouse. 

For these reasons, many organizations migrate their analytic workloads exclusively to BigQuery. However, the most critical question remains: What will it cost?

There are two usage patterns: on-demand pricing and flat-rate pricing. Understanding these options is fundamental to creating a well-planned BQ ML cost strategy. Other aspects of Machine Learning Operations (MLOps), such as hyperparameter tuning, model deployment, evaluation, inference, and feature processing, can be conducted within BQ ML and are charged as queries relative to the storage used and the amount of data processed.

Executive Summary

This article will explain the following concepts, which we have summarized here for your easy reference:

| Pricing structure | Description |
| --- | --- |
| Free tier | The following parameters have free usage limits: storage, data processed, data inserted, and data processed by CREATE MODEL queries. Once the free usage limits have been surpassed, the user will start to incur costs according to on-demand pricing. |
| On-demand pricing | On-demand pricing is how operations are billed as-is. Processes such as model creation, evaluation, inspection, and prediction incur costs when run ad hoc. Additionally, costs differ based on the model type. Some examples of model types include logistic regression, K-means clustering, and DNN. |
| Flat-rate pricing | Flat-rate pricing is the primary savings vehicle used for BigQuery workloads. It works by purchasing commitments and assigning those commitments to Google Cloud projects. A commitment pertains to dedicated BigQuery slots. A slot is a unit of query processing capacity. |

BQ ML pricing

Pricing usage patterns can be categorized into: 

  • On-demand pricing
  • Flat-rate pricing

Note that a free tier also applies to BQ ML: operations are free up to certain monthly limits. On-demand pricing refers to queries that are billed as-is with no special discounts. Customers can use flat-rate pricing to save money when the number of monthly queries is predictable. There are two model types:

  • Built-in models
  • External models

A built-in model is trained within BigQuery, whereas an external model is trained using other Google Cloud services like Vertex AI and AutoML. The pricing for built-in models depends on the model type and the operation type. With external models, pricing still depends on the model and operation type, plus the charges for any Google Cloud services used outside of BQ ML, such as Vertex AI and AutoML.

Free tier

The free tier covers four categories of usage:

  • Storage
  • Model prediction, inspection, and evaluation queries
  • Use of BigQuery Storage Write API
  • Model creation queries

| Parameter | Free usage limits |
| --- | --- |
| Storage | The first 10 GB per month is free |
| Model prediction, inspection, and evaluation queries | The first 1 TB of query data processed per month is free |
| BigQuery Storage Write API | The first 2 TB of data inserted into BigQuery via the API per month is free |
| Model creation queries | The first 10 GB of data processed by CREATE MODEL queries per month is free |

If your usage is well below the free-tier threshold, there’s a good chance you’re not paying for BigQuery. However, once you start scaling your services, you will likely notice that your bill is increasing, which may result from on-demand pricing.
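As a rough sketch of how the free tier offsets usage, the snippet below subtracts a monthly free allowance from measured usage and returns the billable remainder. The function name and the sample usage figures are illustrative assumptions, and 1 TB is treated as 1000 GB for simplicity.

```python
# Monthly free allowances from the free-tier table, in GB.
# Assumption: 1 TB is treated as 1000 GB to keep the arithmetic simple.
FREE_STORAGE_GB = 10
FREE_QUERY_GB = 1000  # prediction, inspection, and evaluation queries


def billable_gb(used_gb: float, free_gb: float) -> float:
    """Return the usage above the free allowance, never below zero."""
    return max(0.0, float(used_gb) - free_gb)


# Hypothetical monthly usage figures.
storage_overage = billable_gb(used_gb=25, free_gb=FREE_STORAGE_GB)
query_overage = billable_gb(used_gb=400, free_gb=FREE_QUERY_GB)

print(storage_overage)  # 15.0 GB of storage is billable
print(query_overage)    # 0.0, still inside the free tier
```

Only the overage is billed, so a workload that stays under every allowance produces no charge at all.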

On-demand pricing

On-demand pricing bills each operation as it runs, so the actual costs incurred depend on the operations you carry out. The available BQ ML operations are as follows:

  • Model creation
  • Model evaluation
  • Model inspection
  • Model prediction

Note that model creation for Matrix Factorization, the most commonly used model for recommender systems, is not supported by on-demand pricing and is available only with flat-rate pricing.

| Operation type | Model type | Pricing |
| --- | --- | --- |
| Model creation | Logistic regression, linear regression, K-means clustering | $250 per TB |
| Model creation | AutoML Tables, boosted tree | $5 per TB, plus Vertex AI training cost |
| Model evaluation, model inspection, model prediction | All types | $5 per TB of data processed |
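To make the table concrete, here is a minimal sketch that looks up the on-demand rate for an operation and multiplies it by the data processed. The operation and model-type labels are hypothetical keys chosen for this example, not BigQuery identifiers. The sketch ignores the free tier and does not include the separate Vertex AI training charge.

```python
# On-demand model creation rates in USD per TB, copied from the pricing table.
# The dictionary keys are hypothetical labels used only in this sketch.
CREATE_RATE_PER_TB = {
    "logistic_regression": 250.0,
    "linear_regression": 250.0,
    "kmeans": 250.0,
    "automl_tables": 5.0,   # Vertex AI training cost is billed on top
    "boosted_tree": 5.0,    # Vertex AI training cost is billed on top
}

# Evaluation, inspection, and prediction share one rate for all model types.
QUERY_RATE_PER_TB = 5.0
QUERY_OPERATIONS = {"evaluate", "inspect", "predict"}


def on_demand_cost(operation: str, model_type: str, tb_processed: float) -> float:
    """Estimate the BQ ML on-demand charge in USD for one operation."""
    if operation == "create":
        if model_type not in CREATE_RATE_PER_TB:
            raise ValueError(f"no on-demand creation rate for {model_type!r}")
        return tb_processed * CREATE_RATE_PER_TB[model_type]
    if operation in QUERY_OPERATIONS:
        return tb_processed * QUERY_RATE_PER_TB
    raise ValueError(f"unknown operation {operation!r}")


print(on_demand_cost("create", "linear_regression", 0.5))  # 125.0
print(on_demand_cost("predict", "kmeans", 2))              # 10.0
```

Because Matrix Factorization has no on-demand creation rate, it is deliberately absent from the lookup and raises an error here.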

Flat-rate pricing

This type of pricing is best for clients with large-scale BigQuery model deployments. The predictability of monthly costs makes cost optimization simpler. It works by using reservations for both built-in and external models, and the general flow is as follows:

  1. The user purchases “commitments”
  2. The user assigns slots to reservations
  3. The user assigns one or more projects to a reservation
Image shows the typical steps to use flat-rate pricing (source)

Clients purchase “commitments,” which are dedicated query processing capacity. Commitments are measured in BigQuery “slots.” A slot is a vCPU used to execute SQL queries. The number of slots needed depends on the size of the query and its complexity. A commitment also has a duration. There are three types of commitment:

  • Annual
  • Monthly
  • Flex

Annual commitments last for a minimum of 365 days. Monthly commitments last for a minimum of 30 days. Lastly, Flex slots can last for as little as 60 seconds and are typically used for testing and seasonal demands. 
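The minimum durations above can be expressed as a small lookup, shown here as a sketch. The function name and the sample start date are assumptions made for illustration. It only computes the earliest moment a commitment could end and says nothing about renewal behavior.

```python
from datetime import datetime, timedelta

# Minimum commitment durations as described in the text.
MIN_DURATION = {
    "annual": timedelta(days=365),
    "monthly": timedelta(days=30),
    "flex": timedelta(seconds=60),
}


def earliest_end(commitment_type: str, start: datetime) -> datetime:
    """Return the earliest time a commitment of this type could end."""
    return start + MIN_DURATION[commitment_type]


start = datetime(2024, 1, 1)  # hypothetical purchase time
print(earliest_end("flex", start))     # 2024-01-01 00:01:00
print(earliest_end("monthly", start))  # 2024-01-31 00:00:00
```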


After buying commitments, users assign the committed slots to different “reservations.” This resource-allocation system allows you to divide your commitments among different workloads. For example, you could create a reservation called “prod” for production workloads, another called “dev” for development workloads, and so forth.

Once complete, one or more projects, folders, or organizations must be assigned to a reservation to make the slots usable. For example, if you assign a reservation to a project, then the datasets within that project can use the slots associated with the reservation.


Assignments have two possible job types for BQ ML:

  • QUERY
  • ML_EXTERNAL

A QUERY job includes BQ ML queries such as model creation, inspection, evaluation, and prediction. Most queries for BQ ML will therefore fall under this type of job. An ML_EXTERNAL job applies to BQ ML queries that use external services such as Vertex AI and AutoML. Only external model reservations have this job type.
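The relationship between commitments, reservations, and assignments can be sketched as a small data model. This is not the BigQuery Reservations API. The class, function, and project names below are hypothetical and exist only to illustrate the flow described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The two assignment job types relevant to BQ ML.
JOB_TYPES = {"QUERY", "ML_EXTERNAL"}


@dataclass
class Reservation:
    """A named pool of slots carved out of the purchased commitment."""
    name: str
    slots: int
    assignments: List[Tuple[str, str]] = field(default_factory=list)

    def assign(self, assignee: str, job_type: str = "QUERY") -> None:
        """Attach a project, folder, or organization to this reservation."""
        if job_type not in JOB_TYPES:
            raise ValueError(f"unsupported job type {job_type!r}")
        self.assignments.append((assignee, job_type))


def unallocated_slots(committed_slots: int, reservations: List[Reservation]) -> int:
    """Return the slots left over after all reservations are funded."""
    used = sum(r.slots for r in reservations)
    if used > committed_slots:
        raise ValueError("reservations exceed the purchased commitment")
    return committed_slots - used


# Hypothetical setup: 500 committed slots split across two reservations.
prod = Reservation("prod", slots=300)
dev = Reservation("dev", slots=100)
prod.assign("analytics-project")                  # built-in model queries
dev.assign("ml-sandbox-project", "ML_EXTERNAL")   # external model queries

print(unallocated_slots(500, [prod, dev]))  # 100
```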

Cost Table

| Type of commitment | Number of slots | Pricing |
| --- | --- | --- |
| Flex | 100 | $4 per hour or $2920 per month |
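The monthly figure follows from the hourly rate: $4 per hour over roughly 730 hours comes to $2920. The sketch below reproduces that arithmetic and derives a rough break-even against the $5 per TB on-demand query rate. It assumes the price scales linearly with the number of 100-slot blocks, which the table does not state. It also ignores model creation charges and whether 100 slots could actually process that much data in a month.

```python
# Figures from the cost table and the on-demand pricing table.
FLEX_USD_PER_HOUR_PER_100_SLOTS = 4.0
HOURS_PER_MONTH = 730           # 4 USD/hour * 730 hours = 2920 USD
ON_DEMAND_USD_PER_TB = 5.0      # evaluation, inspection, and prediction


def flex_cost(slots: int, hours: float) -> float:
    """Cost in USD of holding Flex slots, assuming linear scaling per 100 slots."""
    return (slots / 100) * FLEX_USD_PER_HOUR_PER_100_SLOTS * hours


def break_even_tb(monthly_flat_cost: float) -> float:
    """TB per month at which on-demand query charges equal the flat-rate bill."""
    return monthly_flat_cost / ON_DEMAND_USD_PER_TB


monthly = flex_cost(100, HOURS_PER_MONTH)
print(monthly)                 # 2920.0
print(break_even_tb(monthly))  # 584.0 TB of query data per month
print(flex_cost(200, 3))       # 24.0 for a three-hour test with 200 slots
```

Under these assumptions, a workload that processes less than about 584 TB of query data per month is cheaper on demand than holding 100 Flex slots for the whole month.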


Flat-rate pricing has several limitations, including:

  • The inability to share reservations with other GCP organizations
  • An organization can only have a maximum of five projects with an active commitment in specific locations
  • Commitments are regional resources – you cannot move them between regions


The following are factors that allow you to create more realistic cost estimations:

  • Clear reservation allocation rules for complex organizations
  • An accurate estimate of how many slots are needed
  • Well-managed workloads across different reservations

To help you achieve these, we have provided several recommendations below.

Create an administration project solely for reservations

Google recommends creating an “administrative project” just for reservations, allowing you to centralize billing and management. Projects under the same organization as the administration project can use reservations. Additionally, these projects can share idle and unallocated slots, making slot management more flexible.  

Use Flex slots to get a slot estimate

Monitoring tools like Cloud Logging let you track the average capacity your workloads consume. It’s best practice to start with a small number of Flex slots and increase the number as you test your workloads. Examine your logs to identify the number of slots that provides the best performance-to-value ratio.
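One way to turn logged usage into a slot estimate is to divide the total slot-milliseconds consumed by the length of the observation window. The sketch below assumes you have already extracted per-job slot-millisecond totals from your logs. The sample values are hypothetical, and rounding up to blocks of 100 slots is an assumption based on the cost table.

```python
import math


def average_slots(total_slot_ms: float, window_ms: float) -> float:
    """Average number of slots in use over the observation window."""
    return total_slot_ms / window_ms


def round_up_to_block(slots: float, block: int = 100) -> int:
    """Round an estimate up to a purchasable block size (assumed to be 100)."""
    return max(block, math.ceil(slots / block) * block)


# Hypothetical slot-milliseconds for three jobs observed over one hour.
jobs_slot_ms = [500_000_000, 300_000_000, 28_000_000]
window_ms = 60 * 60 * 1000

avg = average_slots(sum(jobs_slot_ms), window_ms)
print(avg)                     # 230.0
print(round_up_to_block(avg))  # 300
```

An average hides peaks, so treat the result as a starting point for Flex slot testing rather than a final commitment size.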

Define workloads clearly 

Take advantage of reservations by defining workloads based on purpose or domain. Following the previously outlined methodology, estimate the number of slots required for each workstream.

For example, given 1000 committed slots and three functional areas, the assignment of resources could work as follows:

  • Assign 500 slots to the Data Science team. They usually get the most slots because they do the most intensive work
  • Assign 300 slots to ETL (Extract-Transform-Load). This function is responsible for cleaning and transforming business data before it arrives at the Data Science and Business Intelligence teams
  • Assign 200 slots to BI (Business Intelligence). BI creates reports and visualizations that provide meaningful insights, allowing stakeholders to make data-driven decisions
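The split above can be checked with a short sketch that divides a commitment by percentage shares. The function name and the team labels are illustrative assumptions. Integer division is used, so totals that do not divide evenly would leave a few slots unallocated.

```python
def split_slots(total_slots: int, shares: dict) -> dict:
    """Divide committed slots by percentage share; shares must sum to 100."""
    if sum(shares.values()) != 100:
        raise ValueError("percentage shares must sum to 100")
    return {team: total_slots * pct // 100 for team, pct in shares.items()}


allocation = split_slots(1000, {"data_science": 50, "etl": 30, "bi": 20})
print(allocation)  # {'data_science': 500, 'etl': 300, 'bi': 200}
```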

BQ ML is a powerful tool that allows the entire MLOps lifecycle to exist within a single Data Warehouse solution. It democratizes data and enables Data Analysts without formal Machine Learning training to use SQL to create, evaluate, and deploy powerful industry-standard models.

Regarding pricing, there are two usage patterns: on-demand and flat-rate pricing. On-demand charges on an “as-is” basis and is an efficient choice for one-off workloads. Flat-rate is ideal for enterprises that want predictability in their billing, especially if they run large numbers of monthly BQ ML queries. 

Flat-rate pricing works by purchasing committed slots, assigning slots to reservations, and distributing reservations amongst projects. A sound BQ ML pricing strategy depends on the number of monthly queries and, if the client chooses flat-rate pricing, the estimated number of slots required.
