A previous post showed how to set a 'daily query usage' quota in BigQuery, and what that quota should be to stay within BigQuery's free tier.
Well, what if your BigQuery query usage exceeds the free tier? How should you decide what is a sensible quota? This post will help you figure out that number.
TL;DR
Copy and paste the following SQL code into the BigQuery editor to have it spit out a recommended quota level.
NB: before running, replace the `<PROJECT>.<DATASET>` placeholder in the three statements that reference the model (CREATE MODEL, ML.FORECAST, and DROP MODEL), and change the region qualifier in the FROM clause (marked with a comment below) to the BigQuery region your data is in.

/*
A SQL query to calculate an estimate for a reasonable
'daily query usage' quota in BigQuery.
Defaults to the free tier threshold if the predicted value is smaller.
Uses most recent period data to predict future values.
The maximum of the predicted values is then chosen and multiplied
by a 'buffer_multiplier' to provide some buffer.
*/
-- the number of days to build the history from
DECLARE days_of_history DEFAULT 60;
-- the factor by which the largest predicted value is scaled to provide a buffer
DECLARE buffer_multiplier DEFAULT 2;
-- delete the model once script finishes
DECLARE drop_model BOOLEAN DEFAULT TRUE;
-- create a time series model that is capable of predicting
-- future daily tebibytes used by bigquery
CREATE OR REPLACE MODEL `<PROJECT>.<DATASET>.tebibytesbilledmodel`
OPTIONS(MODEL_TYPE='ARIMA_PLUS',
time_series_timestamp_col='day',
time_series_data_col='total_tebi_billed') AS
SELECT
DATE(creation_time) AS day,
SUM(total_bytes_billed) / POW(1024, 4) AS total_tebi_billed
FROM
-- **CHANGE THE BELOW LINE TO THE REGION YOUR DATA IS IN**
`region-us-central1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
--uses the days of history as the number of days to create the model
creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL days_of_history DAY) AND CURRENT_TIMESTAMP()
AND job_type = "QUERY"
GROUP BY
day
ORDER BY
day DESC;
SELECT
-- get the largest predicted value and scale it up by the 'buffer_multiplier'
-- returns the free tier quota limit if it is larger.
ROUND(
CASE
WHEN MAX(forecast_value) * buffer_multiplier < 0.03 THEN 0.03
ELSE MAX(forecast_value) * buffer_multiplier
END,
4
) AS recommended_daily_query_usage_quota,
FROM
ML.FORECAST(MODEL `<PROJECT>.<DATASET>.tebibytesbilledmodel`,
STRUCT(30 AS horizon, 0.8 AS confidence_level));
-- clean up
IF drop_model THEN
DROP MODEL `<PROJECT>.<DATASET>.tebibytesbilledmodel`;
END IF;
In detail
The above SQL code creates a time series model from the last 60 days of query history (the days_of_history default) to predict the daily amount of tebibytes billed for queries.
-- create a time series model that is capable of predicting
-- future daily tebibytes used by bigquery
CREATE OR REPLACE MODEL `<PROJECT>.<DATASET>.tebibytesbilledmodel`
OPTIONS(MODEL_TYPE='ARIMA_PLUS',
time_series_timestamp_col='day',
time_series_data_col='total_tebi_billed') AS
SELECT
DATE(creation_time) AS day,
SUM(total_bytes_billed) / POW(1024, 4) AS total_tebi_billed
FROM
`region-us-central1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
--uses the days of history as the number of days to create the model
creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL days_of_history DAY) AND CURRENT_TIMESTAMP()
AND job_type = "QUERY"
GROUP BY
day
ORDER BY
day DESC;
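The model's input column divides the summed total_bytes_billed by 1024 to the fourth power, which is one tebibyte in bytes. A quick sanity check of that conversion in Python (the function name here is just for illustration):

```python
# Sanity check of the bytes -> tebibytes conversion used as the model input:
# total_bytes_billed is summed per day, then divided by 1024^4 (one TiB in bytes).
def bytes_to_tebibytes(total_bytes_billed: int) -> float:
    return total_bytes_billed / 1024 ** 4

# 2 TiB worth of billed bytes should come out as exactly 2.0
print(bytes_to_tebibytes(2 * 1024 ** 4))  # 2.0
```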
This model is then used to predict daily tebibytes used for 30 days into the future.
The highest predicted daily usage in tebibytes is then selected and multiplied by buffer_multiplier, giving a value that is (by default) twice the largest predicted daily usage figure. If that value falls below the free tier threshold of 0.03 TiB per day, the threshold is returned instead.
SELECT
-- get the largest predicted value and scale it up by the 'buffer_multiplier'
-- returns the free tier quota limit if it is larger.
ROUND(
CASE
WHEN MAX(forecast_value) * buffer_multiplier < 0.03 THEN 0.03
ELSE MAX(forecast_value) * buffer_multiplier
END,
4
) AS recommended_daily_query_usage_quota,
FROM
ML.FORECAST(MODEL `<PROJECT>.<DATASET>.tebibytesbilledmodel`,
STRUCT(30 AS horizon, 0.8 AS confidence_level));
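The CASE expression amounts to a "take the max, scale it, and floor it" calculation. A minimal Python sketch of the same logic, using made-up forecast values:

```python
# The same recommendation logic as the SQL CASE expression, in plain Python.
FREE_TIER_DAILY_TIB = 0.03  # the floor the SQL uses for the free tier

def recommended_quota(forecast_values, buffer_multiplier=2):
    # scale the largest forecast by the buffer, but never go below the floor
    scaled = max(forecast_values) * buffer_multiplier
    return round(max(scaled, FREE_TIER_DAILY_TIB), 4)

print(recommended_quota([0.005, 0.012, 0.009]))  # 0.024 < 0.03, so prints 0.03
print(recommended_quota([0.02, 0.05]))           # prints 0.1
```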
The final block deletes the time series model that was created, provided drop_model is TRUE.
-- clean up
IF drop_model THEN
DROP MODEL `<PROJECT>.<DATASET>.tebibytesbilledmodel`;
END IF;
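One practical note when applying the result: the query above reports the recommendation in TiB, but some quota interfaces expect the value in MiB. A small, hypothetical conversion sketch (the 0.06 TiB input is an example value, not output from the query):

```shell
# Convert a recommended quota from TiB to whole MiB (1 TiB = 1024 * 1024 MiB).
# Assumes the target quota field takes whole MiB; check your interface's unit.
RECOMMENDED_TIB=0.06
RECOMMENDED_MIB=$(awk -v t="$RECOMMENDED_TIB" 'BEGIN { printf "%d", t * 1024 * 1024 }')
echo "$RECOMMENDED_MIB"  # prints 62914
```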