UNNESTing in BigQuery – a guided introduction

The UNNEST function in BigQuery is used to transform arrays or repeated fields within a table into individual rows. This is particularly useful when dealing with data stored in nested structures, such as arrays or structs, as it allows you to flatten and work with the data more easily.

This can be useful for a variety of purposes, such as:

  • Extracting individual elements from an array
  • Counting the number of elements in an array
  • Finding the minimum or maximum value in an array
  • Grouping data by the values in an array

Toy Examples

Example 1: Explode an array into separate rows

Suppose you have a table called orders with a column called items which contains an array of items for each order. You can use UNNEST to explode the array into separate rows:

SELECT 
    order_id, 
    item
FROM orders, 
    UNNEST(items) AS item

Example 2: Combine UNNEST with other functions

You can also combine UNNEST with other functions to perform more complex operations. For example, let’s say you have a table called users with a column called interests which contains an array of interests for each user. You can use UNNEST along with COUNT to count the number of users for each interest:

SELECT 
    interest, 
    COUNT(*) AS user_count
FROM users, 
    UNNEST(interests) AS interest
GROUP BY 
    interest

Toy Example Walkthrough

First create some data with an array field in and take a look

WITH toy_table AS (
SELECT "Marquin" AS user, ["bread", "grapes", "pringles", "pizza"] AS favourite_food
UNION ALL
SELECT "Jo", ["pizza", "banana"]
UNION ALL
SELECT "Sam", ["corn", "kiwi", "rice", 'jam', "chocolate"]
UNION ALL
SELECT "Peter", ["ice cream", "apple", "chocolate", "sausage", "pizza"]
)

SELECT *
FROM toy_table;

UNNEST use case: We want to find out which food item is most commonly on people’s favourite food list.

The table in its current form does not really allow for the counting of each item in the array.

What we need is for each item in the array to be represented in its own row. The query below achieves this.

WITH toy_table AS (
SELECT "Marquin" AS user, ["bread", "grapes", "pringles", "pizza"] AS favourite_food
UNION ALL
SELECT "Dejan", ["pizza", "banana"]
UNION ALL
SELECT "Heather", ["corn", "kiwi", "rice", 'jam', "chocolate"]
UNION ALL
SELECT "Peter", ["ice cream", "apple", "chocolate", "sausage", "pizza"]
)

SELECT
    user,
    items
FROM toy_table,
    UNNEST(favourite_food) AS items;

So now each favourite food is on its own row, it is able to be counted and grouped to find the most popular foods

WITH toy_table AS (
SELECT "Marquin" AS user, ["bread", "grapes", "pringles", "pizza"] AS favourite_food
UNION ALL
SELECT "Dejan", ["pizza", "banana"]
UNION ALL
SELECT "Heather", ["corn", "kiwi", "rice", 'jam', "chocolate"]
UNION ALL
SELECT "Peter", ["ice cream", "apple", "chocolate", "sausage", "pizza"]
),

item_on_each_row AS (
SELECT
    user,
    items
FROM toy_table,
    UNNEST(favourite_food) AS items
)

SELECT
    items,
    COUNT(*) AS times_as_favourite
FROM item_on_each_row
GROUP BY
    items
ORDER BY
    COUNT(*) DESC


Additional Resources

Working with Arrays. BigQuery official documentation

Send a Comment

Your email address will not be published.