Understand the Structure of the GA4 Export in BigQuery

Published:

Written by:

As we journey deeper into the world of GA4 within BigQuery, it’s crucial to have a solid grasp of the structure that underlies this vast ocean of data. In this tutorial, we’ll dive into the architecture of the GA4 export, unraveling its table schema and data organization. This foundational knowledge will pave the way for more effective querying and analysis in subsequent tutorials.

This is part 3 of 10 in our series, “Get Started with the GA4 BigQuery Export”. Check out the course overview for a full list of all the topics covered.

Introduction

Before getting into the fun of exploring the GA4 BigQuery export it’s useful to have an understanding of the table schema and how the data is structured within.

In this tutorial, we’ll delve into the structure of these tables, exploring the schema and the data they contain. While some fields are straightforward columns, others are records nested within records, offering deeper insights.

We’ll also encounter REPEATED fields, which add another layer of complexity. Fear not; we’ll explain how to navigate these nuances, providing a hands-on experience with a simple query.

The analytics dataset

Once the export is enabled in your property you will see a dataset appear in your BigQuery workspace named analytics_ followed by a string of numbers. This dataset is unique to your GA4 property and is only accessible via your Google Cloud instance.

Within the dataset a new table will be created each day that contains all the data collected on the previous date. The daily tables are named events_ followed by the date in the format “YYYYMMDD”, e.g. events_20230815.

The BigQuery interface doesn’t list all these tables out separately though. Instead it just shows one table named events_ with a number in brackets that represents the number of tables that exist. E.g. in the example above there are 114 separate daily tables within this dataset.

If you have enabled the Streaming export then you will see another item named events_intraday_ followed by the date e.g. events_intraday_20230815. The intraday table is populated with data in real time (or near-real time at least) but gets deleted on a daily basis.

Sometimes you’ll see 2 intraday tables (one for the current day and one for the previous day) but at some point during the day the previous day’s table will be deleted. The structure of the events_ and events_intraday_ tables are the same, as is the data collected within.

Table structure

Let’s take a look now at the structure of the data. To do so, navigate to the e-commerce sample dataset and click on the events_ table to open it.

Note: refer to part 2 of the series if you are unsure how to access the sample dataset.

An overview of the most recent events table will open in the main view. Towards the top there is a dropdown menu from which you can select a different date to view the table for that particular date. Underneath is an overview of the table schema, giving a breakdown of all the fields (or columns) and their datatypes.

You will see that some of the fields have a datatype RECORD and some of these are REPEATED. A RECORD is a field with additional columns nested within, while a REPEATED field can contain multiple rows inside one row.

Working with a RECORD is pretty straightforward, you just need to use ‘dot notation’ to access the nested columns, e.g. device.category or geo.country. REPEATED fields are a bit more tricky though. For these you’ll need to use the UNNEST command or a subquery.

Don’t worry if you are unfamiliar with these concepts though, we’ll cover each one in a subsequent tutorial. For now let’s just run a simple query to see what the data looks like. Click the [+] icon to open up a new tab and then run the following query:

SELECT *
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
WHERE event_name = 'add_to_cart'
AND ecommerce.unique_items > 1
LIMIT 10

Let’s break this query down to understand what’s going on:

  • Line 1 specifies which columns to select – * means we want to return all columns
  • Line 2 indicates which table to query
    • bigquery-public-data is the name of the project
    • ga4_obfuscated_sample_ecommerce is the dataset
    • events_20210131 is the table name
  • Line 3 & 4 are filtering the data to certain attributes – where the event_name is add_to_cart and unique_items is greater than 1
  • Line 5 limits the data to return only 10 rows

Important Note: be careful when using SELECT * in your queries as you might inadvertently end up running a costly query against a high volume of data. If in doubt, check the message in the top right of the query window to see how much data you will be processing.

Remember, with BigQuery you get 1 TB of free queries every month (thats 1,000 GB or 1,000,000 MB) so for many people it’s not something you’ll have to worry too much about, but definitely something to be aware of!

Scroll to view the event_params field and you should see something like the below. From this we can see that the field is made up of 5 columns nested within and multiple parameters all nested within 1 row.

Now scroll to the right until you find the items field. Again you will see something like the below, with multiple fields nested within 1 column and multiple items all nested within 1 row.

If you want to explore the schema further then check out the following pages for full details of every field along with example values.

Wrapping up

In this tutorial, we’ve taken our first step toward understanding the GA4 BigQuery export by examining its table schema and data structure. Armed with this foundational knowledge, you’re now better prepared to explore and analyze your GA4 data effectively.

As you progress through this series, remember that a solid grasp of the GA4 data structure is your key to unlocking its full potential. Stay tuned for the next tutorial where we’ll delve into the GA4 event data, understand the types of events you might encounter and perform some queries to gain insights into user interactions.

Next >> 4. Exploring GA4 event data in BigQuery

New to the GA4 BigQuery export? Check out our free introductory guide.

X