GA4 BigQuery Export – The Ultimate Introductory Guide

Are you interested in learning about the GA4 BigQuery export, but unsure where to begin? In this article, we discuss its benefits and potential costs, and provide a step-by-step guide to using it to gain valuable insights into your web and app users.

What is the GA4 BigQuery export?

One of Google Analytics 4’s (GA4) best features is the ability to export all of your raw data into BigQuery. Previously only available to those with an enterprise Analytics 360 license, the GA4 BigQuery export is now free for everyone.

BigQuery is the data warehousing capability built into Google Cloud Platform (GCP) that provides cost-effective cloud-based data storage and analytics. The GA4 BigQuery export is a built-in integration that automatically exports all your event-level GA4 data and stores it in BigQuery for analysis and reporting.

Each GA4 property connected to BigQuery can export up to 1 million events per day free of charge. This limit should prove more than adequate for most small and medium-sized businesses. If you need more, you can raise the limit by acquiring an Analytics 360 license; you can also reduce export volumes by excluding certain events.

What are the benefits of the GA4 BigQuery export?

Having your GA4 data stored in BigQuery unlocks a whole host of opportunities not available through the GA4 UI. Some of the things you can do with it include:

  • Build custom reports: Storing your data in BigQuery allows you to build customised reports tailored to your needs. This flexibility empowers you to focus on the metrics and insights that matter most, facilitating more informed decision-making and unlocking more value from the data.

  • Get around API quotas: Querying the data in BigQuery enables you to circumvent API quotas, such as those applicable when using tools like Looker Studio. Doing so opens up opportunities for more extensive and frequent data extraction, enhancing the efficiency of analytics workflows.

  • Increase data retention limits: GA4 standard properties have a maximum data retention period of 14 months. By leveraging the GA4 BigQuery export, you can retain your data indefinitely, allowing for a more comprehensive analysis of historical trends and patterns.

  • Enrich data with other sources: Exporting your GA4 data to BigQuery allows you to enrich it with data from other sources. That could be customer data from your CRM, sales data from your e-commerce platform, media data from Google or Meta, or weather data from an external service. The opportunities are endless.

  • Apply changes to historical data: GA4 does not offer any opportunity to modify data, which can be problematic for things like channel groupings where changes don’t apply retroactively. With BigQuery, you have complete flexibility to make changes and have them apply to all historical and future data collected.

  • Remove the need to set up custom definitions: GA4 forces you to set up custom dimensions to report on any event parameters collected by the property. Failing to do so in time means you will not have access to the data. With the GA4 BigQuery export, you can extract any event parameter collected at any time, meaning you can forget about custom dimensions altogether.

  • Avoid cardinality issues: When dealing with a dimension with many unique values, you may encounter a row limit in your reports, which can cause some values to be grouped under the (other) row. BigQuery has no such limit, enabling you to produce accurate reports on the complete range of dimension values.

And so much more. The versatility of the exported data extends far beyond the points listed above. You have complete freedom to utilise it for any use case you can imagine.

How much does the GA4 BigQuery export cost?

While it is free to enable the BigQuery integration in GA4, there are some potential costs that you should be aware of. You can divide these costs into three separate components:

BigQuery storage costs

The amount of data stored in BigQuery is critical in determining costs. The more data your property collects and the richer the information captured, the more significant the impact on storage costs.

How much will you pay? The first 10 GB of storage is free. After this, you will pay around $0.02 per GB per month for active storage, dropping to around $0.01 per GB per month for long-term storage (tables that haven’t been modified for 90 days).

BigQuery compute costs

The computational resources utilised during query processing can also contribute to the overall cost. This factor depends on the complexity and volume of queries run on the stored data.

How much will you pay? The first 1 TB per month is free. After this, you’ll pay $6.25 per TB of data processed.

BigQuery ingestion costs

Another component is the cost of ingesting data into BigQuery. This aspect encompasses the process of loading data into the system. The frequency and volume of data ingestion will influence this cost component.

How much will you pay? $0.05 per GB (but this only applies to the streaming export).

Estimating potential costs

The size of the data you collect and store will be crucial in determining how much you pay. Numerous factors will influence data volumes:

  • The number of events: The total number of events collected by your GA4 property directly affects the storage requirements. A higher volume of events translates to increased storage needs.

  • Event parameters & user properties: The complexity of each event, determined by the number of parameters and associated user properties, will impact the size of the data.

  • E-commerce items: For businesses involved in e-commerce, the number of items associated with each e-commerce event is a crucial consideration.

  • Populated fields: The more populated fields for each event, the larger the data will be. Null values, on the other hand, don’t take up any space.

  • Size of Values Collected: The length of the values collected, such as URLs or product names, will also affect storage costs. Larger values consume more storage space.

Understanding potential costs is essential for any business considering utilising the GA4 BigQuery export. This BigQuery cost calculator will provide you with an estimate of storage costs based on the number of daily events your property collects.

You might be surprised to learn that a website collecting 100,000 events per day could expect to pay less than $2 for storage in the first year. Many users are likely to pay nothing at all.
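
To see where a figure like that comes from, here is a rough back-of-the-envelope sketch. It assumes an average size of around 1 KB per exported event, which is an assumption for illustration only; actual event sizes vary with the number of parameters, items, and the length of the values collected:

  100,000 events/day × ~1 KB ≈ 0.1 GB/day ≈ 3 GB/month
  After m months, roughly 3 × m GB is stored in total, and the first 10 GB is free
  Billing therefore only begins around month 4, at no more than $0.02 per GB per month

Summing the small monthly charges over the first year lands in the low single digits of dollars, and smaller properties never exceed the free tier at all.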

How can you get started with the GA4 BigQuery export?

Ready to embark on your journey with the GA4 BigQuery Export? Here are some valuable resources to help you get started…

Enable the export for your property

Before diving into the BigQuery data, you’ll want to enable the export in your GA4 property. This foundational step sets the stage for harnessing the full analytical potential of your GA4 data within BigQuery.

Don’t have your own data? Learn how to connect to the sample GA4 datasets instead.

Setting up the export is straightforward. Here’s a high-level overview of the steps involved:

  1. If you haven’t already done so, sign up for a Google Cloud Platform (GCP) account.
  2. Set up a new GCP project, then enable billing and turn on the BigQuery API.
  3. Head over to the admin area in GA4 and start the linking process.
  4. Select your GCP project and choose a data location.
  5. Configure the data stream(s) and events you want to export.
  6. Choose the export types you want to enable (Daily, Streaming and/or User).
  7. Wait for your data to appear in BigQuery!

For a full overview of the whole process, check out this quick video guide:

Understand the data structure

Once you have access to some data, you’ll want to familiarise yourself with the BigQuery table structure. Understanding the schema and organisation of the data is crucial for crafting effective queries and extracting meaningful insights.

Each day, a new table containing the previous day’s data is created in BigQuery. Within these tables, each row represents an event collected by Google Analytics.

Many of the data types in the export, such as STRING and INTEGER, will likely be familiar to you, whilst others, like RECORD and REPEATED, might be new concepts. To extract the data from these fields you’ll need to master the UNNEST function.

Some of the key fields you’ll be working with include:

  • event_name: the name of the event
  • event_date: the date of the event
  • event_timestamp: the time (in microseconds) of the event
  • event_params: event parameters collected with the event
  • user_pseudo_id: the pseudonymous ID of the user
  • user_id: the unique ID assigned to the user by your website or app
  • platform: the type of data stream (iOS, Android or Web)
  • device.category: the type of device used (mobile, desktop or tablet)
  • device.web_info.browser: the browser used
  • geo.country: the country where the user was located
  • geo.city: the city where the user was located
  • collected_traffic_source.manual_source: utm_source
  • collected_traffic_source.manual_medium: utm_medium
  • collected_traffic_source.manual_campaign_name: utm_campaign
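
As a starting point, you can select a handful of these fields directly. Replace `<project>.<dataset>.<table>` with the details of your own export table:

SELECT
  event_date,
  event_name,
  user_pseudo_id,
  device.category AS device_category,
  geo.country,
  collected_traffic_source.manual_source AS utm_source
FROM `<project>.<dataset>.<table>`
LIMIT 10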

The following video provides a good introduction to the export structure:

Learn how to extract event parameters

To delve deeper into the details of your events, you’ll need to learn the trick to extracting event parameters. This technique allows for more granular analysis, enabling you to uncover insights hidden within the rich event data captured by GA4.

It’s important to first understand which parameters you have in your data. You can use the following query to extract a list of parameters and the count of occurrences.

SELECT
  ep.key AS event_parameter,
  COUNT(*) AS occurrences
FROM `<project>.<dataset>.<table>`, UNNEST(event_params) AS ep
GROUP BY 1
ORDER BY 2 DESC

To extract a specific parameter value, we’ll employ a subquery technique within our query. Here we are extracting the value.string_value from the unnested event_params field, filtering for cases where the key matches page_location.

SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM `<project>.<dataset>.<table>`
WHERE event_name = 'page_view'

Here’s a video with more detail on the above techniques:

Learn how to query multiple tables

To work with data across multiple dates, you’ll need to use the ‘*’ wildcard to query multiple tables simultaneously. The _TABLE_SUFFIX pseudo column then allows you to filter to a specific range of dates.
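
For example, the following query counts events across all of the daily tables for an illustrative January 2024 date range (the events_ prefix is the standard naming pattern for the daily export tables):

SELECT
  COUNT(*) AS events
FROM `<project>.<dataset>.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'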

Watch the following video to learn more:

Learn how to recreate key metrics

Once you are comfortable working with the BigQuery data you’ll be able to recreate key metrics from GA4. This skill empowers you to replicate essential reports, providing a bridge between the familiar GA4 interface and the analytical capabilities offered by BigQuery.

Calculate total users

Each user tracked by GA4 is assigned a unique pseudonymous ID, which is stored in the user_pseudo_id field. So to calculate Total Users, we need to do a count of distinct user_pseudo_id:

SELECT
  COUNT(DISTINCT user_pseudo_id) AS total_users
FROM `<project>.<dataset>.<table>`

Calculate sessions

To calculate Sessions we combine the user_pseudo_id with the ga_session_id. This concatenation allows us to identify and count distinct session interactions effectively:

SELECT
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) AS sessions
FROM `<project>.<dataset>.<table>`

Calculate views

The Views metric represents both page views and screen views. So to calculate this metric we need to do a count of total events where event_name = page_view or screen_view:

SELECT
  COUNTIF(event_name IN ('page_view', 'screen_view')) AS views
FROM `<project>.<dataset>.<table>`
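
The three metrics can also be combined into a single query. Note the CAST to STRING on ga_session_id: BigQuery’s CONCAT function only accepts STRING or BYTES arguments, so the integer session ID must be converted first:

SELECT
  COUNT(DISTINCT user_pseudo_id) AS total_users,
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING))) AS sessions,
  COUNTIF(event_name IN ('page_view', 'screen_view')) AS views
FROM `<project>.<dataset>.<table>`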

These skills will provide a solid foundation for navigating the GA4 BigQuery export. As you embark on your analytical adventure, incorporating these strategies will enhance your ability to extract valuable insights and make informed decisions based on the wealth of data.

Why doesn’t the data match up to GA4?

As you begin to replicate GA4 metrics and reports in BigQuery, you may encounter a discrepancy between the BigQuery data and the GA4 UI. Some possible reasons for this include:

  • Reporting Identity set to observed or blended: Differences may arise if the reporting identity in your GA4 property is set to ‘Observed’ or ‘Blended’. This setting will influence how the metrics in GA4 are calculated. Set the reporting identity to ‘Device-based’ for a more accurate comparison.

  • Sampling applied to GA4 reports: GA4 reports may be subject to sampling, especially when dealing with large datasets. This can result in variances between sampled GA4 reports and the unsampled data available in BigQuery.

  • Timezone settings: The BigQuery data includes both ‘event_date’ and ‘event_timestamp’ fields. It’s important to note that ‘event_date’ is based on the property’s time zone setting, while ‘event_timestamp’ is recorded in Coordinated Universal Time (UTC). Mismatched time zones can contribute to differences in reported data.

  • Approximations in GA4 metrics: Some GA4 metrics are approximations, particularly those employing the HyperLogLog++ algorithm. These approximations are designed to balance accuracy and resource efficiency but may lead to slight variations compared to BigQuery. Understand the nature of these metrics to contextualise any disparities.
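
You can see the effect of this approximation for yourself: BigQuery’s APPROX_COUNT_DISTINCT function uses the same family of algorithms, so comparing it against an exact distinct count illustrates the scale of the variance you might see between the two platforms:

SELECT
  COUNT(DISTINCT user_pseudo_id) AS exact_users,
  APPROX_COUNT_DISTINCT(user_pseudo_id) AS approximate_users
FROM `<project>.<dataset>.<table>`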

Achieving an exact match between the GA4 UI and the BigQuery data is often challenging due to the inherent differences in data processing and reporting methodologies. Accept that minor discrepancies may exist and focus on trends and patterns rather than expecting precise parity in every metric.

Navigating these differences requires an understanding of the settings, algorithms, and time-related factors involved. With these considerations in mind, you can better interpret and reconcile the data across the two platforms.

Want to learn more about the GA4 BigQuery export?

The GA4 BigQuery export is a powerful tool that unlocks endless possibilities for analysing and reporting on your GA4 data.

By exporting your data to BigQuery, you open up the possibility of customised reports, increased data retention limits, the ability to enrich your data with other sources, and so much more. While there may be potential costs associated with the integration, the benefits far outweigh them.

If you want a guided introduction, we have prepared a free beginners’ course to help you get started with the GA4 BigQuery export. The course covers everything you need to kickstart your journey, from setting up the integration to exploring the data structure, querying events, and recreating metrics and reports.

The whole series of tutorials is also available over on our YouTube channel.

Happy learning!
