
GA4 BigQuery Export: Setup Guide + 10 Useful Queries

How to link GA4 to BigQuery, what data gets exported, and 10 practical SQL queries for e-commerce analysis, funnel reporting, and cohort analysis that you can run today.

Aumlytics Team

If you’ve ever hit a wall with GA4’s standard reports — 14-month data retention, limited custom dimensions in the UI, no raw event access — BigQuery export is the solution. It’s the only way to get your raw, unsampled GA4 data into a format where you can query it however you need.

This guide walks through the full setup and gives you 10 production-ready SQL queries you can use immediately once your data starts flowing.


What Is the GA4 BigQuery Export?

GA4 can stream or batch-export your raw event data to Google BigQuery, Google’s fully-managed cloud data warehouse. Once connected, every event — every pageview, purchase, scroll, click — is available as a queryable row in BigQuery.

What you get that you don’t get in GA4 UI:

  • Raw event-level data (no sampling, ever)
  • Unlimited data retention (no 14-month cap)
  • Access to all parameters, including custom ones
  • SQL querying — join GA4 data with your CRM, Shopify data, or any other source
  • Intraday data for the current day via near real-time streaming tables

What it costs: BigQuery has a generous free tier — 10 GB of storage free per month and the first 1 TB of queries free per month. For most small and mid-sized businesses, the cost is $0 until you scale significantly.

Official reference: GA4 BigQuery Export documentation


Prerequisites

Before linking, you need:

  1. A GA4 property (you must have Editor or Administrator role)
  2. A Google Cloud project — create one at console.cloud.google.com if you don’t have one
  3. BigQuery API enabled in that Google Cloud project
  4. Billing account linked to your Cloud project (required, but you won’t be charged if you stay within the free tier)

Step 1: Set Up Your Google Cloud Project

  1. Go to Google Cloud Console
  2. Click Select a project → New Project
  3. Name it (e.g., aumlytics-analytics) → Create
  4. Navigate to APIs & Services → Library → search “BigQuery API” → Enable
  5. Billing → Link a billing account (required, but free tier covers most SMBs)

Step 2: Link GA4 to BigQuery

  1. Open your GA4 property
  2. Admin (gear icon) → Product Links → BigQuery Links
  3. Click Link
  4. Select your Google Cloud project from the dropdown
  5. Choose a BigQuery dataset location — pick the region closest to your users (e.g., us-central1 for North American users, europe-west2 for UK)
  6. Export type:
    • Daily export — one table per day, created after midnight
    • Streaming export — near real-time intraday table, updated continuously (recommended if you want fresh data)
  7. Select the data streams to include (select all unless you want to exclude a specific stream)
  8. Click Submit

Note: The link takes up to 24 hours to activate. The first daily export runs after the next midnight in your property’s timezone.


Step 3: Understand the BigQuery Table Structure

Once data flows, you’ll find a dataset named analytics_XXXXXXXXXX (where X is your GA4 property ID) in BigQuery.

Tables created:

Table name                    Contents
events_YYYYMMDD               One table per day (daily export)
events_intraday_YYYYMMDD      Current day’s streaming data

Key columns in each table:

-- Core event columns
event_date          -- YYYYMMDD string (e.g., '20250101')
event_timestamp     -- Unix microseconds (divide by 1,000,000 for seconds)
event_name          -- e.g., 'page_view', 'purchase', 'scroll'
event_params        -- ARRAY<STRUCT<key, value>> — all event parameters

-- User/session columns
user_id             -- your app's user ID (if set_user_id was called)
user_pseudo_id      -- GA4's client ID (anonymised device identifier)
ga_session_id       -- session identifier (lives within event_params, not top-level)
ga_session_number   -- how many sessions this user has had (also within event_params)

-- Device/geo columns
device.category     -- 'desktop', 'mobile', 'tablet'
device.browser      -- 'Chrome', 'Safari', etc.
geo.country
geo.city
geo.region

-- Traffic source columns
traffic_source.source
traffic_source.medium
traffic_source.name  -- campaign name
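
Since event_timestamp is stored as Unix microseconds, BigQuery’s TIMESTAMP_MICROS function converts it to a readable timestamp. A small sketch (the table name is a placeholder for your own dataset):

```sql
-- Convert event_timestamp (Unix microseconds) to readable timestamps
SELECT
  event_name,
  TIMESTAMP_MICROS(event_timestamp) AS event_time_utc,
  -- Optionally shift into a local timezone for reporting
  DATETIME(TIMESTAMP_MICROS(event_timestamp), 'America/New_York') AS event_time_local
FROM `your-project.analytics_XXXXXXXXXX.events_20250101`
LIMIT 10;
```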

The event_params column is the tricky part. It’s a repeated STRUCT array. To extract a specific parameter value:

-- Extract a single parameter value (string)
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url

-- Extract a single parameter value (integer)
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id

-- Extract e-commerce value (double)
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS purchase_value

Step 4: Run Your First Query

Open BigQuery → SQL workspace and run this to verify data is flowing:

-- Check which dates have data
SELECT
  event_date,
  COUNT(*) AS event_count
FROM `your-project-id.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
GROUP BY 1
ORDER BY 1 DESC;

Replace your-project-id with your Google Cloud project ID and XXXXXXXXXX with your GA4 property ID (visible in GA4 Admin → Property Settings).


10 Practical BigQuery Queries

Query 1: Daily Sessions and Pageviews

SELECT
  event_date,
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
  ))) AS sessions,
  COUNTIF(event_name = 'page_view') AS pageviews,
  COUNT(DISTINCT user_pseudo_id) AS unique_users
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY 1
ORDER BY 1;

What it shows: Day-by-day session and user counts — the equivalent of the Sessions overview in GA4 but with no sampling and exportable to any BI tool.


Query 2: Traffic Source Breakdown

SELECT
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
  ))) AS sessions,
  COUNTIF(event_name = 'purchase') AS purchases,
  SUM(IF(event_name = 'purchase',
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'), 0
  )) AS revenue
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
ORDER BY sessions DESC
LIMIT 20;

What it shows: Sessions, conversions, and revenue by source/medium — the Traffic Acquisition report with revenue breakout per source.


Query 3: E-commerce Purchase Report

SELECT
  event_date,
  COUNT(DISTINCT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')
  ) AS transactions,
  SUM(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  ) AS total_revenue,
  AVG(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  ) AS avg_order_value
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1
ORDER BY 1;

What it shows: Daily transaction count, total revenue, and AOV — cross-reference against your Shopify/payment processor for discrepancy analysis.


Query 4: Top Pages by Engagement

SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
  REGEXP_REPLACE(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
    r'https?://[^/]+', ''
  ) AS page_path,
  COUNT(*) AS pageviews,
  COUNT(DISTINCT user_pseudo_id) AS unique_users,
  AVG(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')
  ) / 1000 AS avg_engagement_seconds
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
  event_name = 'page_view'
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
ORDER BY pageviews DESC
LIMIT 50;

What it shows: Top pages with average engagement time — helps identify which content drives the most interaction.


Query 5: Checkout Funnel Analysis

WITH funnel_events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    MAX(IF(event_name = 'view_item', 1, 0)) AS viewed_product,
    MAX(IF(event_name = 'add_to_cart', 1, 0)) AS added_to_cart,
    MAX(IF(event_name = 'begin_checkout', 1, 0)) AS began_checkout,
    MAX(IF(event_name = 'purchase', 1, 0)) AS purchased
  FROM `your-project.analytics_XXXXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
  GROUP BY 1, 2
)

SELECT
  SUM(viewed_product) AS product_views,
  SUM(added_to_cart) AS add_to_carts,
  SUM(began_checkout) AS checkouts_started,
  SUM(purchased) AS purchases,
  ROUND(100 * SUM(added_to_cart) / NULLIF(SUM(viewed_product), 0), 1) AS atc_rate,
  ROUND(100 * SUM(began_checkout) / NULLIF(SUM(added_to_cart), 0), 1) AS checkout_rate,
  ROUND(100 * SUM(purchased) / NULLIF(SUM(began_checkout), 0), 1) AS purchase_rate
FROM funnel_events;

What it shows: The e-commerce funnel from product view → cart → checkout → purchase, with conversion rates at each step. This is where you find where customers drop off.


Query 6: User Cohort Retention (30-Day)

WITH first_visit AS (
  SELECT
    user_pseudo_id,
    MIN(PARSE_DATE('%Y%m%d', event_date)) AS cohort_date
  FROM `your-project.analytics_XXXXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX >= '20250101'
  GROUP BY 1
),

activity AS (
  SELECT
    e.user_pseudo_id,
    PARSE_DATE('%Y%m%d', e.event_date) AS activity_date,
    f.cohort_date
  FROM `your-project.analytics_XXXXXXXXXX.events_*` e
  JOIN first_visit f USING (user_pseudo_id)
  WHERE e._TABLE_SUFFIX >= '20250101'
)

SELECT
  cohort_date,
  DATE_DIFF(activity_date, cohort_date, DAY) AS days_since_first_visit,
  COUNT(DISTINCT user_pseudo_id) AS active_users
FROM activity
WHERE DATE_DIFF(activity_date, cohort_date, DAY) IN (0, 7, 14, 30)
GROUP BY 1, 2
ORDER BY 1, 2;

What it shows: How many users from each cohort (acquisition date) returned at 7, 14, and 30 days. Low 30-day retention = content or product engagement problem.


Query 7: Revenue by Device Category

SELECT
  device.category AS device_type,
  device.operating_system AS os,
  COUNT(DISTINCT
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')
  ) AS transactions,
  SUM(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  ) AS revenue,
  ROUND(100 * SUM(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  ) / SUM(SUM(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  )) OVER (), 1) AS revenue_pct
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
GROUP BY 1, 2
ORDER BY revenue DESC;

What it shows: Revenue split by device type and OS — informs mobile vs desktop optimisation priorities.


Query 8: Landing Page Performance

WITH sessions AS (
  -- DISTINCT collapses to one row per session; the window function runs first,
  -- so each row already carries the session's first page_view URL
  SELECT DISTINCT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    FIRST_VALUE(
      (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location')
    ) OVER (
      PARTITION BY user_pseudo_id,
        (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')
      ORDER BY event_timestamp
    ) AS landing_page
  FROM `your-project.analytics_XXXXXXXXXX.events_*`
  WHERE
    event_name = 'page_view'
    AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
),

purchases AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS revenue
  FROM `your-project.analytics_XXXXXXXXXX.events_*`
  WHERE
    event_name = 'purchase'
    AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
)

SELECT
  REGEXP_REPLACE(s.landing_page, r'https?://[^/]+', '') AS landing_path,
  COUNT(DISTINCT CONCAT(s.user_pseudo_id, CAST(s.session_id AS STRING))) AS sessions,
  COUNTIF(p.revenue IS NOT NULL) AS conversions,
  ROUND(100 * COUNTIF(p.revenue IS NOT NULL) / COUNT(DISTINCT CONCAT(s.user_pseudo_id, CAST(s.session_id AS STRING))), 2) AS conversion_rate,
  ROUND(SUM(COALESCE(p.revenue, 0)), 2) AS total_revenue
FROM sessions s
LEFT JOIN purchases p ON s.user_pseudo_id = p.user_pseudo_id AND s.session_id = p.session_id
GROUP BY 1
HAVING sessions > 50
ORDER BY total_revenue DESC
LIMIT 30;

What it shows: Which landing pages generate the most revenue — critical for SEO and ad landing page decisions.


Query 9: Scroll Depth Analysis

SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
  REGEXP_REPLACE(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
    r'https?://[^/]+', ''
  ) AS page_path,
  AVG(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled')
  ) AS avg_scroll_pct,
  COUNTIF(
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled') >= 90
  ) AS users_read_90pct,
  COUNT(*) AS total_scroll_events
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
  event_name = 'scroll'
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
HAVING total_scroll_events > 20
ORDER BY avg_scroll_pct DESC;

What it shows: How far users scroll on each page — blog posts with low average scroll depth need better hooks or shorter content.


Query 10: New vs Returning User Revenue

SELECT
  CASE
    WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
    THEN 'New User'
    ELSE 'Returning User'
  END AS user_type,
  COUNT(*) AS purchases,
  SUM(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  ) AS revenue,
  AVG(
    (SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
  ) AS avg_order_value,
  COUNT(DISTINCT user_pseudo_id) AS unique_buyers
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
  event_name = 'purchase'
  AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
GROUP BY 1;

What it shows: Revenue split between new and returning buyers — if returning users have dramatically higher AOV, that informs your retention marketing investment.


Optimising BigQuery Costs

BigQuery charges based on data scanned per query. A few practices keep costs near zero:

1. Always use _TABLE_SUFFIX filters with wildcard tables:

-- ❌ Scans ALL data (expensive)
FROM `project.analytics_XXXXXXXXXX.events_*`

-- ✅ Scans only last 30 days
FROM `project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))

2. Select only the columns you need — BigQuery charges per column scanned.

3. Rely on date sharding: GA4 export tables are date-sharded (one table per day), so a _TABLE_SUFFIX filter limits which daily tables are scanned.

4. Cache query results — BigQuery caches results for 24 hours. Rerunning the same query is free.

5. Monitor your usage: Cloud Console → BigQuery → Monitoring → view bytes processed per query.
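
You can also estimate a query’s cost before running it: the bq CLI’s --dry_run flag reports how many bytes the query would process without executing it (the project and dataset names below are placeholders):

```shell
# Dry run: report bytes that would be scanned, without running the query
bq query --use_legacy_sql=false --dry_run \
  "SELECT event_name, COUNT(*) AS events
   FROM \`your-project.analytics_XXXXXXXXXX.events_*\`
   WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
   GROUP BY 1"
```

Divide the reported bytes by 1 TB to see how much of the monthly free query quota a single run consumes.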


Connecting BigQuery to BI Tools

Once your data is in BigQuery, you can connect it to visualisation tools:

Looker Studio (free):

  • Looker Studio → Create → Data source → BigQuery → select your table
  • Build dashboards without writing SQL — drag and drop from raw event data

Tableau / Power BI:

  • Both have native BigQuery connectors
  • Connect via service account credentials from Google Cloud IAM
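
As a sketch of the service-account route (the account name and project ID below are illustrative; adjust them to your setup), one way to provision read-only BigQuery credentials with the gcloud CLI:

```shell
# Create a service account for BI tool access (name is a placeholder)
gcloud iam service-accounts create bi-reader --project=your-project-id

# Grant read access to BigQuery data plus the ability to run query jobs
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:bi-reader@your-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding your-project-id \
  --member="serviceAccount:bi-reader@your-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download a key file to paste into Tableau / Power BI's BigQuery connector
gcloud iam service-accounts keys create bi-reader-key.json \
  --iam-account=bi-reader@your-project-id.iam.gserviceaccount.com
```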

dbt (for data teams):

  • Use dbt to build materialised views and transform raw GA4 events into clean reporting tables
  • The GA4 dbt package from Fivetran is a popular starting point

Common BigQuery Export Mistakes

Mistake 1: Querying without date filters

Running SELECT * on events_* without a _TABLE_SUFFIX filter scans your entire history. On a large property, this can cost hundreds of dollars. Always filter dates.

Mistake 2: Forgetting UNNEST for nested arrays

GA4’s event_params is an array. You must use UNNEST() or (SELECT ... FROM UNNEST(event_params) WHERE key = '...') to extract individual values. Directly referencing event_params.key returns an error.

Mistake 3: Reading the wrong value type

GA4 stores parameter values in typed union fields: string_value, int_value, float_value, double_value. Reading the wrong type returns NULL. Check which type a parameter uses with:

SELECT
  key,
  value.string_value,
  value.int_value,
  value.float_value,
  value.double_value
FROM `project.analytics_XXXXXXXXXX.events_20250101`,
UNNEST(event_params)
LIMIT 100;

Mistake 4: Counting sessions incorrectly

There’s no native “session ID” column — sessions are identified by the combination of user_pseudo_id + ga_session_id (extracted from event_params). Always use CONCAT(user_pseudo_id, CAST(session_id AS STRING)) for distinct session counts.
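
For example, a correct distinct-session count (project and dataset names are placeholders) follows the same pattern used in the queries above:

```sql
-- Correct: a session is the combination of user_pseudo_id + ga_session_id
SELECT
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY));
```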


When to Enable BigQuery Export

You should connect GA4 to BigQuery if any of these apply:

  • You run Google Ads and want unsampled data for attribution analysis
  • You need more than 14 months of historical data
  • You want to join GA4 data with Shopify, Amazon, or CRM data
  • You build custom dashboards in Looker Studio, Tableau, or Power BI
  • You’re an e-commerce store where revenue accuracy matters

For a GA4 property with under 100,000 events/day, BigQuery export costs will typically be $0–5/month — well within the free tier for queries if you use date filters consistently.


Need help setting up your GA4 BigQuery export or building custom dashboards from your raw data? We set up GA4 → BigQuery pipelines and build Looker Studio dashboards for e-commerce stores and SaaS companies. Book a free consultation.

#ga4 #bigquery #google-analytics #sql #ecommerce #data-analysis #google-cloud

Want This Implemented Correctly?

Let our team apply these concepts to your specific setup — with QA validation and 30 days of support.