If you’ve ever hit a wall with GA4’s standard reports — 14-month data retention, limited custom dimensions in the UI, no raw event access — BigQuery export is the solution. It’s the only way to get your raw, unsampled GA4 data into a format where you can query it however you need.
This guide walks through the full setup and gives you 10 production-ready SQL queries you can use immediately once your data starts flowing.
What Is the GA4 BigQuery Export?
GA4 can stream or batch-export your raw event data to Google BigQuery, Google’s fully-managed cloud data warehouse. Once connected, every event — every pageview, purchase, scroll, click — is available as a queryable row in BigQuery.
What you get that you don’t get in GA4 UI:
- Raw event-level data (no sampling, ever)
- Unlimited data retention (no 14-month cap)
- Access to all parameters, including custom ones
- SQL querying — join GA4 data with your CRM, Shopify data, or any other source
- Near real-time data for the current day through the intraday streaming tables
What it costs: BigQuery has a generous free tier — 10 GB of storage free per month and the first 1 TB of queries free per month. For most small and mid-sized businesses, the cost is $0 until you scale significantly.
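To sanity-check your own numbers, you can estimate the on-demand query bill from bytes scanned. A minimal sketch, assuming BigQuery's list price of $6.25 per TiB scanned and the 1 TiB monthly query free tier (check current pricing for your region before relying on these figures):

```python
TIB = 1024 ** 4  # bytes in one tebibyte

def monthly_query_cost(bytes_scanned_per_month: int,
                       price_per_tib: float = 6.25,  # assumed on-demand list price
                       free_tib: float = 1.0) -> float:
    """Estimate the monthly on-demand bill after the free tier."""
    billable_tib = max(0.0, bytes_scanned_per_month / TIB - free_tib)
    return round(billable_tib * price_per_tib, 2)

# A small property scanning 200 GiB/month stays inside the free tier:
print(monthly_query_cost(200 * 1024 ** 3))  # 0.0

# Scanning 3 TiB/month leaves 2 TiB billable:
print(monthly_query_cost(3 * TIB))  # 12.5
```

Storage cost works the same way against the 10 GB free allowance; for most SMB properties both stay at zero.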
Official reference: GA4 BigQuery Export documentation
Prerequisites
Before linking, you need:
- A GA4 property (you must have Editor or Administrator role)
- A Google Cloud project — create one at console.cloud.google.com if you don’t have one
- BigQuery API enabled in that Google Cloud project
- Billing account linked to your Cloud project (required, but you won’t be charged if you stay within the free tier)
Step 1: Set Up Your Google Cloud Project
- Go to Google Cloud Console
- Click Select a project → New Project
- Name it (e.g., aumlytics-analytics) → Create
- Navigate to APIs & Services → Library → search “BigQuery API” → Enable
- Billing → Link a billing account (required, but free tier covers most SMBs)
Step 2: Link GA4 to BigQuery
- Open your GA4 property
- Admin (gear icon) → Product Links → BigQuery Links
- Click Link
- Select your Google Cloud project from the dropdown
- Choose a BigQuery dataset location — pick the region closest to your users (e.g., us-central1 for North American users, europe-west2 for the UK)
- Export type:
- Daily export — one table per day, created after midnight
- Streaming export — near real-time intraday table, updated continuously (recommended if you want fresh data)
- Select the data streams to include (select all unless you want to exclude a specific stream)
- Click Submit
Note: The link takes up to 24 hours to activate. The first daily export runs after the next midnight in your property’s timezone.
Step 3: Understand the BigQuery Table Structure
Once data flows, you’ll find a dataset named analytics_XXXXXXXXXX (where X is your GA4 property ID) in BigQuery.
Tables created:
| Table name | Contents |
|---|---|
| events_YYYYMMDD | One table per day (daily export) |
| events_intraday_YYYYMMDD | Current day’s streaming data |
Key columns in each table:
-- Core event columns
event_date -- YYYYMMDD string (e.g., '20250101')
event_timestamp -- Unix microseconds (divide by 1,000,000 for seconds)
event_name -- e.g., 'page_view', 'purchase', 'scroll'
event_params -- ARRAY<STRUCT<key, value>> — all event parameters
-- User/session columns
user_id -- your app's user ID (if set_user_id was called)
user_pseudo_id -- GA4's client ID (anonymised device identifier)
ga_session_id -- session ID (lives inside event_params, not as a top-level column)
ga_session_number -- how many sessions this user has had (also inside event_params)
-- Device/geo columns
device.category -- 'desktop', 'mobile', 'tablet'
device.browser -- 'Chrome', 'Safari', etc.
geo.country
geo.city
geo.region
-- Traffic source columns (the user's first-touch source, not per-session)
traffic_source.source
traffic_source.medium
traffic_source.name -- campaign name
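Two of these columns trip people up: event_date is a string, not a DATE, and event_timestamp is in microseconds, not seconds. A quick illustration of both conversions in Python (the sample values are made up):

```python
from datetime import datetime, timezone

# event_date arrives as a 'YYYYMMDD' string
event_date = "20250101"
parsed_date = datetime.strptime(event_date, "%Y%m%d").date()

# event_timestamp is Unix microseconds; divide by 1,000,000 for seconds
event_timestamp = 1735689600000000
event_time = datetime.fromtimestamp(event_timestamp / 1_000_000, tz=timezone.utc)

print(parsed_date)                 # 2025-01-01
print(event_time.isoformat())      # 2025-01-01T00:00:00+00:00
```

In SQL the equivalents are PARSE_DATE('%Y%m%d', event_date) and TIMESTAMP_MICROS(event_timestamp).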
The event_params column is the tricky part. It’s a repeated STRUCT array. To extract a specific parameter value:
-- Extract a single parameter value (string)
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url
-- Extract a single parameter value (integer)
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id
-- Extract e-commerce value (float)
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS purchase_value
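If the SQL pattern feels opaque, it can help to picture what event_params looks like as plain data: a list of key/value records where exactly one typed slot is populated per parameter. This toy model (made-up values) mirrors what the `(SELECT ... FROM UNNEST(event_params) WHERE key = ...)` subquery does:

```python
# Toy model of one event's event_params array. Each param fills exactly
# one typed slot (string_value, int_value, double_value); the rest are None.
event_params = [
    {"key": "page_location", "value": {"string_value": "https://example.com/pricing",
                                       "int_value": None, "double_value": None}},
    {"key": "ga_session_id", "value": {"string_value": None,
                                       "int_value": 1735689600, "double_value": None}},
]

def param(params, key, slot):
    """Equivalent of: (SELECT value.<slot> FROM UNNEST(event_params) WHERE key = ...)."""
    return next((p["value"][slot] for p in params if p["key"] == key), None)

print(param(event_params, "page_location", "string_value"))  # https://example.com/pricing
print(param(event_params, "ga_session_id", "int_value"))     # 1735689600
print(param(event_params, "missing", "string_value"))        # None
```

Like the SQL version, asking for a key that isn't present (or the wrong typed slot) returns NULL rather than raising an error.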
Step 4: Run Your First Query
Open BigQuery → SQL workspace and run this to verify data is flowing:
-- Check which dates have data
SELECT
event_date,
COUNT(*) AS event_count
FROM `your-project-id.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
GROUP BY 1
ORDER BY 1 DESC;
Replace your-project-id with your Google Cloud project ID and XXXXXXXXXX with your GA4 property ID (visible in GA4 Admin → Property Settings).
10 Practical BigQuery Queries
Query 1: Daily Sessions and Pageviews
SELECT
event_date,
COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
))) AS sessions,
COUNTIF(event_name = 'page_view') AS pageviews,
COUNT(DISTINCT user_pseudo_id) AS unique_users
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131'
GROUP BY 1
ORDER BY 1;
What it shows: Day-by-day session and user counts — the equivalent of the Sessions overview in GA4 but with no sampling and exportable to any BI tool.
Query 2: Traffic Source Breakdown
SELECT
traffic_source.source AS source,
traffic_source.medium AS medium,
COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING
))) AS sessions,
COUNTIF(event_name = 'purchase') AS purchases,
SUM(IF(event_name = 'purchase',
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value'), 0
)) AS revenue
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
ORDER BY sessions DESC
LIMIT 20;
What it shows: Sessions, conversions, and revenue by source/medium — the Traffic Acquisition report with revenue breakout per source.
Query 3: E-commerce Purchase Report
SELECT
event_date,
COUNT(DISTINCT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')
) AS transactions,
SUM(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
) AS total_revenue,
AVG(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
) AS avg_order_value
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'purchase'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1
ORDER BY 1;
What it shows: Daily transaction count, total revenue, and AOV — cross-reference against your Shopify/payment processor for discrepancy analysis.
Query 4: Top Pages by Engagement
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'https?://[^/]+', ''
) AS page_path,
COUNT(*) AS pageviews,
COUNT(DISTINCT user_pseudo_id) AS unique_users,
AVG(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')
) / 1000 AS avg_engagement_seconds
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'page_view'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
ORDER BY pageviews DESC
LIMIT 50;
What it shows: Top pages with average engagement time — helps identify which content drives the most interaction.
Query 5: Checkout Funnel Analysis
WITH funnel_events AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
MAX(IF(event_name = 'view_item', 1, 0)) AS viewed_product,
MAX(IF(event_name = 'add_to_cart', 1, 0)) AS added_to_cart,
MAX(IF(event_name = 'begin_checkout', 1, 0)) AS began_checkout,
MAX(IF(event_name = 'purchase', 1, 0)) AS purchased
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
)
SELECT
SUM(viewed_product) AS product_views,
SUM(added_to_cart) AS add_to_carts,
SUM(began_checkout) AS checkouts_started,
SUM(purchased) AS purchases,
ROUND(100 * SUM(added_to_cart) / NULLIF(SUM(viewed_product), 0), 1) AS atc_rate,
ROUND(100 * SUM(began_checkout) / NULLIF(SUM(added_to_cart), 0), 1) AS checkout_rate,
ROUND(100 * SUM(purchased) / NULLIF(SUM(began_checkout), 0), 1) AS purchase_rate
FROM funnel_events;
What it shows: The e-commerce funnel from product view → cart → checkout → purchase, with conversion rates at each step. This is where you find where customers drop off.
Query 6: User Cohort Retention (30-Day)
WITH first_visit AS (
SELECT
user_pseudo_id,
MIN(PARSE_DATE('%Y%m%d', event_date)) AS cohort_date
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= '20250101'
GROUP BY 1
),
activity AS (
SELECT
e.user_pseudo_id,
PARSE_DATE('%Y%m%d', e.event_date) AS activity_date,
f.cohort_date
FROM `your-project.analytics_XXXXXXXXXX.events_*` e
JOIN first_visit f USING (user_pseudo_id)
WHERE e._TABLE_SUFFIX >= '20250101'
)
SELECT
cohort_date,
DATE_DIFF(activity_date, cohort_date, DAY) AS days_since_first_visit,
COUNT(DISTINCT user_pseudo_id) AS active_users
FROM activity
WHERE DATE_DIFF(activity_date, cohort_date, DAY) IN (0, 7, 14, 30)
GROUP BY 1, 2
ORDER BY 1, 2;
What it shows: How many users from each cohort (acquisition date) returned at 7, 14, and 30 days. Low 30-day retention = content or product engagement problem.
Query 7: Revenue by Device Category
SELECT
device.category AS device_type,
device.operating_system AS os,
COUNT(DISTINCT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id')
) AS transactions,
SUM(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
) AS revenue,
ROUND(100 * SUM(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
) / SUM(SUM(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
)) OVER (), 1) AS revenue_pct
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'purchase'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
GROUP BY 1, 2
ORDER BY revenue DESC;
What it shows: Revenue split by device type and OS — informs mobile vs desktop optimisation priorities.
Query 8: Landing Page Performance
WITH pageviews AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
event_timestamp
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'page_view'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
),
sessions AS (
-- DISTINCT collapses to one row per session, so the purchase join below
-- doesn't double-count revenue
SELECT DISTINCT
user_pseudo_id,
session_id,
FIRST_VALUE(page_url) OVER (
PARTITION BY user_pseudo_id, session_id
ORDER BY event_timestamp
) AS landing_page
FROM pageviews
),
purchases AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS revenue
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'purchase'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
)
SELECT
REGEXP_REPLACE(s.landing_page, r'https?://[^/]+', '') AS landing_path,
COUNT(DISTINCT CONCAT(s.user_pseudo_id, CAST(s.session_id AS STRING))) AS sessions,
COUNTIF(p.revenue IS NOT NULL) AS conversions,
ROUND(100 * COUNTIF(p.revenue IS NOT NULL) / COUNT(DISTINCT CONCAT(s.user_pseudo_id, CAST(s.session_id AS STRING))), 2) AS conversion_rate,
ROUND(SUM(COALESCE(p.revenue, 0)), 2) AS total_revenue
FROM sessions s
LEFT JOIN purchases p ON s.user_pseudo_id = p.user_pseudo_id AND s.session_id = p.session_id
GROUP BY 1
HAVING sessions > 50
ORDER BY total_revenue DESC
LIMIT 30;
What it shows: Which landing pages generate the most revenue — critical for SEO and ad landing page decisions.
Query 9: Scroll Depth Analysis
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url,
REGEXP_REPLACE(
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
r'https?://[^/]+', ''
) AS page_path,
AVG(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled')
) AS avg_scroll_pct,
COUNTIF(
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'percent_scrolled') >= 90
) AS users_read_90pct,
COUNT(*) AS total_scroll_events
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'scroll'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
GROUP BY 1, 2
HAVING total_scroll_events > 20
ORDER BY avg_scroll_pct DESC;
What it shows: How far users scroll on each page — blog posts with low average scroll depth need better hooks or shorter content.
Query 10: New vs Returning User Revenue
SELECT
CASE
WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number') = 1
THEN 'New User'
ELSE 'Returning User'
END AS user_type,
COUNT(*) AS purchases,
SUM(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
) AS revenue,
AVG(
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value')
) AS avg_order_value,
COUNT(DISTINCT user_pseudo_id) AS unique_buyers
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE
event_name = 'purchase'
AND _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
GROUP BY 1;
What it shows: Revenue split between new and returning buyers — if returning users have dramatically higher AOV, that informs your retention marketing investment.
Optimising BigQuery Costs
BigQuery charges based on data scanned per query. A few practices keep costs near zero:
1. Always use _TABLE_SUFFIX filters with wildcard tables:
-- ❌ Scans ALL data (expensive)
FROM `project.analytics_XXXXXXXXXX.events_*`
-- ✅ Scans only last 30 days
FROM `project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX >= FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
2. Select only the columns you need — BigQuery is columnar, so you pay for the bytes in each column you reference, and SELECT * reads all of them.
3. Lean on date sharding — GA4 export tables are sharded one per day, so a _TABLE_SUFFIX filter skips entire days of data.
4. Cache query results — BigQuery caches results for 24 hours. Rerunning the same query is free.
5. Monitor your usage: Cloud Console → BigQuery → Monitoring → view bytes processed per query.
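If you run these queries from a script rather than the console, it helps to compute the _TABLE_SUFFIX bounds once and interpolate them, so no query ever ships without a date filter. A small helper along those lines (the project and dataset names are placeholders):

```python
from datetime import date, timedelta

def suffix_window(days: int, today: date) -> tuple:
    """Return (start, end) _TABLE_SUFFIX values covering the last `days` days."""
    start = today - timedelta(days=days)
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")

start, end = suffix_window(30, today=date(2025, 1, 31))

# Interpolate the bounds into any wildcard query
sql = f"""
SELECT event_date, COUNT(*) AS events
FROM `your-project.analytics_XXXXXXXXXX.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'
GROUP BY 1
"""

print(start, end)  # 20250101 20250131
```

Pair this with a dry run in the BigQuery console (which reports bytes to be processed before you execute) to confirm the filter is actually pruning tables.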
Connecting BigQuery to BI Tools
Once your data is in BigQuery, you can connect it to visualisation tools:
Looker Studio (free):
- Looker Studio → Create → Data source → BigQuery → select your table
- Build dashboards without writing SQL — drag and drop from raw event data
Tableau / Power BI:
- Both have native BigQuery connectors
- Connect via service account credentials from Google Cloud IAM
dbt (for data teams):
- Use dbt to build materialised views and transform raw GA4 events into clean reporting tables
- The GA4 dbt package from Fivetran is a popular starting point
Common BigQuery Export Mistakes
Mistake 1: Querying without date filters
Running SELECT * on events_* without a _TABLE_SUFFIX filter scans your entire history. On a large property, this can cost hundreds of dollars. Always filter dates.
Mistake 2: Forgetting UNNEST for nested arrays
GA4’s event_params is an array. You must use UNNEST() or (SELECT ... FROM UNNEST(event_params) WHERE key = '...') to extract individual values. Directly referencing event_params.key returns an error.
Mistake 3: Mixing INT and FLOAT event_params
GA4 stores parameter values in typed union fields: string_value, int_value, float_value, double_value. The wrong type returns NULL. Check which type a parameter uses with:
SELECT
key,
value.string_value,
value.int_value,
value.float_value,
value.double_value
FROM `project.analytics_XXXXXXXXXX.events_20250101`,
UNNEST(event_params)
LIMIT 100;
Mistake 4: Counting sessions incorrectly
There’s no native “session ID” column — sessions are identified by the combination of user_pseudo_id + ga_session_id (extracted from event_params). Always use CONCAT(user_pseudo_id, CAST(session_id AS STRING)) for distinct session counts.
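To see why the combination matters, here is the same logic over a toy event list (made-up IDs). Counting distinct ga_session_id values alone undercounts, because the ID is derived from the session start time and two different users can collide on it:

```python
# Toy events: (user_pseudo_id, ga_session_id). userA and userB happen to
# share a ga_session_id value, which is possible because the ID is
# derived from the session start timestamp.
events = [
    ("userA", 1735689600),
    ("userA", 1735689600),  # same session, second event
    ("userA", 1735693200),  # userA's second session
    ("userB", 1735689600),  # different user, colliding session ID
]

# Wrong: distinct ga_session_id only; userB's session collapses into userA's
wrong = len({sid for _, sid in events})

# Right: distinct (user_pseudo_id, ga_session_id), i.e. the CONCAT in SQL
right = len({f"{uid}{sid}" for uid, sid in events})

print(wrong, right)  # 2 3
```

The set comprehension over the concatenated pair is exactly what COUNT(DISTINCT CONCAT(user_pseudo_id, CAST(session_id AS STRING))) computes in the queries above.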
When to Enable BigQuery Export
You should connect GA4 to BigQuery if any of these apply:
- You run Google Ads and want unsampled data for attribution analysis
- You need more than 14 months of historical data
- You want to join GA4 data with Shopify, Amazon, or CRM data
- You build custom dashboards in Looker Studio, Tableau, or Power BI
- You’re an e-commerce store where revenue accuracy matters
For a GA4 property with under 100,000 events/day, BigQuery export costs will typically be $0–5/month — well within the free tier for queries if you use date filters consistently.
Need help setting up your GA4 BigQuery export or building custom dashboards from your raw data? We set up GA4 → BigQuery pipelines and build Looker Studio dashboards for e-commerce stores and SaaS companies. Book a free consultation.