If direct traffic is your second-largest channel in GA4, it’s almost certainly hiding misattributed sessions from email, AI chatbots, app deep links, and broken UTMs. Most of what GA4 calls “direct” isn’t direct at all, and the gap between what you see and what actually happened is wider than most analytics managers want to admit.
I’ve audited direct traffic on roughly forty GA4 properties over the last two years. The pattern is consistent: between 35% and 60% of (direct)/(none) sessions had a knowable source within the prior 30 days. That source was lost somewhere along the request chain. The bigger the brand, the worse the leak, because bigger brands have more email volume, more paid campaigns with sloppy redirects, more SSO walls, and more inbound from AI assistants.
This post is the playbook I actually use when a client asks why their direct traffic doubled. No theory, no “consider implementing.” Just the nine causes, the BigQuery to quantify them, the GTM fix to stop the bleed, and the workflow to reclassify what you already lost.
The 9 real causes of (direct)/(none)
Before you fix anything, you need a working mental model of how sessions land in the Direct bucket. GA4 assigns (direct)/(none) when a session_start event arrives with no UTM parameters, no gclid/gbraid/wbraid/dclid, no document.referrer, and no recent campaign in the user’s session state. Any one of these can be stripped.
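To make that mental model concrete, here is a minimal sketch of the four checks as a single function. It is a simplified illustration, not GA4's actual attribution code: the `hit` object shape, the `hasRecentCampaign` flag, and the parameter lists are assumptions made for this example.

```javascript
// Simplified model of the checks described above; NOT GA4's real logic.
// UTM_KEYS and CLICK_IDS are illustrative lists, not an exhaustive spec.
const UTM_KEYS = ["utm_source", "utm_medium", "utm_campaign"];
const CLICK_IDS = ["gclid", "gbraid", "wbraid", "dclid"];

// hit.url: landing page URL
// hit.referrer: document.referrer ("" when absent)
// hit.hasRecentCampaign: hypothetical flag for "a campaign is still in session state"
function wouldLandAsDirect(hit) {
  const params = new URL(hit.url).searchParams;
  const hasUtm = UTM_KEYS.some((k) => params.has(k));
  const hasClickId = CLICK_IDS.some((k) => params.has(k));
  const hasReferrer = Boolean(hit.referrer);
  // Direct only when every signal is missing at once.
  return !hasUtm && !hasClickId && !hasReferrer && !hit.hasRecentCampaign;
}
```

The point of the sketch: every input is optional data that some hop in the request chain can strip, and the session only needs to lose all of them once to fall into Direct.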
Here are the nine causes I see in the wild, ranked by how much damage they typically do:
| # | Cause | Typical impact on Direct % | Where it shows up |
|---|---|---|---|
| 1 | Untagged email, SMS, push, in-app notifications | 15–30% | Spikes correlated with send times |
| 2 | AI assistant referrer stripping (ChatGPT, Claude, Perplexity, Gemini) | 5–15% and growing | Sessions with no referrer hitting deep content URLs |
| 3 | HTTPS → HTTP downgrade | 2–8% on legacy sites | Referrer drops when a secure page links to an insecure one |
| 4 | Meta referrer-policy set to no-referrer or same-origin | Varies, can be huge | Audit the <meta> tag and HTTP headers |
| 5 | Redirect chains that drop query strings | 5–20% on paid traffic | Click trackers, vanity domains, shorteners |
| 6 | App-to-web transitions (iOS/Android in-app browsers) | 5–10% on mobile-heavy sites | Sessions from instagram.com, facebook.com showing as direct |
| 7 | Single-page app pushState bugs that fire page_view without preserving campaign params | 3–10% | Internal navigation overwriting session source |
| 8 | Overzealous referral exclusion list | 2–7% | Payment providers, SSO domains added “to be safe” |
| 9 | Login walls and intermediate auth flows | 2–5% | Sessions reset after Okta/Auth0 round-trips |
Most teams chase #1 and stop there. That leaves at least half the leak unaddressed.
Untagged email, SMS, and lifecycle messaging
The obvious one. Every ESP I’ve worked with — Klaviyo, Iterable, Braze, HubSpot, Mailchimp — has a UTM auto-tagging setting buried somewhere, and it’s frequently off, partially configured, or applied inconsistently across campaign types. Transactional emails are the worst offender because marketing rarely owns them.
Quick test: pull the last 30 days of (direct)/(none) traffic, group by hour-of-day and day-of-week, and overlay your email send schedule. If the correlation is obvious to the naked eye, you have an email tagging problem.
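If you would rather do the grouping outside a BI tool, the bucketing step can be sketched in a few lines. This assumes you have exported direct-session start times as epoch milliseconds; the function name and the choice of UTC are assumptions for illustration, so shift to your send-schedule timezone before comparing.

```javascript
// Count sessions per weekday/hour bucket (UTC) so the result can be laid
// over an email send schedule. Input: array of epoch-millisecond timestamps.
const DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];

function bucketByWeekdayHour(timestampsMs) {
  const counts = {};
  for (const ts of timestampsMs) {
    const d = new Date(ts);
    const hour = String(d.getUTCHours()).padStart(2, "0");
    const key = DAYS[d.getUTCDay()] + " " + hour + ":00";
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

Buckets that tower over their neighbors and line up with a send time are the ones to investigate first.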
AI assistant referrer stripping
This one is newer and growing fast. When ChatGPT, Claude, or Perplexity cite your page, the click that lands on your site often arrives with no referrer header, or with a referrer from a domain that doesn’t carry campaign context. Some assistants pass chat.openai.com or perplexity.ai as a referrer; others pass nothing. The result: a fast-growing slice of high-intent traffic dumped into Direct.
Identifying these sessions requires looking at landing page patterns. AI-sourced traffic tends to hit deep informational URLs (specific blog posts, documentation pages, comparison content) rather than the homepage. If your direct traffic to /blog/* URLs is growing faster than direct traffic to /, you’re seeing AI assistant leakage. We’ve started recommending clients append a ?ref=ai parameter to URLs they expose in llms.txt or structured data feeds, and instrument a custom channel group for it.
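A rough sketch of how that classification could work, for example in a custom variable or a downstream script. The hostname list is an illustrative, incomplete assumption (only chat.openai.com and perplexity.ai are named above), and the function name is hypothetical.

```javascript
// Illustrative, incomplete list of assistant hostnames (an assumption of this sketch).
const AI_REFERRER_HOSTS = [
  "chat.openai.com",
  "chatgpt.com",
  "perplexity.ai",
  "claude.ai",
  "gemini.google.com",
];

// Returns true when the landing URL carries ?ref=ai or the referrer
// hostname matches (or is a subdomain of) a known assistant host.
function isAiAssistantSession(landingUrl, referrer) {
  const url = new URL(landingUrl);
  if (url.searchParams.get("ref") === "ai") return true;
  if (!referrer) return false;
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return false; // malformed referrer: treat as unknown
  }
  return AI_REFERRER_HOSTS.some((h) => host === h || host.endsWith("." + h));
}
```

Sessions with neither signal still land in Direct, which is why the landing-page growth comparison above remains the fallback check.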
HTTPS → HTTP downgrades and meta referrer policies
If a secure page links to an insecure one, browsers strip the referrer by default. Less common now, but still relevant for clients with old subdomains or partner integrations on HTTP.
More common: a developer set <meta name="referrer" content="no-referrer"> or same-origin site-wide, often because someone read a security blog post about leaking auth tokens in URLs. The fix is to set it to strict-origin-when-cross-origin (which is also the modern browser default). This passes the origin to other sites without leaking the full path, so partners can still attribute clicks to you.
<!-- Correct for most marketing sites -->
<meta name="referrer" content="strict-origin-when-cross-origin">
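Check both the meta tag and the Referrer-Policy HTTP header. If they conflict, don’t assume the header wins: the meta tag is applied when the page is parsed and overrides the header’s policy for that document, so confirm what the browser actually sends.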
Redirect chains that drop UTMs
This is the one that quietly destroys paid attribution. A campaign URL goes through a click tracker, then a vanity domain, then a 301 to the canonical product page. Somewhere in that chain, the query string gets dropped — usually because a developer wrote a redirect rule that doesn’t preserve query parameters.
Test it manually. Take a tagged URL from your last campaign, paste it into a redirect tracer (or just curl -IL), and watch what happens to the ?utm_* params at each hop. If they disappear at any step, you’ve found a leak.
curl -sIL "https://go.yourbrand.com/promo?utm_source=newsletter&utm_medium=email&utm_campaign=spring" \
| grep -iE "^(location|HTTP)"
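Once you have the hop URLs from the curl output, finding the exact hop that drops parameters can be automated. The sketch below assumes you pass the original link followed by each Location target in order; the function name and the tracked-parameter list are assumptions for illustration.

```javascript
// Parameters to follow through the chain (illustrative list).
const TRACKED = ["utm_source", "utm_medium", "utm_campaign"];

// hops: the original URL followed by each redirect target, in order.
// Returns the first hop that lost a tracked parameter, or null if none did.
function findParamLeak(hops) {
  let current = new URL(hops[0]);
  const expected = TRACKED.filter((k) => current.searchParams.has(k));
  for (let i = 1; i < hops.length; i++) {
    // Location headers may be relative, so resolve against the previous hop.
    current = new URL(hops[i], current);
    const missing = expected.filter((k) => !current.searchParams.has(k));
    if (missing.length > 0) {
      return { hop: i, url: current.toString(), missing: missing };
    }
  }
  return null;
}
```

The hop index tells you which redirect rule to fix, which is usually faster than reading every server config in the chain.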
Quantifying the leak with BigQuery
You can’t prioritize what you can’t measure. If you have GA4’s BigQuery export enabled (and you should: the export itself is free on standard properties, capped at 1M events/day for the daily export, though BigQuery storage and query charges can still apply), this query estimates what percentage of your direct sessions had a known source within the last 30 days.
The logic: for every session that landed as (direct)/(none), look back 30 days at the same user_pseudo_id and check whether any earlier session had a real source. If yes, that direct session is “probably misattributed.”
-- Estimate the share of (direct)/(none) sessions whose user had a known
-- source in the preceding 30 days.
-- Assumes source/medium are populated on session_start in your export; if they
-- are not, derive them from the first event in the session that carries them.
WITH sessions AS (
  -- 60 days of sessions: 30 days of direct sessions plus a 30-day lookback.
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    TIMESTAMP_MICROS(event_timestamp) AS session_start_ts,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS landing_page
  FROM `your-project.analytics_XXXXXXXX.events_*`
  WHERE _TABLE_SUFFIX BETWEEN
      FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
    AND event_name = 'session_start'
),
direct_sessions AS (
  -- Direct sessions that started in the last 30 days.
  SELECT *
  FROM sessions
  WHERE (source = '(direct)' OR source IS NULL)
    AND (medium = '(none)' OR medium IS NULL)
    AND session_start_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
),
prior_known_source AS (
  -- Flag each direct session whose user had any non-direct session
  -- in the 30 days before it.
  SELECT
    d.user_pseudo_id,
    d.session_id,
    d.landing_page,
    MAX(CASE
      WHEN s.source IS NOT NULL AND s.source != '(direct)'
      THEN 1 ELSE 0
    END) AS had_prior_source
  FROM direct_sessions d
  LEFT JOIN sessions s
    ON s.user_pseudo_id = d.user_pseudo_id
    AND s.session_start_ts < d.session_start_ts
    AND s.session_start_ts >= TIMESTAMP_SUB(d.session_start_ts, INTERVAL 30 DAY)
  GROUP BY 1, 2, 3
)
SELECT
  COUNT(*) AS direct_sessions,
  SUM(had_prior_source) AS likely_misattributed,
  -- SAFE_DIVIDE avoids a division-by-zero error when there are no direct sessions.
  ROUND(100 * SAFE_DIVIDE(SUM(had_prior_source), COUNT(*)), 1) AS pct_misattributed
FROM prior_known_source;
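Treat this as a floor, not a ceiling. It only catches users who had a prior known session, so first-time visitors whose referrer was stripped never show up. It can also flag a returning user who genuinely typed your URL, so read the number as an estimate rather than a measurement. In practice, if this query returns 40%, your real leak is closer to 55–65%.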
For deeper diagnosis, group by landing page or hour-of-day to find patterns. A spike of direct sessions to /checkout/success every Tuesday at 9am is your weekly newsletter going out untagged.
GTM recipe: preserve UTMs across redirects and login walls
Here’s the pattern I use to keep campaign parameters alive across SSO flows, paywalls, and any intermediate redirects that strip the query string.
The idea: on first page load, if the URL has UTM parameters, stash them in sessionStorage. On every subsequent page load (including after redirects), check sessionStorage and write the params back into the GA4 event payload if the current page lacks them.
In GTM, create a Custom JavaScript variable called cjs.persistedCampaign:
function() {
  // Campaign and click-ID parameters worth persisting across page loads
  var KEYS = ['utm_source','utm_medium','utm_campaign','utm_term','utm_content','gclid','gbraid','wbraid','msclkid'];
  var url = new URL(window.location.href);
  var hasAny = KEYS.some(function(k){ return url.searchParams.has(k); });

  // If current URL has UTMs, persist them
  if (hasAny) {
    var payload = {};
    KEYS.forEach(function(k){
      var v = url.searchParams.get(k);
      if (v) payload[k] = v;
    });
    payload._ts = Date.now();
    try {
      window.sessionStorage.setItem('aum_campaign', JSON.stringify(payload));
    } catch(e) {} // storage may be blocked or full; fail silently
    return payload;
  }

  // Otherwise, try to restore from sessionStorage (within 30 min window)
  try {
    var stored = window.sessionStorage.getItem('aum_campaign');
    if (!stored) return undefined;
    var parsed = JSON.parse(stored);
    if (Date.now() - parsed._ts > 30 * 60 * 1000) return undefined;
    return parsed;
  } catch(e) {
    return undefined;
  }
}
Then in your GA4 Configuration tag (called the Google tag in current GTM containers), or in every GA4 Event tag, depending on your setup, add these as event parameters:
- campaign_source = {{cjs.persistedCampaign}}.utm_source (using a second helper variable per key)
- campaign_medium = {{cjs.persistedCampaign}}.utm_medium
- …and so on
Or pass them as a single JSON blob and split server-side. Either works.
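GTM can't dereference a property directly on a variable reference, which is why each key needs its own helper. A minimal sketch of that helper logic follows; `pickCampaignValue` is a hypothetical name, and inside GTM you would inline the body into one Custom JavaScript variable per key rather than share a function.

```javascript
// Hypothetical helper: pull one key out of the persisted campaign object.
// Returns undefined when the object or the key is missing, so GA4 simply
// omits the parameter instead of sending an empty string.
function pickCampaignValue(persisted, key) {
  if (!persisted || typeof persisted !== "object") return undefined;
  var value = persisted[key];
  return typeof value === "string" && value !== "" ? value : undefined;
}

// Inside GTM (assumption: one Custom JavaScript variable per key), the
// equivalent body would read the variable reference in place of `persisted`:
//   function() {
//     var p = {{cjs.persistedCampaign}};
//     return p && p.utm_source ? p.utm_source : undefined;
//   }
```

Returning undefined rather than an empty string matters because an empty campaign value can still overwrite a good one downstream.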
Why sessionStorage and not localStorage: you don’t want a UTM from three weeks ago hijacking a genuinely new session. The 30-minute timestamp check above mirrors GA4’s default session timeout.
This approach breaks when: the redirect crosses domains and you don’t have cross-domain measurement configured, or when the browser opens the destination in a new tab without inheriting sessionStorage. For cross-domain flows, you also need to manually append the params to outbound links in your sGTM or in a Click trigger. If you’re doing this kind of work at scale, our GTM service has the patterns prebuilt.
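For the cross-domain case, appending the persisted parameters to an outbound link can be sketched as below. The function name is hypothetical, and the sketch assumes the campaign object has the same shape as the one stored by the variable above, including its internal `_ts` field.

```javascript
// Append persisted campaign params to an outbound URL without overwriting
// anything the link already carries. Keys starting with "_" (such as the
// _ts timestamp) are internal and are skipped.
function decorateUrl(href, campaign) {
  const url = new URL(href);
  Object.keys(campaign).forEach((key) => {
    if (key.startsWith("_")) return;
    if (url.searchParams.has(key)) return;
    url.searchParams.set(key, String(campaign[key]));
  });
  return url.toString();
}
```

In a Click trigger you would call something like this on the clicked link's href just before navigation; skipping existing keys keeps a deliberately tagged link from being clobbered by stale session values.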
Audit your referral exclusion list
Every GA4 property I’ve audited had at least one wrongly-excluded domain in the referral exclusion list (technically the “list unwanted referrals” config under Data Streams → Configure tag settings).
The rule is simple: only exclude domains that legitimately bounce a user back to your site mid-session (payment processors, SSO providers, your own subdomains). Excluding anything else hides real traffic.
Bad exclusions I’ve seen:
| Excluded Domain | Why It Was Wrong |
|---|---|
| mail.google.com | Treats Gmail webmail clicks as direct instead of referral |
| t.co | Hides Twitter/X traffic that wasn’t UTM-tagged |
| linkedin.com | Hides organic LinkedIn referrals |
| bing.com | Someone confused referral exclusion with channel grouping |
| *.yourbrand.com (wildcard) | Hides legitimate subdomain referrals that should be tracked |
What you actually want excluded:
- Your payment processor’s hosted checkout (checkout.stripe.com, paypal.com if redirected)
- Your SSO provider (accounts.google.com if you use Google SSO, login.microsoftonline.com)
- Auth0/Okta tenants where users round-trip back to your domain
- Your own root domain and any subdomains you’ve set up for cross-domain measurement
Pull your current list. For every entry, ask: “does a logged-in or paying user pass through this domain and come back to mine?” If no, remove it.
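If the list is long, the review can be roughed out as a diff against the domains you know users round-trip through. Both input lists are things you would supply yourself; the function name is hypothetical, and the output is only a list of candidates to question, not a verdict.

```javascript
// Return the excluded domains that are NOT on your known round-trip list.
// Each result is a candidate for removal and still needs a human check.
function flagSuspectExclusions(excludedDomains, roundTripDomains) {
  const known = new Set(roundTripDomains.map((d) => d.trim().toLowerCase()));
  return excludedDomains.filter((d) => !known.has(d.trim().toLowerCase()));
}
```

Run against the examples above, entries such as t.co or linkedin.com would be flagged, while a hosted checkout domain on your round-trip list would pass.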
Setting referrer-policy correctly
Your referrer policy is a two-way street. If you strip referrers on outbound, your partners can’t attribute traffic to you and won’t prioritize you in their reports. If your partners strip referrers, you lose the data.
Recommended setting for marketing sites:
<meta name="referrer" content="strict-origin-when-cross-origin">
Or via HTTP header:
Referrer-Policy: strict-origin-when-cross-origin
This sends the full URL to same-origin requests, just the origin to cross-origin requests over HTTPS, and nothing on HTTPS→HTTP downgrades. It’s the modern browser default, but many sites override it with stricter values without realizing the attribution cost.
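Those three cases are easy to get wrong from memory, so here is a small sketch that computes the Referer value a browser would send under this policy. It is a simplification: it treats only https: as secure, whereas the actual specification uses a broader "potentially trustworthy" test, and the function name is an assumption for illustration.

```javascript
// Approximate the Referer sent under strict-origin-when-cross-origin.
// Returns the header value, or null when no referrer is sent.
function refererFor(fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);
  if (from.origin === to.origin) {
    // Same-origin: full URL, minus fragment and credentials.
    from.hash = "";
    from.username = "";
    from.password = "";
    return from.toString();
  }
  // Cross-origin downgrade (secure page to insecure page): send nothing.
  if (from.protocol === "https:" && to.protocol !== "https:") return null;
  // Any other cross-origin request: origin only.
  return from.origin + "/";
}
```

Comparing this expected value with what the Network tab shows is a quick way to tell whether a stricter policy is being applied somewhere.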
Test your current policy with the browser dev tools Network tab. Click an outbound link and check the Referer header on the destination request. If it’s missing or just an origin when you expected a full path, you know what to fix.
Reclassifying historical direct traffic in BigQuery
You can’t change what GA4 already recorded, but you can build a corrected view downstream. The approach: for every direct session, look back N days for the most recent known source and reassign.
-- Reassign each direct session to the most recent known source seen for the
-- same user within a lookback window (N = 30 days here; adjust as needed).
WITH all_sessions AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
    TIMESTAMP_MICROS(event_timestamp) AS session_ts,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign
  FROM `your-project.analytics_XXXXXXXX.events_*`
  -- Consider adding a _TABLE_SUFFIX filter here to limit scan cost.
  WHERE event_name = 'session_start'
),
reclassified AS (
  SELECT
    user_pseudo_id,
    session_id,
    session_ts,
    source AS original_source,
    medium AS original_medium,
    -- Most recent non-direct touch before this session. Packing the fields into
    -- one STRUCT keeps source, medium, and campaign from the same prior session.
    -- A window function replaces the correlated ORDER BY ... LIMIT 1 subquery,
    -- which BigQuery may refuse to de-correlate.
    LAST_VALUE(
      IF(source IS NOT NULL AND source != '(direct)',
         STRUCT(source, medium, campaign), NULL)
      IGNORE NULLS
    ) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY UNIX_SECONDS(session_ts)
      RANGE BETWEEN 2592000 PRECEDING AND 1 PRECEDING  -- 30 days, in seconds
    ) AS prior_touch
  FROM all_sessions
)
SELECT
  user_pseudo_id,
  session_id,
  session_ts,
  original_source,
  original_medium,
  -- Fall back to (direct)/(none) when no prior known touch exists.
  COALESCE(prior_touch.source, '(direct)') AS reclassified_source,
  COALESCE(prior_touch.medium, '(none)') AS reclassified_medium,
  prior_touch.campaign AS reclassified_campaign
FROM reclassified
WHERE original_source = '(direct)' OR original_source IS NULL;
Use this view in Looker Studio alongside (not instead of) GA4’s native