I am helping a friend build his Seinfeld trivia game. It's working and the Seinfeld-zealots on reddit seem to like it a lot, but we need to advertise it. The issue of tracking our traffic sources has arisen. We need to generate traffic and want to know where our traffic is coming from so we can effectively assess where the tiny marketing budget would be best spent. We will be installing Google Analytics to track user behavior, etc.

NOTE: the project runs Node+MongoDB server side (not my choice as I can help very little there) and the ability to implement logic server-side will be quite limited.

I've seen widespread use of query strings in links by/to the giant websites (e.g., fb, twitter, google, etc.). E.g., a recent search for "krakatoa" on google yields a twitter result with appended query string elements clearly intended to identify the traffic source:

q=krakatoa&ref_src=twsrc^google|twcamp^serp|twgr^search
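
To make the structure of that value concrete, here is a minimal Python sketch that splits it apart. It assumes, purely from the look of the string, that "|" separates pairs and "^" separates a key from its value; I have not seen this documented anywhere, and parse_ref_src is just a name I made up.

```python
from urllib.parse import parse_qs


def parse_ref_src(query):
    """Split a ref_src value such as 'twsrc^google|twcamp^serp' into a dict.

    Assumption: '|' separates pairs and '^' separates key from value.
    """
    params = parse_qs(query)
    raw = params.get("ref_src", [""])[0]
    pairs = {}
    for part in raw.split("|"):
        key, sep, value = part.partition("^")
        if sep:  # skip fragments that contain no '^' at all
            pairs[key] = value
    return pairs


query = "q=krakatoa&ref_src=twsrc^google|twcamp^serp|twgr^search"
print(parse_ref_src(query))
```

Read that way, the link is labelled with a source (google), a campaign (serp) and a group (search).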

Facebook also appends ?fbclid=<SOME-LONG-STRING> to any url that you post there, where <SOME-LONG-STRING> is either cryptic base64-encoded data or, more likely, some uniqid for tracking purposes.

Clearly they are tracking user behavior using query strings rather than hashtags and this got me wondering how they reconcile this with SEO imperatives. I learned from past experience that having many different urls linking to a single page of content (e.g., query strings that specify sorting or something) can really badly penalize your search engine rankings. Two obvious possible reasons come to mind: 1) folks sharing links to your site are not using the same url, which spreads your page rank score across many different urls and 2) Search engines may be penalizing you for spamming the content, etc.

I do see that one can specify rel=canonical attributes in one's own links, but this doesn't help for traffic from external sources. A 301 redirect also seems bad because you could surely redirect to your canonical url but you wouldn't be able to load any Google Analytics first, which would defeat your use of the query strings for tracking. A header sounds good, as does a sitemap.

So my question is a rather broad one, and my knowledge of marketing & tracking techniques is woefully inadequate. What are the cool folks doing to track sources of traffic to better steer marketing dollars? Any tips specific to Google Analytics would be much appreciated, as that will likely be our primary analytics tool.

sneakyimp I do see that one can specify rel=canonical attributes in one's own links, but this doesn't help for traffic from external sources.

Generally speaking, search engines following links will read the content of the pages the links point to, and key that page content against the canonical URL found on that page, rather than to the specific URL they followed to get to that page. They are of course free to ignore it (if they don't mind their engine being less useful), and probably will if they determine the relationship between the original and canonical requests fails to satisfy the conditions of RFC6596, as for example in the case of a search results page.

Weedpacket

Thanks for your response! The RFC has very specific detail. I think I'm starting to grok this.

the canonical URL found on that page...

This is the crux of my question -- how to tell any visiting search engine what the canonical page is. The google link I shared above suggests several ways to specify canonical url:

  • <link rel="canonical" href="https://example.com/my-canonical-url" /> in the HTML document header.
  • send HTTP header like so: header('Link: <https://example.com/my-canonical-url>; rel="canonical"');
  • list my canonical urls in a sitemap without any unwanted query strings
  • 301 redirect--use this only when deprecating a duplicate page.
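
For the first two options, the canonical url has to be derived from whatever url the visitor actually requested. Here is a rough Python sketch of that step (the same logic should port to Node). The list of tracking parameters and the function names are my own assumptions, not anything from Google's documentation.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that only carry tracking data.
TRACKING_PARAMS = {"fbclid", "gclid", "ref_src"}
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url):
    """Return the url with tracking parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def link_tag(url):
    """Build the <link> element for the HTML document header."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


def link_header(url):
    """Build the equivalent HTTP response header line."""
    return 'Link: <%s>; rel="canonical"' % canonical_url(url)


print(link_tag("https://example.com/quiz?fbclid=abc123&utm_source=reddit"))
print(link_header("https://example.com/quiz?page=2&fbclid=abc123"))
```

Parameters that change the content (page=2 here) are kept, since dropping them would point the canonical at a different document.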

The RFC provides some helpful tips:

  • a document can identify itself as canonical url
  • if your content gets sliced and diced by query strings or paging, try to specify a canonical url with the full content or the indexing will be incomplete
  • canonical url can exist on a different domain
  • Avoid designating a canonical to any url returning 300 and 301 response codes, 4xx error codes, or any document that declares a canonical url other than itself
  • Avoid specifying any partial results as the canonical url. E.g., don't specify page-1.html as the canonical url for page-2.html because these pages contain different content which would probably be ignored/not indexed.

I'm still wondering about whether some 3xx redirects might be helpful or necessary. The RFC is a bit confusing about them, saying:

The target (canonical) IRI MAY be the source IRI of a temporary redirect. For HTTP, this refers
to status codes 302, 303, or 307

I'm not really sure what it means by "IRI may be the source" -- I guess that means the canonical url can redirect to yet another url as long as the redirect is temporary?

I'm also wondering about redirects and their relation to google analytics. I see in my google analytics a metric ton of AJAX calls and query-string-polluted urls, especially with the fbclid that facebook appends to any links posted on their site. It is unclear to me if this fbclid gets used by google analytics or not, and I rather dislike the idea of people copying and pasting such a url all over the place. It feels like a bit of a privacy invasion by Facebook, and it's not clear to me that my site or brand will benefit. On the other hand, when I see a specific fbclid in google analytics, that seems a pretty clear indication of the reach of a particular link.

I'm thinking that the query strings might be useful in google analytics (except for maybe all those AJAX requests) and even necessary for analytics to properly track sources. As long as I can get search engines to ignore them and refer only to the pristine, query-string-free main url -- and aggregate all my yet-to-be-accumulated PageRank points there -- then that would be satisfactory.
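
For links we post ourselves, Google Analytics reads the campaign parameters utm_source, utm_medium and utm_campaign from the landing url, so tagging our own links would look something like the Python sketch below. tag_link and the example values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_link(url, source, medium, campaign):
    """Append Google Analytics campaign parameters to a url we control."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),      # where the link is posted, e.g. reddit
        ("utm_medium", medium),      # kind of channel, e.g. social
        ("utm_campaign", campaign),  # our own label for the promotion
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


print(tag_link("https://example.com/quiz", "reddit", "social", "launch"))
```

Each place we advertise would get its own tagged link, and the canonical url on the page itself would stay free of these parameters.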

Any anecdotes or wisdom would be much appreciated.

sneakyimp I guess that means the canonical url can redirect to yet another url as long as the redirect is temporary?

Yes, since a temporary redirect means clients ought to keep using the original URL as that's the stable one, and the one it's being redirected to is, well, temporary for whatever reason. A 301 Permanent Redirect is the opposite: it tells the client that the URL it's being redirected to is the stable one and the URL it originally asked for is deprecated (time to update your bookmarks!). See RFC7231, §6.4, especially pp. 56-58.

The implication is that a canonical URL shouldn't be a permanent redirect: if the "canonical" URL is deprecated, why is it still being published as the canonical URL?
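
As a rough illustration, the status codes quoted earlier in this thread can be folded into a small check on what a candidate canonical URL returns. This is only a Python sketch of those quoted rules: the function name is made up, treating 200 as fine is my own assumption, and anything the quotes don't mention is reported as such rather than guessed at.

```python
# Codes the RFC tips quoted above say a canonical target should not return.
AVOID_REDIRECTS = {300, 301}
# Codes the quoted RFC passage explicitly allows (temporary redirects).
TEMPORARY_REDIRECTS = {302, 303, 307}


def canonical_status_verdict(status):
    """Classify the HTTP status returned by a candidate canonical URL."""
    if status in AVOID_REDIRECTS or 400 <= status <= 499:
        return "avoid"
    if status == 200 or status in TEMPORARY_REDIRECTS:
        return "ok"
    return "not covered"  # e.g. 5xx: the quoted text does not say


for code in (200, 301, 302, 404, 503):
    print(code, canonical_status_verdict(code))
```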

sneakyimp As long as I can get search engines to ignore them and refer only to the pristine, query-string-free main url -- and aggregate all my yet-to-be-accumulated PageRank points there -- then that would be satisfactory.

I kind of suspect that that is what the search engines do: whatever the link actually says, it gets credited as being a link to the canonical page for purposes of ranking that page (for that to work reliably the search engine would of course have to read the content of both pages to see if the canonicalisation is legitimate as per RFC). I guess the canonical URL would be used for search results if it's provided, and the detailed URL for analytics (because it's more important to you to know exactly how others are linking to your pages).
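
A toy Python model of that guess, just to show the two views side by side. It is not how any real search engine is known to work; the URLs, the declared_canonical mapping and credit_links are all invented for illustration.

```python
from collections import Counter

# Hypothetical crawl result: each linked URL and the canonical its page declares.
declared_canonical = {
    "https://example.com/quiz?fbclid=abc": "https://example.com/quiz",
    "https://example.com/quiz?utm_source=reddit": "https://example.com/quiz",
    "https://example.com/quiz": "https://example.com/quiz",
}

# Hypothetical inbound links exactly as other sites wrote them.
inbound_links = [
    "https://example.com/quiz?fbclid=abc",
    "https://example.com/quiz?fbclid=abc",
    "https://example.com/quiz?utm_source=reddit",
    "https://example.com/quiz",
]


def credit_links(links, declared):
    """Count each link against the canonical URL its target page declares."""
    return Counter(declared.get(link, link) for link in links)


ranking_credit = credit_links(inbound_links, declared_canonical)  # search engine's view
analytics_detail = Counter(inbound_links)                         # your view
print(ranking_credit)
print(analytics_detail)
```

All four links end up credited to the one canonical page, while the detailed count still shows which variant each visitor followed.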

3 months later

Basically, Google Analytics will tell you how many of your visitors are coming to your site via organic search, referrals, direct visits (typing your URL into the browser), or social media. If the majority of your visitors are coming through organic search, you likely have an effective SEO strategy in place; however, your social media marketing strategy may need some work if the number of visitors coming from social channels is low.
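
The four buckets can be approximated from the referrer alone. This Python sketch is a simplification under stated assumptions: the host lists are hypothetical and far from complete, and the real channel grouping in Google Analytics is more elaborate than this.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately short host lists.
SEARCH_HOSTS = {"google.com", "bing.com", "duckduckgo.com"}
SOCIAL_HOSTS = {"facebook.com", "twitter.com", "reddit.com"}


def _matches(host, domains):
    """True if host is one of the domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def channel(referrer):
    """Bucket a visit by its referrer: direct, organic search, social or referral."""
    if not referrer:
        return "direct"
    host = (urlsplit(referrer).hostname or "").lower()
    if _matches(host, SEARCH_HOSTS):
        return "organic search"
    if _matches(host, SOCIAL_HOSTS):
        return "social"
    return "referral"


print(channel("https://www.google.com/search?q=seinfeld"))
print(channel("https://old.reddit.com/r/seinfeld/"))
```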

It is beneficial to have a well-balanced number of visitors from each source. A well-balanced content strategy that utilizes effective messaging and call-to-action is going to encourage engagement from all of your sources. It is important to pay attention and notice if you see a sharp decline or stagnation in growth. This might be a sign that your strategy needs some tweaking.
