Imagine for a second that there are only 5 sites on the Internet and only 100 total searches per month.
- Site 1 gets 40 of the visits, or 40% of the traffic
- Site 2 gets 30 of the visits
- Site 3 gets 15 of the visits
- Site 4 gets 10 of the visits
- Site 5 gets 5 of the visits
Now, imagine Google rolls out a crushing update that utterly decimates Site #2. Site #2 now gets 0 visits, and all of the other sites simply move up to take its lost positions. All else being equal, those 30 visits will not be divided equally among the remaining sites; instead, they will be divided according to the current rankings balance. Site #1 holds 40 of the 70 remaining visits (4/7 of the remaining traffic, against 30/70 for Sites #3, #4 and #5 combined), so it picks up about 4/7 of the 30 lost visits: roughly 17 of them, more than half. Site #3 (15 of 70) gets about 6 more visitors. In a normal winners-losers list, it sure does look like Site #1 was a big winner! But the truth is, there was 1 loser, and everyone else improved by the same proportion (about 43%) thanks to Site #2's losses.
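The arithmetic of this toy example can be checked with a short sketch. It assumes, as the scenario does, that the lost traffic is shared out in proportion to each remaining site's current visits. It prints both the raw gain and the percentage change per site: Site #1 gains the most visits (about 17), yet every surviving site grows by the same percentage. The function names `redistribute` and `percent_change` are illustrative, not taken from any tool.

```python
# Toy example: 5 sites, 100 monthly visits.
# ASSUMPTION: lost traffic is redistributed in proportion to each remaining
# site's current visits ("all else being equal").
BEFORE = {"Site #1": 40, "Site #2": 30, "Site #3": 15, "Site #4": 10, "Site #5": 5}


def redistribute(visits, loser):
    """Give the loser's visits to the other sites, pro rata to their current visits."""
    lost = visits[loser]
    remaining_total = sum(v for site, v in visits.items() if site != loser)
    after = {}
    for site, v in visits.items():
        if site == loser:
            after[site] = 0.0
        else:
            after[site] = v + lost * v / remaining_total
    return after


def percent_change(before, after):
    """Percentage change per site; a drop to zero shows up as -100%."""
    return {site: 100.0 * (after[site] - before[site]) / before[site] for site in before}


if __name__ == "__main__":
    after = redistribute(BEFORE, "Site #2")
    change = percent_change(BEFORE, after)
    for site in BEFORE:
        gained = after[site] - BEFORE[site]
        print(f"{site}: {BEFORE[site]:>3} -> {after[site]:6.2f} visits "
              f"({gained:+6.2f}, {change[site]:+6.1f}%)")
```

Sorted by raw gain, Site #1 tops the list; sorted by percentage change, Sites #1, #3, #4 and #5 are tied at roughly +42.9%.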
You might be thinking, we could just use percentages instead! If we know the ratios, we could determine winners and losers by comparing each site's percentage increase or decrease. You are on the right path, but unfortunately, there is more to it. Big rank tracking tools like SEMRush, SearchMetrics, SpyFu, Algoroo, MozCast, and SERPScape all work from samples of search traffic. We don't know all the keywords out there (nor could we), so we have to expect some random noise from the gap between the keywords we choose to track and what is really going on. So, imagine instead that we have the same 5 sites with the same proportions, but there are 1000 visitors every month, and we only track 100 of them. What happens to confuse the ratios?
If we were to randomly select those keyword rankings, we would expect a slight deviation every time; we wouldn't expect to get the exact same answer. In fact, in redistributing the 300 visits lost by Site #2, we would expect a standard deviation of about ±8.6 visits for Site #1, ±7.1 for Site #3, ±6.0 for Site #4, and ±4.5 for Site #5. Now, that might not seem very high, and I would agree with you, but something interesting happens with ratios.
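One model that reproduces these figures, offered here as an assumption rather than the method any rank tracker uses: each of the 300 lost visits lands independently on one remaining site, with probability equal to that site's share of the remaining traffic (40:15:10:5). Each site's gain is then binomial, with standard deviation sqrt(n·p·(1 − p)). This gives about 8.57, 7.11, 6.06 and 4.46, matching the quoted values to within rounding. The sketch computes those values in closed form and checks them with a seeded Monte Carlo simulation. `shares`, `binomial_sd` and `simulate_sd` are hypothetical helper names.

```python
import math
import random
import statistics
from collections import Counter

# ASSUMPTION (not stated in the text): each of the 300 visits lost by Site #2
# lands independently on one remaining site, with probability equal to that
# site's share of the remaining traffic. Each site's gain is then binomial.
LOST_VISITS = 300
REMAINING = {"Site #1": 400, "Site #3": 150, "Site #4": 100, "Site #5": 50}


def shares(visits):
    """Each site's fraction of the total visits in `visits`."""
    total = sum(visits.values())
    return {site: v / total for site, v in visits.items()}


def binomial_sd(n, p):
    """Standard deviation of a Binomial(n, p) count: sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p))


def simulate_sd(n, site_shares, trials=2000, seed=1):
    """Monte Carlo check: hand out n visits at random `trials` times and
    return the standard deviation of each site's gain."""
    rng = random.Random(seed)
    sites = list(site_shares)
    weights = [site_shares[s] for s in sites]
    gains = {s: [] for s in sites}
    for _ in range(trials):
        counts = Counter(rng.choices(sites, weights=weights, k=n))
        for s in sites:
            gains[s].append(counts[s])  # Counter returns 0 for a site never drawn
    return {s: statistics.pstdev(g) for s, g in gains.items()}


if __name__ == "__main__":
    p = shares(REMAINING)
    simulated = simulate_sd(LOST_VISITS, p)
    for site, share in p.items():
        print(f"{site}: expected gain {LOST_VISITS * share:6.1f}, "
              f"SD {binomial_sd(LOST_VISITS, share):.2f} "
              f"(simulated {simulated[site]:.2f})")
```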
Let's say that, by chance, Site #1 gets 8 more visits than expected and Site #3 gets 7 more visits than expected. Both of those numbers are within one standard deviation, so neither would be out of the ordinary. But 8 extra visits to Site #1 are only 8 out of roughly 171 expected new visitors (300 × 4/7), while Site #3's 7 extra visits come on top of roughly 64 expected (300 × 15/70). Although both are completely ordinary deviations, it looks like Site #3 enjoyed an 11% increase above expectations while Site #1 got only 5%. This is because counts of independent events, like visits, tend to follow what is known as a Poisson distribution, where the expected deviation grows roughly with the square root of the expected count rather than in proportion to it. As a result, the deviations differ far less from site to site than the counts themselves do, so a smaller count is noisier relative to its size. So, when using ratios to determine winners and losers, we have to expect greater variance among sites with a smaller share of traffic.
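Under the same assumed binomial model, a quick calculation makes the effect concrete. The expected gains are 300 × 4/7 ≈ 171 for Site #1 and 300 × 15/70 ≈ 64 for Site #3. So +8 and +7 become overshoots of about 4.7% and 10.9%. The sketch also computes each site's "typical noise" as its standard deviation divided by its expected gain, sqrt((1 − p) / (n·p)). This ratio climbs steadily as a site's share shrinks. `expected_gain`, `relative_sd` and `relative_overshoot` are illustrative names, not part of any tool.

```python
import math

# ASSUMPTION: same binomial model as before; 300 redistributed visits
# shared 40:15:10:5 among the remaining sites.
LOST_VISITS = 300
SHARES = {"Site #1": 40 / 70, "Site #3": 15 / 70, "Site #4": 10 / 70, "Site #5": 5 / 70}


def expected_gain(share, n=LOST_VISITS):
    """Expected number of the n lost visits that a site with this share picks up."""
    return n * share


def relative_sd(share, n=LOST_VISITS):
    """SD of the gain divided by the expected gain: sqrt((1 - p) / (n * p))."""
    return math.sqrt(n * share * (1 - share)) / (n * share)


def relative_overshoot(extra, share, n=LOST_VISITS):
    """How far above expectation a site looks, as a fraction of its expected gain."""
    return extra / expected_gain(share, n)


if __name__ == "__main__":
    # The example from the text: +8 visits for Site #1, +7 for Site #3.
    print(f"Site #1: +8 is {relative_overshoot(8, SHARES['Site #1']):.1%} over expectation")
    print(f"Site #3: +7 is {relative_overshoot(7, SHARES['Site #3']):.1%} over expectation")
    for site, share in SHARES.items():
        print(f"{site}: typical noise is ±{relative_sd(share):.1%} of its expected gain")
```

The typical relative noise runs from about 5% for Site #1 to about 21% for Site #5. A small site therefore needs a much larger percentage swing before it stands out from sampling noise.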
As search marketers, we now have some incredible tools at our fingertips. We can do sophisticated machine learning, we have huge rankings data sets, and we have everything we need except the discipline to do it right. Getting the winners-losers list right is the first step in building good training sets to determine, statistically, what factors might have influenced a rankings update. Let's get this part right.