This weekend, the Counsilman Center of Swimming Performance (a group that falls under the auspices of the department of public health, rather than engineering or statistical measurement) issued several statements through the website swimmingworldmagazine.com calling into question what they believe to be a “flaw” or a “current” in the pool at the World Championships.
In their opinion, the fact that 17 out of 24 medalists in the 50 meter races came from lanes 5 through 8 is a statistical indication of some sort of devious back-draft in the pool, and they have even gone so far as to call upon FINA to consider changing results.
While we await the Counsilman Center’s full report, we sit and pray that they find better information than that which they’ve provided. While we don’t have the backing of Indiana University here at SwimSwam, we have taken plenty of statistics classes, certainly enough to conquer this basic math, and enough to see even bigger flaws in the accusations than the supposed ‘flaws’ in the pool.
Let’s look at the claims, and discuss why they’re, in a word, terrible:
1. “Thus, over the long haul, there should be more medals in lanes 1-4 than lanes 5-8. But, in the finals of all of the 50s, if you look at the number of medals won by lanes 1-4 and 5-8 in the 4 individual 50 events, lanes 1-4 won only 7 medals and 5-8 won 17 medals. That’s a really big difference, especially because based on seed times as stated, you’d actually always expect more medals in lanes 1-4.”
Point one: this is nowhere near ‘over the long haul’. The researchers are taking a sample size of 24: 8 50’s x 3 medals. Their theory is that 50’s are all that is relevant, because swimmers are only going in one direction (in other words, either with or against a current). That 24, in and of itself, is a tiny sample size, but even the proposal of a sample size of 24 is inflated. The sample size here is 8: 8 races. While this may be enough to notice a funny trend, remember that 3 medals per race is an arbitrary, human-chosen award, and the three medals within a single race are not independent of one another.
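To put a rough number on how unusual a 17-to-7 split is, here is a minimal sketch of a binomial tail calculation. It rests on two assumptions that are ours, not the researchers’: that each medal is an independent coin flip between lanes 1-4 and lanes 5-8, and that the split should be 50/50. The first assumption is exactly what the paragraph above disputes, so treat the result as the most generous case for the ‘current’ theory rather than as a proper test. The function name `binomial_tail` is just an illustrative helper.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


MEDALS = 24        # 8 finals x 3 medals, as counted in the claim
OUTSIDE_LANES = 17  # medals won from lanes 5-8 in 2013

# Assumption: every medal is an independent 50/50 draw (the article disputes this).
one_sided = binomial_tail(MEDALS, OUTSIDE_LANES)
# With p = 0.5 the distribution is symmetric, so doubling gives the chance of a
# split at least this lopsided in either direction.
two_sided = 2 * one_sided

print(f"P(17 or more of 24 from lanes 5-8): {one_sided:.4f}")
print(f"P(split at least as lopsided as 17-7, either way): {two_sided:.4f}")
```

Even under that generous independence assumption, the two-sided figure comes out a little above 6 percent. If the real unit of observation is the race rather than the medal, there are only 8 observations and the evidence is weaker still.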
2. Were the results unexpected?
Out of those same 24 medals, 18 were won by the top four seeds coming out of the semifinals (specifically, lanes 3, 4, 5, and 6). Just because more of those were lane 6 and lane 5, are we to declare this a bias in the results? Lane 4 medaled in 6 of the 8 races, as one would expect in a final, so to even call the bias a “lane 1-4 bias” would be wrong.
3. Seed times are being used as a control.
The whole ‘flaw’ is based on the researchers’ expectation that seed times in the final should dictate the ‘expectations’ of placement in the final. To some extent, this is true. As noted above, the swimmers in the middle lanes, who were the fastest in the semi-finals, are expected to win most of the medals. But we’re really splitting hairs to say that lane 4 is a favorite over lane 5, lane 3 over lane 6, lane 2 over lane 7, and lane 1 over lane 8. Consider that those swimmers weren’t always in the same semi-finals, which means that they weren’t racing head-to-head under identical circumstances to earn those seeds.
Then consider how small a separation in seeds we’re getting for these swims. In the men’s 50 back, lane 4 had a .06 second seed advantage over lane 5; same with 3 over 6; lane 2 over 7 had a .11 second seed advantage, and lane 1 and lane 8 had identical seed times. So now, we’re using the tightest-packed and most unpredictable races, the 50’s, to draw sweeping conclusions about currents? I think not. We already know that seed times coming into a meet have little impact on the actual outcome of the races, thanks to our pick ’em contest results.
4. “When you go back and do the same thing for the World Championships for 2009, lanes 1-4 won 15 medals and lanes 5-8 won 9. This is as it should be. When you do that for 2011, lanes 1-4 won 13 medals and lanes 5-8 won 11. Both of these events were run [in] permanent pools, not temporary pools.”
Yeah, so now your sample size is two, even smaller than the 8. Look at the semi-finals. In the women’s 50 backstroke, the top four seeds from each semi-final made it to the final. Exactly as expected. In fact, 29 out of 48 swimmers who made the finals swam their semi-finals in lanes 1-4, which, based on the methodology being used, where seed times are the control, is much more than the expectation (which should be half-and-half).
And, if you want to take the really obvious route, we’ll go back to 2007. There, 11 out of 24 medalists came from lanes 5-8. In 2005, 8 out of 24 medalists were from lanes 5-8. In 2003, it was 10 out of 24. In 2001, the first year of 50’s, it was 12 out of 24.
So there, in a sample size of 136 races, by far the biggest of any proposed yet in this ‘whodunnit,’ any statistician worth his pay could come to a conclusion as to how 2013 fits into the trend: he would find that with this sample size, there’s no trend whatsoever. In list form:
Medals won by lanes 5-8 (out of 24):
2001: 12
2003: 10
2005: 8
2007: 11
2009: 9
2011: 11
2013: 17
See a trend there? Neither do we.
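For anyone who wants to check the arithmetic, here is a minimal sketch that tallies the yearly counts quoted above and asks a simple question: if medals really were a fair 50/50 split between the two halves of the pool, how often would at least one of seven championships produce a split as lopsided as 17-7 in either direction? The coin-flip model and the independence between medals and between meets are our simplifying assumptions, not anything the researchers or FINA have stated, and the helper name `prob_at_least_as_lopsided` is purely illustrative.

```python
from math import comb

# Medals (out of 24) won from lanes 5-8 in the 50s, as quoted in the text above.
medals_lanes_5_8 = {2001: 12, 2003: 10, 2005: 8, 2007: 11, 2009: 9, 2011: 11, 2013: 17}
MEDALS_PER_MEET = 24


def prob_at_least_as_lopsided(n: int, k: int) -> float:
    """Chance that a fair-coin split of n medals is at least as uneven as k vs n-k."""
    big_side = max(k, n - k)
    tail = sum(comb(n, i) for i in range(big_side, n + 1)) / 2**n
    return min(1.0, 2 * tail)  # double for either direction, capped at 1


total_outside = sum(medals_lanes_5_8.values())
total_medals = MEDALS_PER_MEET * len(medals_lanes_5_8)
share_outside = total_outside / total_medals

# Assumption: each meet is an independent draw from the same fair-coin model.
p_single = prob_at_least_as_lopsided(MEDALS_PER_MEET, 17)
p_any = 1 - (1 - p_single) ** len(medals_lanes_5_8)

print(f"Lanes 5-8 won {total_outside} of {total_medals} medals ({share_outside:.1%})")
print(f"Chance one given meet is as lopsided as 17-7: {p_single:.3f}")
print(f"Chance at least one of {len(medals_lanes_5_8)} meets is: {p_any:.3f}")
```

Under those assumptions, lanes 5-8 hold a bit under half of all medals across the seven meets, and the chance of seeing at least one 17-7 style result somewhere in seven tries is well over one in three. Note too that 2005’s count of 8 is a 16-8 split the other way, in a pool nobody accused of having a current.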
That’s because of the ‘human factor’. Statistics work best when applied to random samples. Swimming races are not random samples. To find any sort of usable trend in a non-random event, an even larger sample size is needed to gain any credibility.
And lest we be too elementary, we’ll point out the ‘correlation is not causation’ caveat that applies to any analysis like this.
Simply put, this is not something that can be declared with statistics. It is a test case for engineering. Of course, many times statistics help engineers know where to start looking, so there’s no problem with looking at statistics. But when statistics, especially such flawed ones, are concluded with a call to change the results of a World Championship, that’s when it becomes a problem.
If FINA is curious, it would be a relatively easy problem to test. Even if a small current were found, these numbers wouldn’t prove it had any effect on medals. Until an engineer becomes involved, this is nothing more than a curiosity.