During the early part of last weekend’s U.S. National Championships, it seemed like many of the pre-meet favorites were all hype. Justin Wright and Zach Harting went 1-2 in the men’s 200 fly ahead of Jack Conger. Blake Pieroni, not all-everything star Caeleb Dressel, won the men’s 100 free. Andrew Seliskar won the men’s 200 free over favored Townley Haas.
On the other hand, there were plenty of events where the top seeds dominated. Katie Ledecky, Leah Smith, Ryan Murphy, and Simone Manuel placed about where everyone expected them to. So which was it? Was this a meet full of surprises or a predictable coronation of favorites?
To answer that, we need two things: a measure of expectations, so we can see how the results stacked up against them, and a baseline to compare this year’s data against. This year’s pick’em contest entries work as a measure of expectations. The contest asked readers to pick the top 4 finishers in every event, which gives a good idea of what the typical swim fan expected to happen at this meet.
As a baseline, I used the pick’em entries from last year’s U.S. Summer Nationals.
The two data sets needed a bit of cleanup before they could be compared. First, I removed any entries that didn’t submit a pick for every event. This left 422 entries from last year and 775 from this year. Next, I rescored last year’s entries using this year’s rules. This year’s scoring system awarded 7 points for correctly picking a swimmer to finish 1st, 5 for 2nd, 4 for 3rd, and 3 for 4th. One point was awarded for picking a swimmer who finished in the top 4, but in the wrong position.
Last year’s scoring system differed in two main ways: 1. there was no single point for the right swimmer in the wrong top-4 spot, and 2. the 100 and 200 free were scored through 6th place. The points for each of the top 4 positions were the same. Given how similar the two scoring systems were, people’s picking strategies should be pretty similar both years.
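To make the two rule sets concrete, here is a minimal Python sketch of how a single entry could be scored. The function names (`complete_entries`, `score_event`, `score_entry`) and the data layout (each entry maps an event name to an ordered list of four picks) are assumptions for illustration, not the contest’s actual code. It covers only the top 4 positions, since the points for 5th and 6th in last year’s free events aren’t given here.

```python
# Points for correctly placing a swimmer in each of the top 4 spots (same both years).
EXACT_POINTS = {1: 7, 2: 5, 3: 4, 4: 3}
# This year only: right swimmer in the top 4, but in the wrong spot.
PARTIAL_POINT = 1


def complete_entries(entries, events):
    """Keep only entries that made a pick for every event (hypothetical helper)."""
    return [entry for entry in entries if all(event in entry for event in events)]


def score_event(picks, results, partial_credit=True):
    """Score one event's picks against the actual order of finish.

    picks: the 4 predicted finishers, in order (1st to 4th).
    results: the actual finishers, in order.
    partial_credit: True for this year's rules, False for last year's.
    """
    actual_top4 = results[:4]
    total = 0
    for place, swimmer in enumerate(picks[:4], start=1):
        if place <= len(actual_top4) and actual_top4[place - 1] == swimmer:
            total += EXACT_POINTS[place]
        elif partial_credit and swimmer in actual_top4:
            total += PARTIAL_POINT
    return total


def score_entry(entry, all_results, partial_credit=True):
    """Total score for one entry; both arguments map event name -> ordered list."""
    return sum(
        score_event(entry[event], all_results[event], partial_credit)
        for event in all_results
    )


# Hypothetical swimmers A-E, not real results.
this_year = score_event(["A", "B", "C", "D"], ["A", "C", "B", "E"])
last_year = score_event(["A", "B", "C", "D"], ["A", "C", "B", "E"], partial_credit=False)
print(this_year, last_year)  # 9 7
```

With picks A, B, C, D and an actual finish of A, C, B, E, this year’s rules give 7 + 1 + 1 = 9 points, while last year’s rules give only the 7 for A.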
The median score in last year’s contest, rescored with this year’s rules, was 269 (mean 267, standard deviation 26). The median score in this year’s contest was 273 (mean 270, standard deviation 24). Those totals are remarkably similar, which indicates that this year’s meet wasn’t abnormally predictable or unpredictable. The field as a whole performed almost exactly the same versus expectations both years.
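One way to put “remarkably similar” on a number is a standardized difference such as Cohen’s d: the gap between the two means divided by a pooled standard deviation. This is an addition, not part of the original analysis, and it assumes the reported standard deviations are sample standard deviations. By Cohen’s own rough convention, a d of 0.2 already counts as only a small effect.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups, weighted by degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


# Summary statistics reported above (last year rescored with this year's rules).
last_year = {"n": 422, "mean": 267, "median": 269, "sd": 26}
this_year = {"n": 775, "mean": 270, "median": 273, "sd": 24}

sd = pooled_sd(last_year["sd"], last_year["n"], this_year["sd"], this_year["n"])
d = (this_year["mean"] - last_year["mean"]) / sd  # Cohen's d
print(f"pooled sd = {sd:.1f}, d = {d:.2f}")  # pooled sd = 24.7, d = 0.12
```

The means differ by only about an eighth of a standard deviation, which is consistent with reading the two years as essentially the same.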
The first and last days were the least predictable this year, which is perhaps why the meet seemed so unpredictable.
It’s possible that both this year’s and last year’s meets were outliers compared to a longer history, but these two years were the only readily available and reasonably comparable pick’em data I had.
| Total | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
How Accurate Are People’s Expectations?
Since we’re already digging around in the pick’em stats, let’s go a bit deeper. How good a prediction are people’s picks?
Unsurprisingly, people pick favorites more often than the favorites actually come through. Overpicking favorites is rational given the scoring rules. The rules don’t pay out according to odds: there are no extra points for getting that one obscure pick right. If, for example, everyone agrees Katie Ledecky has a 98% chance to win the 800 free, 100% of people should pick her to win.
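Because every correct pick earns the same fixed points no matter how many other people made it, a pick’s expected value is just the probability it comes true times the points it pays. A minimal sketch of that arithmetic for the Ledecky example, using the hypothetical 98% figure above and lumping every other swimmer together (an assumption for illustration):

```python
FIRST_PLACE_POINTS = 7

# Hypothetical win probabilities for the 800 free, matching the example above.
win_prob = {"Katie Ledecky": 0.98, "Any other swimmer": 0.02}

# Expected points of each pick = probability it comes true * points it pays.
expected = {name: p * FIRST_PLACE_POINTS for name, p in win_prob.items()}
best_pick = max(expected, key=expected.get)
print(best_pick, round(expected[best_pick], 2))  # Katie Ledecky 6.86
```

Even with the rest of the field combined into a single pick, the alternative is worth only 0.14 expected points, so picking the favorite is the rational choice under these rules.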
Of the 33 swimmers picked by 90% or more of pick’em entries to finish in a particular spot, 82% actually did, which, as expected, is less often than they were picked. Swimmers picked by 80%-90% of entries finished there 71% of the time, and swimmers picked by 70%-80% finished there 54% of the time. Swimmers underperformed their pick percentage all the way down to 10%. Swimmers picked less than 10% of the time overperformed: those picked between 0% and 1% of the time finished in their picked positions 2% of the time. Twenty-two swimmers placed in the top 4 after being picked by no one (out of a total of 19,147 swimmers not picked in the top 4, so the conversion rate was only about 0.1%).
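The binning described above can be sketched as a simple calibration table: group every swimmer-and-position pick by the share of entries that made it, then compute how often picks in each bin came true. The function name `calibration_table`, the `(pick_share, hit)` input format, and the exact bin boundaries are assumptions; the analysis doesn’t specify how boundary values such as exactly 90% were assigned. The sample data is made up for illustration.

```python
from collections import defaultdict

# Bin edges as fractions of entries; bins are half-open [lo, hi), except the last.
BIN_EDGES = [0.0, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def calibration_table(picks, bin_edges=BIN_EDGES):
    """picks: iterable of (pick_share, hit) pairs, where pick_share is the fraction
    of entries that picked a swimmer for a given spot and hit is True if the swimmer
    actually finished there. Returns {(lo, hi): (count, hit_rate)} for non-empty bins."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    last_edge = bin_edges[-1]
    for share, hit in picks:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            # The top bin also includes its upper edge (a 100% pick).
            if lo <= share < hi or (hi == last_edge and share == hi):
                counts[(lo, hi)] += 1
                hits[(lo, hi)] += int(hit)
                break
    return {b: (counts[b], hits[b] / counts[b]) for b in sorted(counts)}


# Made-up example data, not the contest results.
sample = [(0.95, True), (0.92, False), (0.85, True), (0.005, False)]
for (lo, hi), (n, rate) in calibration_table(sample).items():
    print(f"{lo:.0%}-{hi:.0%}: {n} picks, {rate:.0%} correct")
```

A well-calibrated set of picks would show each bin’s hit rate falling inside its own range; the pattern described above is hit rates below the range for heavily picked swimmers and above it for rarely picked ones.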
In general, a swimmer’s odds of actually finishing in a particular place track pretty well with how often they were picked to finish there. Things get a bit wonky in the 30%-60% range, but that could be a sample size issue, since this is only two years of data. For the most part, swimmers picked more often for a particular place actually finished there more often.
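To get a feel for how much sample size matters, the standard error of an observed hit rate can be estimated with the normal approximation to the binomial, sqrt(p(1 - p)/n). This sketch applies it to the one bin whose size is given in the text (33 swimmers, 82% correct). Treating each pick as an independent trial is a simplifying assumption, and with a bin this small the approximation is rough.

```python
import math


def proportion_se(p, n):
    """Standard error of an observed hit rate p from n independent picks
    (normal approximation to the binomial)."""
    return math.sqrt(p * (1 - p) / n)


# The 90%+ bin from the text: 33 swimmers, 82% finished where picked.
se = proportion_se(0.82, 33)
print(f"82% +/- {100 * 1.96 * se:.0f} percentage points (approx. 95% interval)")
```

Even a bin of 33 swimmers carries a margin of roughly ±13 percentage points, so bins with fewer swimmers in them would be noisier still.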
Combined 2017 and 2018 Data
| How Often a Swimmer Was Picked | How Many Swimmers Were Picked That Often | How Often the Pick Was Right |