SwimSwam’s Swimulator is intended to be an open-source tool that allows swim fans across the world to draw new and interesting conclusions about what is happening and what could be happening in our sport. Our in-house experts, Andrew Mering, Kevin Hallman, and Barry Revzin, have been parsing it for the deepest nuggets of swimming wisdom, and now we invite you, the readers, to contribute that which you find. Find something that you want to share with our audience? Email it to [email protected] and we’ll consider it for inclusion online or, if it’s really good, in the next issue of SwimSwamMag.
If you’re a fan of NCAA swimming, you know this championship season had some jaw-dropping, head-turning, gasp-worthy swims. The final session of the men’s Division I meet was no exception. The night began with a thrilling nail-biter of a mile that served as the first chapter of a new edition of the record books. The previous NCAA, U.S. Open, and American records in the mile, 100 Free, 200 Breast, and 200 Fly all fell, and fell pretty hard. Witnessing so much history in one session was a lot to take in. Not only was each event packed with some of the fastest swimmers in the world, but each event had a backstory. Each athlete carried a noticeable emotional charge, setting an even more dramatic stage for an already high-stakes finals session.
First was the mile, where we saw an injury-stricken Clark Smith track down the field in the last 250 yards to out-touch three other world-class milers, all of whom broke Connor Jaeger’s previous mark. After scratching the 200 Freestyle due to a groin injury (suffered while breaking another unthinkable record in the 500 Freestyle two nights earlier), Smith shattered the previous mark in the mile by 1.11 seconds. We then moved to the 200 Back, where Ryan Murphy tracked down a blazing front half from John Shebat to win his second four-peat. Murphy joined John Naber as the only man to win four consecutive titles in both backstroke events. It then came time for the 100 Freestyle, possibly the most-anticipated event of the night. The race was a thrill, all 40.00 seconds of it. Coming within 0.01 seconds of an unimaginable barrier in swimming, Caeleb Dressel once again redefined speed in the world of sprinting.
The favorites to win the next two events each had something to prove. After coming within 0.14 seconds of making the Olympic team this past summer, Will Licon returned to Texas for his senior year. Seemingly motivated rather than discouraged by this, Licon blasted the fastest time in history with a 1:47.91 to win his third consecutive 200 Breast title, and his third title of the meet. The final individual event of the meet, the 200 Butterfly, featured top seed Jack Conger. Conger had come within inches of claiming an individual NCAA title in years past, seemingly always just out-touched by his Texas teammate, Joseph Schooling. With Schooling not in the final of the 200 Fly this year, many felt it was Conger’s time. Without hesitation, Conger jumped out to an early lead and never trailed in his record-setting 1:37.35, erasing Schooling’s previous record of 1:37.97.
Each of these swims was impressive in its own right, and each rightfully deserves its place in NCAA swimming history. But which, if any, was the most impressive? Is it even possible to decide? Coming from a background in mathematics and analysis, I turned to data to investigate this question. The first question I asked myself was whether or not it was possible to quantify impressiveness. The “impressiveness” of something is often a matter of opinion, as different things are more or less impressive to different people. But would it be possible to somehow define impressiveness using data?
I felt that a distinction needed to be made between the impressiveness of a race and that of a swim. How can you quantify the impressiveness of the race that took place between Clark Smith, Felix Auboeck, Akaram Mahmoud, and Jordan Wilimovsky, a thrilling battle to the very end? How can you put data behind Clark Smith’s willpower to overcome injury for his team? Or the ability of Ryan Murphy to track down Shebat in the last 50 of the 200 backstroke to defend his title? The answer I came up with was: you can’t. I thought it would be inaccurate to attempt to quantitatively compare different races. But using data to compare two different swims, now that’s a different ballgame. By just focusing on the swims that the individuals in question had, I used a few metrics to analytically single out the most “impressive” swim of the night.
I began my analysis by looking at the four individual events in which the previous record was broken: the mile, 200 Breast, 200 Fly, and 100 Free. Note that by “record” I simply mean the previous fastest recorded SCY time. I first looked at the margin of victory over the second-place finisher as well as the margin by which the previous record was broken. Because these events are different lengths and take different amounts of time, I expressed each margin as a percentage of the slower time (the second-place time or the previous record) to make the comparison fair. Table 1, below, contains this initial analysis.
Table 1. Winning margins over second place and over the previous record, each as a percentage of the slower time.
Event | Winning Time | 2nd Place Time | Previous Record | Victory Margin (%) | Record Margin (%) |
1650 Free | 14:22.41 | 14:22.88 | 14:23.52 | 0.05% | 0.13% |
100 Free | :40.00 | :40.95 | :40.46 | 2.32% | 1.14% |
200 Breast | 1:47.91 | 1:51.22 | 1:48.12 | 2.98% | 0.19% |
200 Fly | 1:37.35 | 1:38.83 | 1:37.97 | 1.50% | 0.63% |
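Dividing each margin by the slower of the two times (the second-place time or the old record) reproduces every percentage in Table 1. A minimal Python sketch of that arithmetic, assuming times are written as `M:SS.ss` or `:SS.ss`; the helper names `to_seconds` and `pct_margin` are illustrative, not part of the original analysis:

```python
def to_seconds(t: str) -> float:
    """Convert a swim time such as '14:22.41', '1:47.91' or ':40.00' to seconds."""
    minutes, _, seconds = t.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


def pct_margin(faster: str, slower: str) -> float:
    """Gap between two times as a percentage of the slower time, to 2 decimals."""
    fast_s, slow_s = to_seconds(faster), to_seconds(slower)
    return round((slow_s - fast_s) / slow_s * 100, 2)


# (winning time, second-place time, previous record), copied from Table 1
TABLE_1 = {
    "1650 Free": ("14:22.41", "14:22.88", "14:23.52"),
    "100 Free": (":40.00", ":40.95", ":40.46"),
    "200 Breast": ("1:47.91", "1:51.22", "1:48.12"),
    "200 Fly": ("1:37.35", "1:38.83", "1:37.97"),
}

if __name__ == "__main__":
    for event, (win, second, record) in TABLE_1.items():
        print(f"{event}: {pct_margin(win, second):.2f}% victory, "
              f"{pct_margin(win, record):.2f}% under the old record")
```

Dividing by the winning time instead would give slightly different figures (2.38% rather than 2.32% for the 100 Free victory margin), so the choice of denominator matters at the second decimal.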
From the four initial candidates, two stood out: the 100 Freestyle and the 200 Breast. Dressel broke (his own) previous record by 1.14%, a larger percentage than any other swim of the night, while Licon beat the second-place finisher by the largest margin, 2.98%. This narrowed the options for most impressive swim to these two, with one metric in favor of each. To take things one step further, I chose a few more metrics that might indicate statistical impressiveness: I looked at the percentage by which Dressel and Licon out-swam the rest of their respective A-final fields, analyzed how the fields in which they swam stacked up against the fastest fields in history, and fit a model to each event’s history to measure deviation from best-fit logistic/exponential curves.
To compare the record-breaking swims with the rest of the fields in which they happened, I averaged the times of the swimmers placing 2nd through 8th in each event and compared that average, percentage-wise, with Licon’s and Dressel’s times. The results are contained in Table 2, below.
Table 2. A-final field breakdown.
Event | 100 Free | 200 Breast |
Winning Time | :40.00 | 1:47.91 |
2nd Place Time | :40.95 | 1:51.22 |
3rd Place Time | :41.21 | 1:52.09 |
4th Place Time | :41.76 | 1:52.71 |
5th Place Time | :41.77 (tie) | 1:52.81 |
6th Place Time | :41.77 (tie) | 1:52.87 |
7th Place Time | :41.80 | 1:53.04 |
8th Place Time | :41.85 | 1:55.24 |
Field Average | :41.59 | 1:52.78 |
% Faster than Field | 3.82% | 4.32% |
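The Table 2 arithmetic can be sketched the same way. This assumes the field average is the plain mean of the 2nd- through 8th-place times, which reproduces the 100 Free column; the function names are hypothetical:

```python
from statistics import mean


def to_seconds(t: str) -> float:
    """Convert ':41.59' or '1:52.78' style times to seconds."""
    minutes, _, seconds = t.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


def field_average(times: list[str]) -> float:
    """Mean of the non-winning A-final times, in seconds."""
    return mean(to_seconds(t) for t in times)


def pct_faster(winning: str, reference_s: float) -> float:
    """How much faster the winner was, as a percentage of the reference time."""
    return (reference_s - to_seconds(winning)) / reference_s * 100


# 100 Free A-final from Table 2, 1st through 8th place
FREE_100 = [":40.00", ":40.95", ":41.21", ":41.76", ":41.77", ":41.77", ":41.80", ":41.85"]

avg = field_average(FREE_100[1:])  # 2nd-8th place only
print(f"field average {avg:.2f}s, winner {pct_faster(FREE_100[0], avg):.2f}% faster")
```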
This one seems to go in Licon’s favor: his swim was 4.32% faster than the average of the rest of the field, while Dressel’s was 3.82% faster. But as I looked through the results of the past ten years of the NCAA championship meet, I noticed that the 2015 200 Breast field was the fastest overall field in history, with an average time of 1:51.84. Licon also won the event in 2015, posting a 1:49.48 to out-touch Kevin Cordes and beat the field average by 2.11%. For the 100 Free, this year’s field was the fastest, with an average time of :41.59. Because Dressel’s swim came in the fastest field in history while Licon’s did not, I felt it only fair to compare each time with the fastest field ever assembled in its event. Since this past year’s 100 Free field was itself the fastest in history, Dressel’s time remains 3.82% faster than the fastest field in history, while Licon’s time is 3.51% faster than the fastest 200 Breast field in history. With two of the devised metrics now falling in each swimmer’s favor, I moved on to one more method: curve fitting.
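Swapping in the fastest field average on record for each event gives the third metric. A small sketch using only the field averages quoted in the paragraph above (the dictionary layout is illustrative):

```python
def to_seconds(t: str) -> float:
    """Convert ':41.59' or '1:51.84' style times to seconds."""
    minutes, _, seconds = t.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


def pct_faster(winning: str, field_avg: str) -> float:
    """Winner's margin over a field average, as a percentage of that average."""
    w, f = to_seconds(winning), to_seconds(field_avg)
    return round((f - w) / f * 100, 2)


# Fastest A-final field averages cited in the article (100 Free: 2017, 200 Breast: 2015)
FASTEST_FIELD = {"100 Free": ":41.59", "200 Breast": "1:51.84"}

print("Dressel 2017:", pct_faster(":40.00", FASTEST_FIELD["100 Free"]))    # 3.82
print("Licon 2017:  ", pct_faster("1:47.91", FASTEST_FIELD["200 Breast"]))  # 3.51
print("Licon 2015:  ", pct_faster("1:49.48", FASTEST_FIELD["200 Breast"]))  # 2.11
```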
I went to the NCAA database containing all the recorded winning times from previous years of the championship meet. The 100 Free times (swum in SCY pools) extend back to 1928, while the earliest recorded 200 Breast time is from 1958. There are two gaps in the data, 2000 and 2004, when the NCAA championships were swum SCM. To fill these holes, I averaged the times from the previous and following years so as not to skew the trend lines. I plotted the winning times against the year in which they were swum to get a sense of the improvement over time, and then fit a curve to each plot using non-linear least squares. Figures 1 and 2, reproduced below, plot the 100 Free and 200 Breast winning times throughout history, respectively. Each blue dot represents the winning time from a given year, while the black curve is the best-fit trend line.
Figure 1. Trend in winning time of 100 Freestyle 1928-2017.
Figure 2. Trend in winning time of 200 Breaststroke 1958-2017.
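The gap-filling step for the SCM years (2000 and 2004) amounts to averaging the two neighboring SCY winning times. A sketch, assuming winning times are stored in seconds in a dict keyed by year; the sample values are placeholders, not real results:

```python
def fill_gaps(times_by_year: dict[int, float], gap_years: list[int]) -> dict[int, float]:
    """Return a copy where each gap year gets the mean of the years before and after it.

    Raises KeyError if either neighboring year is itself missing.
    """
    filled = dict(times_by_year)
    for year in gap_years:
        before, after = times_by_year[year - 1], times_by_year[year + 1]
        filled[year] = (before + after) / 2
    return dict(sorted(filled.items()))


# Placeholder winning times in seconds (NOT actual NCAA results)
sample = {1999: 43.10, 2001: 42.70, 2003: 42.30, 2005: 41.90}
print(fill_gaps(sample, [2000, 2004]))
```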
The first thing I noticed when plotting the data was the difference in curve shapes: a logistic model fit the 100 Free data best, while an exponential model fit the 200 Breast data best. The difference may arise because the 100 Free data extend 30 years further back than the 200 Breast data. If we look at the 100 Free data starting in the same year as the 200 Breast data, namely 1958, we see an exponential curve. I initially thought to truncate the 100 Free data at 1958 and compare only exponential curves, but because a model generally benefits from more data, I chose to compare the two fits as displayed above.
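The article fits its curves with non-linear least squares but does not name a tool. As a dependency-free stand-in for a library routine such as SciPy's `scipy.optimize.curve_fit`, the sketch below assumes the parameterizations a + b·exp(-c·x) (exponential) and a + b / (1 + exp(k·(x - m))) (logistic), where x is years since the first data point. For fixed c (or fixed k and m) each model is linear in a and b, so those two are solved exactly while the remaining parameters are grid-searched. The grid ranges are arbitrary assumptions, and the fit is only as fine as the grid:

```python
import math
from itertools import product


def solve_ab(g, y):
    """Closed-form least squares for y ~ a + b*g; returns (a, b, sse) or None."""
    n = len(y)
    sg, sy = sum(g), sum(y)
    sgg = sum(v * v for v in g)
    sgy = sum(u * v for u, v in zip(g, y))
    det = n * sgg - sg * sg
    if det <= 1e-12 * n * sgg:  # g (nearly) constant: a and b not identifiable
        return None
    b = (n * sgy - sg * sy) / det
    a = (sy - b * sg) / n
    sse = sum((a + b * u - v) ** 2 for u, v in zip(g, y))
    return a, b, sse


def fit(years, times, shape, grid):
    """Grid-search the non-linear parameters, solving a and b exactly at each point.

    Uses x = years since the first data point to keep the exponentials well scaled.
    Returns (a, b, sse, params) for the lowest sum of squared errors.
    """
    xs = [t - years[0] for t in years]
    best = None
    for params in grid:
        res = solve_ab([shape(x, *params) for x in xs], times)
        if res is not None and (best is None or res[2] < best[2]):
            best = (*res, params)
    return best


def exponential(x, c):  # time = a + b * exp(-c x)
    return math.exp(-c * x)


def logistic(x, k, m):  # time = a + b / (1 + exp(k (x - m)))
    return 1.0 / (1.0 + math.exp(k * (x - m)))


EXP_GRID = [(i / 200,) for i in range(1, 61)]                 # c = 0.005 .. 0.300
LOGISTIC_GRID = list(product([i / 50 for i in range(1, 21)],  # k = 0.02 .. 0.40
                             range(0, 90)))                   # m = 0 .. 89 years in

# Synthetic check (NOT NCAA data): build an exact exponential and recover it.
years = list(range(1958, 2018))
times = [105 + 30 * math.exp(-0.05 * (t - 1958)) for t in years]
a, b, sse, (c,) = fit(years, times, exponential, EXP_GRID)
print(round(a, 3), round(b, 3), c)
```

On real winning times, comparing the `sse` of the two shapes is one way to see which model tracks an event more closely; keep in mind that the logistic form has one more free parameter.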
When thinking about impressiveness with respect to curve fitting, the first thing that came to mind was outliers, data points that deviate significantly from the general trend of the dataset. If either Licon or Dressel put up a time below the trend line, i.e., faster than the historical trend projected, that would count as impressive. It turns out that both did, so I turned to which swim was more of an outlier. Licon’s time was 3.12% faster than the trend line’s projection, while Dressel’s was 3.52% faster, putting this metric in Dressel’s favor.
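The outlier metric compares each winning time with the trend line's projection for that year. The article does not say which time the deviation is divided by; this sketch assumes the projected time, in line with the earlier tables, and accepts any fitted curve as a plain function of the year. The flat trend and times in the usage lines are hypothetical placeholders:

```python
from typing import Callable


def pct_faster_than_trend(actual: float, projected: float) -> float:
    """Positive when the swim beat the trend line, as a % of the projected time."""
    return (projected - actual) / projected * 100


def rank_outliers(times_by_year: dict[int, float],
                  trend: Callable[[int], float],
                  top: int = 5) -> list[tuple[int, float]]:
    """Years sorted by how far their winning time beat the trend, best first."""
    deviations = [(year, pct_faster_than_trend(t, trend(year)))
                  for year, t in times_by_year.items()]
    return sorted(deviations, key=lambda pair: pair[1], reverse=True)[:top]


def flat_trend(year: int) -> float:
    """Hypothetical stand-in for a fitted curve: projects 50.0 s every year."""
    return 50.0


print(rank_outliers({2015: 49.0, 2016: 48.0, 2017: 50.5}, flat_trend, top=2))
```

Plugging a fitted curve into `trend` and passing the full history would rank every year's winning swim by its deviation, which is the kind of "largest residual" list asked about in the comments.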
Though impressiveness is often a matter of opinion, I attempted to see this question strictly from a data analysis point of view. The five metrics I used to measure impressiveness were chosen to compare each athlete’s swim with the other athletes in the field and with the history of the event. Licon’s swim stacked up most impressively against the rest of the A-final field in which he swam, swinging two metrics in his favor (percent victory over the second-place finisher and percent faster than the field). But it was Dressel’s swim that compared most impressively against history, bringing in three metrics (percent by which he broke the record, percent faster than the fastest field in history, and percent faster than the trend line’s projection). Given that Dressel has three metrics in his favor to Licon’s two, and that the metrics that swung to Dressel were comparisons with all of NCAA history rather than with the other swimmers at the meet, I believe Caeleb Dressel’s 100 Freestyle was the most impressive swim of the final night of the 2017 NCAA Division I Men’s Swimming and Diving Championships. But of course, that’s my quantitative opinion (if there is such a thing). What do you think?
Personally, I’d say % of Victory should give the nod to Clark Smith, not Will Licon. A matter of opinion, but beating a field which was not only evenly matched, but also included a total of four under the previous NCAA record, should be more impressive than a swimmer who is way ahead of the field, and in a different league.
I think Dressel tracking down Schooling in the 100 fly was the most impressive swim of the meet
what about the women? Simone’s 100 free has got to be up there.
It would be super interesting to run the same analysis on the women’s meet, and open the analysis up to include all sessions. I only focused on the final night of the men’s meet, but seeing the trends for all the events from both the women’s and men’s meet would be sweet.
Look at the Y axis for the 200 breast time vs year, I don’t know how it goes from 2:52-1:26. Hopefully just a labeling problem because that looks like a beautiful graph.
Both graphs look as if we are approaching limits
It’s for sure a labeling issue, formatting got skewed somewhere in the process. The y-axis labels should go from 2:25.88 down to 1:47.00 (from top to bottom of the Y axis).
Follow-up question: what data point deviates the most from the fitted curve (on both the logistic and exponential models)? In other words which year’s winning swim has the greatest residual value? Curious to know what the “most impressive” 100 free and 200 breast swims in history are.
I dig the analysis. Can’t think of too much else to really do here. Last year on this site I saw someone create predictive models for each event at Olympic Trials. The two variables it used to predict a swimmer’s time were their long course times since September of the previous year and their seed time on the psych sheet. I want to do the same thing for NCAAs, but I would…
I believe that would likely be Natalie Coughlin’s 100 back in 2002. I don’t have records to support, but from what I heard she was the first person to break 52 in 2001, and then broke 50 in 2002. Assuming no one else broke 52 throughout that year, that would make her the fastest in history by over 2 seconds in a 100.
Then again, Ledecky being the fastest in the mile by 33 seconds (2 seconds per hundred) doesn’t seem so unrealistic.
Interesting follow-up question! I took a look and interestingly enough, both Dressel and Licon own the second and third most ‘impressive’ deviations from their events’ respective trend lines. I looked back and found the top 5 most ‘impressive’ swims in each event:
For the 100 Free:
1) Alan Ford, 1944, :49.7, 4.29% deviation (in an era when most were going ~:52)
2) Caeleb Dressel, 2017, :40.00, 3.52% deviation
3) Caeleb Dressel, 2016, :40.46, 2.43% deviation
4) Rowdy Gaines, 1981, :42.38, 2.25% deviation
5) Matt Biondi, 1985, :41.87, 2.23% deviation
For the 200 Breast:
1) Steve Lundquist, 1981, 1:55.01, 3.14% deviation
2) Will Licon, 2017, 1:47.91, 3.12% deviation
3) Will Licon, 2016, 1:48.12,…