Handicapping D1 NCAA Women’s Nationals

by Andrew Mering

March 13th, 2017

The women’s D1 national championship meet starts on Wednesday. We’ve scored the psych sheet and examined how performances change from the psych sheet at the big meet. It’s now time to combine the two into a forecast of the meet.

I ran a Monte Carlo simulation of the NCAA women’s meet minus diving (most top teams have one diver or none; the exceptions are Minnesota and UCLA, who have three). The exact procedure, sketched in code just below the list, was:

  1. Modify each swim on the psych sheet by a random percentage based on the team’s performance history. The percentage was drawn from a normal distribution with the mean and standard deviation of the team’s previous time changes at nationals (for example, Georgia mean: .1%, sd: .91%). If a team had fewer than 20 swims in the previous 7 years at nationals, the entire field’s mean of .45% and standard deviation of 1% were used.
  2. Re-rank the swims in each event based on the adjusted times
  3. Score the meet and check the order of the teams
  4. Repeat 50,000 times
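
For readers who want the mechanics spelled out, here is a minimal sketch of that loop in Python. It is not the article’s actual code: the data structures (entries as a list of psych-sheet tuples, history as each team’s mean and standard deviation of past time changes) and the 16-place individual scoring table are my assumptions, and diving, relay point doubling, and relay DQs are left out.

    import random
    from collections import defaultdict

    # Assumed standard 16-place individual scoring; diving and relay doubling ignored.
    POINTS = [20, 17, 16, 15, 14, 13, 12, 11, 9, 7, 6, 5, 4, 3, 2, 1]
    FIELD_MEAN, FIELD_SD = 0.45, 1.0   # field-wide fallback (< 20 swims in prior 7 years)

    def perturb(seed_time, team, history):
        """Step 1: nudge a psych-sheet time by a random percentage drawn from
        a normal distribution fit to the team's past time changes at nationals."""
        mean, sd = history.get(team, (FIELD_MEAN, FIELD_SD))
        return seed_time * (1 + random.gauss(mean, sd) / 100)

    def simulate_meet(entries, history):
        """Steps 2-3: re-rank the adjusted times within each event and score the meet.
        entries is a list of (team, event, seed_time) tuples from the psych sheet."""
        by_event = defaultdict(list)
        for team, event, seed in entries:
            by_event[event].append((perturb(seed, team, history), team))
        totals = defaultdict(float)
        for swims in by_event.values():
            swims.sort()                      # fastest adjusted time first
            for pts, (_, team) in zip(POINTS, swims):
                totals[team] += pts
        return totals

    def simulate(entries, history, n=50_000):
        """Step 4: repeat n times, tallying how often each team lands in each place."""
        place_counts = defaultdict(lambda: defaultdict(int))
        for _ in range(n):
            totals = simulate_meet(entries, history)
            ranked = sorted(totals, key=totals.get, reverse=True)
            for place, team in enumerate(ranked, start=1):
                place_counts[team][place] += 1
        return place_counts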

The top teams unsurprisingly hold their spots under this procedure. Stanford won over 99.9% of the time. California was 2nd over 99.9% of the time. There were shakeups in the top 10, though. The model gives NC State, 4th on the psych sheet, almost no chance of a top-4 finish based on swimming points. Instead, 84% of the time the model has them finishing somewhere between 7th and 12th. Georgia, 6th on the psych sheet, ends up in the top 4 in 68% of simulations and the top 5 in 91%. Texas, 5th on the psych sheet, finishes 5th or higher based on swimming points in only 16% of the model runs. Virginia, 8th on the psych sheet, is 7th or better in 72% of simulations. A table with more of the results is below.

This methodology isn’t perfect. There’s no diving. It doesn’t include a chance of relay DQs. It also assumes that past performance at nationals is predictive of future performance. That assumption appears reasonably valid based on the year-by-year time changes: teams’ performances are correlated from one year to the next. However, this simulation doesn’t include contingencies for teams drastically changing their past behavior. For example, NC State, #5 on the psych sheet, historically added an average of .72% at nationals. If they suddenly behave like Stanford and add only .01%, the model’s prediction of a <2% chance of a top-5 finish will probably be upended (I’m not saying anything about NC State; it’s just an example). For most teams, I think the assumption of behavior consistent with their history at this meet is valid, but there will probably be a couple of teams who change their approach, post a historically novel result, and break away from their expected finish here.
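
To get a feel for that sensitivity with the sketch above, you could re-run the simulation with a single team’s historical mean swapped out and compare the two place distributions. The team name and numbers below are placeholders echoing the example in the paragraph, not re-computed results:

    # Hypothetical what-if: give one team a Stanford-like mean time change
    # and see how its simulated place distribution shifts.
    what_if = dict(history)
    mean, sd = what_if["NC State"]
    what_if["NC State"] = (0.01, sd)      # historically closer to +0.72%
    baseline = simulate(entries, history)
    adjusted = simulate(entries, what_if)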

That’s the fun part: seeing which teams break expectations and do things we didn’t (or shouldn’t) predict. This post is about setting a baseline of expectations. The story of the meet will be which expectations are defied or fulfilled.

Simulated Places

The spaces with 0% are perhaps better read as <1%: that finish happened, but the probability rounds to 0%. The blank spaces never happened in any simulation. The full table continues past 20th place and 24 teams, but I cut it off for readability/relevance reasons.
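
In terms of the sketch above, where count is the number of simulations (out of n) in which a team took a given place, that rounding rule might be rendered something like this (an assumption about presentation, not the article’s actual code):

    def cell(count, n=50_000):
        """Blank if the finish never occurred in any simulation; otherwise the
        rounded share, so a displayed 0% means it happened but rounds to zero."""
        return "" if count == 0 else f"{round(100 * count / n)}%"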

| Team | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stanford | 100% | 0% |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| California | 0% | 100% | 0% |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Southern Cali |  | 0% | 45% | 30% | 18% | 5% | 1% | 0% | 0% | 0% | 0% | 0% | 0% |  |  |  |  |  |  |  |
| Georgia |  | 0% | 34% | 34% | 23% | 7% | 2% | 1% | 0% | 0% | 0% | 0% |  |  |  |  |  |  |  |  |
| Texas A&M |  |  | 21% | 28% | 32% | 12% | 5% | 2% | 1% | 0% | 0% | 0% | 0% | 0% |  |  |  |  |  |  |
| Virginia |  |  | 1% | 4% | 12% | 31% | 24% | 13% | 7% | 4% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% |  |  |
| Texas |  |  | 1% | 4% | 11% | 25% | 22% | 14% | 9% | 6% | 4% | 2% | 1% | 1% | 0% | 0% | 0% | 0% | 0% |  |
| Michigan |  |  | 0% | 0% | 2% | 8% | 14% | 17% | 16% | 14% | 11% | 9% | 6% | 3% | 1% | 0% | 0% | 0% | 0% | 0% |
| NC State |  |  | 0% | 0% | 1% | 5% | 11% | 17% | 18% | 16% | 13% | 9% | 5% | 2% | 1% | 0% | 0% | 0% | 0% |  |
| Louisville |  |  | 0% | 0% | 1% | 4% | 9% | 14% | 16% | 16% | 15% | 12% | 7% | 4% | 2% | 1% | 0% | 0% | 0% | 0% |
| Indiana |  |  |  | 0% | 0% | 0% | 2% | 6% | 11% | 15% | 18% | 18% | 14% | 9% | 4% | 2% | 1% | 0% | 0% | 0% |
| Arizona |  |  | 0% | 0% | 0% | 2% | 4% | 7% | 9% | 12% | 14% | 15% | 14% | 10% | 6% | 3% | 2% | 1% | 0% | 0% |
| Wisconsin |  |  | 0% | 0% | 0% | 1% | 2% | 4% | 6% | 9% | 11% | 15% | 16% | 14% | 9% | 5% | 3% | 2% | 1% | 1% |
| Minnesota |  |  |  |  | 0% | 0% | 0% | 1% | 2% | 4% | 8% | 13% | 19% | 21% | 14% | 9% | 5% | 3% | 1% | 1% |
| UNC |  |  |  |  |  | 0% | 0% | 0% | 0% | 1% | 1% | 3% | 6% | 11% | 15% | 16% | 15% | 12% | 9% | 6% |
| Ohio St |  |  |  |  |  | 0% | 0% | 0% | 0% | 1% | 1% | 3% | 6% | 12% | 17% | 17% | 15% | 12% | 8% | 5% |
| Tennessee |  |  |  |  |  | 0% | 0% | 0% | 0% | 0% | 0% | 1% | 3% | 7% | 12% | 15% | 16% | 16% | 13% | 8% |
| Kentucky |  |  |  |  |  |  | 0% | 0% | 0% | 0% | 1% | 2% | 4% | 8% | 13% | 15% | 16% | 15% | 11% | 7% |
| Missouri |  |  |  |  |  |  | 0% | 0% | 0% | 0% | 0% | 1% | 1% | 3% | 6% | 10% | 13% | 16% | 17% | 14% |
| UCLA |  |  |  |  |  |  |  |  |  | 0% | 0% | 0% | 0% | 1% | 2% | 4% | 7% | 11% | 17% | 19% |
| Auburn |  |  |  |  |  |  |  |  | 0% | 0% | 0% | 0% | 0% | 1% | 2% | 4% | 6% | 10% | 14% | 16% |
| Arizona St |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 0% | 0% | 0% | 1% | 2% | 6% |
| Florida |  |  |  |  |  |  |  |  |  |  |  |  |  | 0% | 0% | 0% | 0% | 0% | 1% | 2% |
| Florida St |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 0% | 0% | 0% | 0% | 1% | 3% |

Comments
The Grand Inquisitor

I like this probabilistic approach – there are many useful insights about where the standings projections are firm and where teams have a lot of room to rise or fall. These calculated distributions seem to square pretty well with the magnitude of prior-year movements between the psych sheet projections and actual results. It’s interesting to look at the 3rd through 5th place dynamics: although USC’s most likely finish is third, there’s a greater than 50% probability they will finish lower. On the other hand, TAMU appears most likely to finish 5th, but there’s almost a 50% chance they would finish higher. Only missing piece (and it’s significant) is some similar analysis done on the diving to overlay on the…

The Grand Inquisitor

If you allow for diving points, it looks like Minnesota might be the team who benefits most – perhaps moving up 5 or so places (perhaps 70 or so points?) when you add diving with a legitimate chance to finish in top 10, maybe 9th. Indiana (perhaps around 40 pts from diving) could also reasonably finish in top 10 when diving is included. For teams challenging for a podium spot, Texas probably gets the biggest boost (maybe around 30 points from diving) followed by Stanford (at least 20). TAMU and Michigan might be able to scrape out a few points as well. Doesn’t look like Cal, USC, and Georgia are poised to expect significant scoring from the diving.

Joe

This is great. Would be extra work, but correlating within-swimmer time drops with one another (as well as within-team time drops) – so that swimmers or teams who are off on day 1 are more likely to be off on day 2 – would seem to be a logical next step. This should increase the variance of the results; i.e., Stanford would be more likely to lose because they could all be off in some simulations.

jcd<3

It’d be fascinating to see this broken down for some of the specific races, particularly the ones where swimmers are closely bunched together. Could also do a comparison of the simulated outcome of the 2016 meet vs the actual 2016 results to see how well the technique does (i.e., does this pick up Georgia’s surge, Stanford’s DQ notwithstanding).
