Crashing the Dance (CTD) uses artificial intelligence techniques to predict how the NCAA Men's Basketball Committee will select and seed teams for the tournament. Over the last five seasons, CTD has correctly predicted 92% of the at-large teams (an average of just over 31 of 34 per year) and placed 77% of teams within one line of their actual seed. For the 2011 tournament, we correctly picked 34 of 37 at-large selections and placed 56 of 68 teams (82%) within one seed line of their actual seed. This marks the third straight season in which CTD outperformed the average bracketologist. Browse the latest results on the site, or read on for more details about how we do it.
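The "within one seed line" figures are simple arithmetic. As a minimal sketch, assuming predicted and actual seeds are stored as dictionaries mapping team names to seed lines (the function name `within_one_seed_rate` and this data layout are hypothetical, not necessarily how CTD scores itself), teams that were in the actual field but missing from the predicted field count as misses:

```python
def within_one_seed_rate(predicted, actual):
    """Share of the actual field whose predicted seed line is within one of the real one.

    predicted, actual: dicts mapping team name -> seed line (1-16).
    Teams in the actual field that are missing from the prediction count as misses.
    """
    hits = sum(
        1
        for team, seed in actual.items()
        if team in predicted and abs(predicted[team] - seed) <= 1
    )
    return hits / len(actual)


# Hypothetical example: Team A and Team C land within one line, Team B is off by two,
# and Team D was not in the predicted field at all.
predicted = {"Team A": 1, "Team B": 3, "Team C": 5}
actual = {"Team A": 2, "Team B": 1, "Team C": 5, "Team D": 16}
print(within_one_seed_rate(predicted, actual))  # prints 0.5
```

Scored this way, 56 hits in a 68-team field gives 56 / 68 ≈ 0.82, the 82% quoted for 2011.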
Each spring, bracketologists around the country do their best to trump the NCAA Men's Basketball Committee in selecting, seeding, and filling out the bracket for the NCAA Division I Men's Basketball Championship (a.k.a., the Big Dance). They pore over the same "nitty gritty" reports seen by the committee, analyzing RPI, polls, wins in the last 10 games, and conference performance.
The committee comprises a rotating set of athletic directors and conference commissioners. Each year, a few members rotate off and are replaced, so most of the body building the bracket carries over from one year to the next, which presumably leads to continuity in the process itself. The committee's principles and procedures are also fairly well defined (not to mention analyzed and simulated). However, its deliberations are kept highly secret, which makes it difficult to know whether it weighs certain factors more than others.
Hmm... so we have known input data (team information) and known output data (the selected teams and their assigned seeds), but an unknown process (the committee's deliberation) that turns the input into the output. That sounds like a classic supervised machine learning problem!
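To make the framing concrete, here is a minimal sketch of the at-large selection step treated as binary classification, using logistic regression fit by batch gradient descent. This is a stand-in chosen for illustration, not necessarily the technique Crashing the Dance uses. The two features (RPI rank divided by 100 and the fraction of the last 10 games won) and the eight teams are made-up toy data; all function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that a team with features x receives an at-large bid."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(features, labels, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_feat = len(features[0])
    n = len(features)
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Difference between the predicted probability and the 0/1 label.
            err = predict_proba(w, b, x) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: (RPI rank / 100, fraction of last 10 games won).
past_teams = [
    (0.05, 0.9), (0.15, 0.8), (0.25, 0.7), (0.35, 0.6),  # selected
    (0.55, 0.5), (0.65, 0.4), (0.75, 0.4), (0.85, 0.3),  # not selected
]
selected = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(past_teams, selected)
print(round(predict_proba(weights, bias, (0.10, 0.85)), 3))  # strong resume
print(round(predict_proba(weights, bias, (0.80, 0.35)), 3))  # weak resume
```

Logistic regression appears here only because it is about the simplest probabilistic classifier: its output reads as a rough selection probability, and bubble teams could then be ranked by it to fill the available at-large slots.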
Our approach applies statistical machine learning techniques (a form of artificial intelligence) to understand how the committee selects the 37 at-large teams and ranks all 68 teams from 1 to 68 (the ordering formally known as the S-Curve), and then to predict the committee's results. Others have applied simple statistical models (e.g., linear regression) to identify the at-large selections, but we are not aware of any use of more advanced machine learning techniques, or of any automated effort to predict the S-Curve.
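Predicting the S-Curve is a ranking problem rather than a yes/no decision. One simple approach, sketched below under the assumption that some upstream model already assigns each team a numeric strength score, is to sort teams by that score. The seed-line mapping is a deliberate simplification (four teams per line, with the four extra teams of a 68-team field folded into line 16); the real bracket does not simply place all of the First Four play-in teams on line 16. The helper names and scores are hypothetical.

```python
def s_curve(scores):
    """Return {team: position}, with position 1 for the highest score.

    Ties are broken alphabetically, an arbitrary choice for this sketch.
    """
    ordered = sorted(scores, key=lambda team: (-scores[team], team))
    return {team: pos for pos, team in enumerate(ordered, start=1)}


def naive_seed_line(position):
    """Simplified mapping: four teams per seed line, overflow folded into line 16."""
    return min(16, (position - 1) // 4 + 1)


# Hypothetical strength scores produced by some upstream model.
scores = {"Team A": 0.91, "Team B": 0.97, "Team C": 0.42, "Team D": 0.88, "Team E": 0.75}
curve = s_curve(scores)
for team, pos in sorted(curve.items(), key=lambda item: item[1]):
    print(pos, team, naive_seed_line(pos))
```

Keeping the ranking and the seed-line mapping separate makes it easy to swap in a more faithful mapping (one that honors the committee's bracketing rules) without touching the model that produces the scores.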
Unfortunately, the committee's output (i.e., the bracket) is not a perfectly accurate representation of its deliberations; for example, bracketing rules can move a team off its true seed line to avoid an early matchup with a conference rival. We will discuss this and other problems as we go along. However, because the committee applies its principles and procedures consistently when selecting and seeding teams, we believe our approach can be, on average, as accurate as any human bracketologist.
Crashing the Dance is run by Andy Cox and based on work done by Andy and Yushi Jing while computer science graduate students at Georgia Tech. Thanks to Jerry Palm for past RPI data and Ken Pomeroy for game data.