Applications of Chi-Square Test – Sports Analytics

Posted by: Dilip D | Business Analytics

Introduction:

A chi-square test is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Chi-square is a very versatile test, used as both a non-parametric and a parametric measure. The chi-square test is an approximate test, valid for large values of n.
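As a reminder of what is being computed, the statistic most commonly used in goodness-of-fit and independence tests is Pearson's. The symbols here are assumptions introduced for illustration and do not appear in the original post: O_i is the observed frequency in cell i, E_i is the theoretical (expected) frequency under the null hypothesis, and k is the number of cells.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Under the null hypothesis, and for large samples, this quantity approximately follows a chi-squared distribution. Large values indicate a poor fit between theory and observation.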

Conditions:

For the chi-square test of ‘goodness of fit’ between theory and experiment to be valid, the following conditions must be satisfied:

i) The sample observations should be independent.

ii) Constraints on the cell frequencies, if any, should be linear.

iii) N, the total frequency, should be reasonably large, say, greater than 50.

iv) No theoretical cell frequency should be less than 5.
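Conditions iii) and iv) are simple numeric rules of thumb, so they can be checked before running a test. The following is a minimal sketch, not a standard library routine. The function name `check_chi_square_conditions` and the sample frequencies are hypothetical, and the sketch assumes the expected frequencies sum to N, as they do in a goodness-of-fit test.

```python
def check_chi_square_conditions(expected, min_total=50, min_cell=5):
    """Return a list of warnings for conditions iii) and iv) above.

    `expected` holds the theoretical (expected) cell frequencies.
    Their sum is taken as N, the total frequency (an assumption).
    """
    problems = []
    total = sum(expected)
    # Condition iii): N should be greater than about 50.
    if total <= min_total:
        problems.append(f"total frequency {total} is not greater than {min_total}")
    # Condition iv): no theoretical cell frequency below 5.
    small = [e for e in expected if e < min_cell]
    if small:
        problems.append(f"{len(small)} expected cell frequencies are below {min_cell}")
    return problems


# Hypothetical expected frequencies, for illustration only.
print(check_chi_square_conditions([20.0, 30.0, 4.0, 10.0]))
print(check_chi_square_conditions([25.0, 25.0, 25.0]))
```

An empty list means neither rule of thumb is violated. When a cell falls below 5, the usual remedy is to pool it with a neighbouring cell before computing the statistic.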

Applications of Chi-square Distribution:

Chi-square distribution has a large number of applications in Statistics, some of which are enumerated below:

i) To test whether the population variance equals a hypothesised value, σ² = σ₀².

ii) To test the ‘goodness of fit’.

iii) To test the independence of attributes.

iv) To test the homogeneity of independent estimates of the population variance.

v) To combine various probabilities obtained from independent experiments to give a single test of significance.

vi) To test the homogeneity of independent estimates of the population correlation coefficient.
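Application v) can be illustrated with the approach usually attributed to Fisher: if k independent experiments give p-values p_1, ..., p_k, then -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The sketch below assumes this is the method intended. The function name `fisher_combined_p` and the p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values into one test of significance.

    Returns the chi-square statistic -2 * sum(ln p_i) and its
    upper-tail probability on 2k degrees of freedom.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom (2k) the upper tail has a
    # closed form: exp(-x/2) * sum over j < k of (x/2)^j / j!
    half = statistic / 2.0
    combined = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return statistic, combined


# Hypothetical p-values from three independent experiments.
stat, p = fisher_combined_p([0.08, 0.10, 0.06])
print(f"chi-square = {stat:.3f} on 6 d.f., combined p = {p:.4f}")
```

None of the three hypothetical experiments is significant at the 5% level on its own, yet the combined evidence is stronger than any single result, which is the point of pooling.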

Example 1 (Sports Analytics)

Chi-square can be used in sports analytics as an example of validating a belief or hypothesis using analytics. It can also be used in a business analytics or business statistics course as an application of the chi-square goodness-of-fit test.

Techniques and concepts:

Hypothesis testing, contingency table, chi-square goodness of fit.

TENNIS:

Tennis is played by millions of recreational players and is also a popular worldwide spectator sport. The four Grand Slam tournaments (also referred to as the Majors) are especially popular: the Australian Open (1905), played on hard courts; the French Open (1891), played on red clay courts; Wimbledon (1877), played on grass courts; and the US Open (1881), also played on hard courts.

TYPE OF COURT SURFACE:

A variety of surfaces can be used to create a tennis court, each with its own characteristics which affect the playing style of the game. There are four main types of courts depending on the materials used for the court surface: clay courts, hard courts, grass courts and carpet courts.

Hard Court:

Hard courts made up the most common tennis-court surface. These courts played faster than clay courts but slower than grass courts. Hard courts also offered a more predictable bounce. Due to the fast nature of these courts, players commonly tried to keep the length of their rallies to a minimum.

Grass Court:

Grass courts were the fastest of all court surfaces. Due to the slippery surface of the grass, the ball was inclined to bounce low and skid across the surface. The fast nature of the surface encouraged players to put a quick end to each point.

Clay Court:

Clay courts constituted the slowest courts because their surface absorbed much of the bounce and reduced the forward motion of the ball. Due to the slow nature of the surface, rallies were lengthy and required great athleticism on the part of the player. These courts favoured defensive-style players who engaged in long rallies from the baseline.

Chi-square Analysis:

It is commonly presumed that winning or losing a tennis match depends entirely on the coaching, practice and fitness of the player. Logically, however, the type of court may also have a significant impact on the result. To examine this, a chi-square test is performed for each player between match outcome and court surface. The results indicate that the type of court makes a difference for some players but not for others.

Wins and Losses on different court surfaces

Based on the above data, describe a hypothesis test to determine whether each player’s performance depends on the type of surface.

Null Hypothesis (H0): The type of court surface does not make a difference to Roger Federer’s performance.

Alternative Hypothesis (H1): The type of court surface does make a difference to Roger Federer’s performance.

Chi-Square Statistics:
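The calculation for one player can be sketched as a test of independence on a 2 x 3 contingency table (wins and losses by surface). The counts below are HYPOTHETICAL placeholders, not the figures from the case, and the function name `chi_square_independence` is an assumption made for illustration. With 2 rows and 3 columns there are 2 degrees of freedom, for which the upper-tail probability is exactly exp(-χ²/2).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome and surface were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# HYPOTHETICAL counts for one player (not the case data).
#          hard  clay  grass
wins = [60, 30, 30]
losses = [20, 20, 10]

statistic, dof = chi_square_independence([wins, losses])
p_value = math.exp(-statistic / 2.0)  # exact only because dof == 2
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"chi-square = {statistic:.3f}, d.f. = {dof}, p = {p_value:.4f}: {decision}")
```

Rejecting H0 means the player's win rate differs across surfaces by more than chance would explain. Failing to reject H0 means only that the data do not show such a difference, not that none exists.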

Inference:

  • Federer performs better on grass, while Nadal performs better on clay.
  • For Djokovic and Murray, there is not enough evidence to support the hypothesis that the type of surface makes a difference to their performance.
  • Although Federer is better on grass, his clay-court record (83%) is still better than that of Djokovic (77%) and Murray (69%).

Conclusion:

The key takeaway is that although the type of surface does not make a difference to the performance of Djokovic and Murray, they are not necessarily more versatile than Federer.

Note:

The chi-square test can also be applied to test the association between, for example:

  • Creditworthiness of borrowers for personal loans and their age group.
  • Training received and performance of salesmen.
  • Returns on an individual stock and returns on stocks of a sector such as Banking, Pharmaceuticals, Information Technology, etc.
  • Salary level of employees and level of job satisfaction.
  • Attitude (bearish, neutral, bullish) towards the stock market and age of investors.
  • Impact of a TV campaign and category of viewers, viz. urban/metropolitan, semi-urban and rural.

Source:

1. Harvard Business case.

2. Statistics for Management by Levin and Rubin