Central limit theorem

The central limit theorem is one of the most fundamental theorems in probability and statistics. It states that, under certain conditions (for example, independent and identically distributed variables with finite variance), the sampling distribution of the mean of independent random variables approaches a normal distribution as the sample size increases. Below, I created a Shiny application to visualize the central limit theorem in action. Random samples are generated from a selected population distribution so that the distribution of their means can be visually compared against the theoretical asymptotic normal distribution.



GitHub page: https://github.com/mjmoon/cltdemo

Shiny app: https://micbon.shinyapps.io/cltdemo/

CLT Demo

Control Panel

The right sidebar panel holds the controls for the application. The top half of the panel has the controls for the population distribution: one can select the type of distribution and specify its parameters. Currently, the following distributions are available, with the parameter ranges shown.

  • Normal: \text{N} (\mu, \sigma^2) \text{; } \mu \in [-1.5, 1.5] \text{ and } \sigma \in [0.1, 2]
  • Beta: \text{Beta} (\alpha, \beta) \text{; } \alpha \in [0.5, 5] \text{ and } \beta \in [0.5, 5]
  • Binomial: \text{Binom} (n, p) \text{; } n \in [1, 50] \text{ and } p \in [0.05, 0.95]

In the bottom half of the panel, the number of simulations (m) and the sample size per simulation (n, distinct from the binomial parameter n above) are specified. m means are calculated, each from n samples simulated from the population distribution specified above. The Submit button at the bottom generates new data based on the specifications.
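The simulation step described above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the Shiny application's actual R code; the function names (`simulate_sample_means`, `normal_sampler`, `beta_sampler`, `binomial_sampler`) are hypothetical and chosen for this example.

```python
import random
import statistics


def simulate_sample_means(draw, m, n, seed=None):
    """Return m sample means, each computed from n draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(m)]


# Hypothetical samplers for the three population families in the control panel.
def normal_sampler(mu, sigma):
    return lambda rng: rng.gauss(mu, sigma)


def beta_sampler(alpha, beta):
    return lambda rng: rng.betavariate(alpha, beta)


def binomial_sampler(trials, p):
    # One binomial draw as the number of successes in `trials` Bernoulli(p) trials.
    return lambda rng: sum(rng.random() < p for _ in range(trials))


# m = 200 simulations, each the mean of n = 30 draws from Beta(2, 5).
means = simulate_sample_means(beta_sampler(2.0, 5.0), m=200, n=30, seed=1)
print(len(means), min(means), max(means))
```

Passing the sampler as a function keeps the simulation loop identical for all three distributions, so only the sampler changes when a different population is selected.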

Population Density

The main panel contains three plots. In the bottom-left corner, the population density specified in the control panel is plotted and updated as the user changes the specification. The x-axis is fixed for each type of distribution so that the effect of changing each parameter is visible.

Sample Means and Density

The top plot in the main panel shows sample means, their kernel density, and the asymptotic density based on the central limit theorem. The plot is updated with the new data when the user clicks the Submit button on the control panel.

The m simulated sample means are represented by scattered transparent dots along the middle of the plot. The kernel density is calculated using R’s density() function with its default Gaussian kernel.
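To make the kernel density step concrete, here is a minimal Python sketch of a Gaussian kernel density estimate. It evaluates the estimate directly at each grid point, which is simpler than what R does internally, and the bandwidth rule (0.9 times the smaller of the standard deviation and IQR / 1.34, times the sample count to the power -1/5) is intended to mirror R's default rule of thumb. The names `silverman_bandwidth` and `gaussian_kde` and the sample values are made up for illustration.

```python
import math
import statistics


def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * count^(-1/5)."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        spread = sd if sd > 0 else 1.0  # fallback for degenerate samples
    return 0.9 * spread * len(xs) ** -0.2


def gaussian_kde(xs, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of xs at each grid point."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(xs)
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs)
        for g in grid
    ]


# Illustrative values standing in for simulated sample means.
sample_means = [0.21, 0.25, 0.27, 0.28, 0.29, 0.30, 0.31, 0.33, 0.36, 0.40]
step = 0.005
grid = [i * step for i in range(-40, 161)]  # covers -0.2 to 0.8
density = gaussian_kde(sample_means, grid)
area = sum(density) * step  # Riemann sum; should be close to 1
print(round(area, 3))
```

The area check is a quick sanity test: a density estimate should integrate to roughly one over a grid that covers the data.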

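The asymptotic distribution is plotted using the central limit theorem.

\sqrt{n} \left( \frac{ \sum{x_i} }{n} - \mu \right) \xrightarrow{d} \text{N}(0, \sigma^2)

The asymptotic mean and variance are calculated from the specified population distribution: \mu and \sigma^2 are the population mean and variance, so for large n the sample mean is approximately \text{N}(\mu, \sigma^2 / n).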
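As a worked example of that calculation, the sketch below computes the population mean and variance for each of the three families using their standard formulas, then builds the approximating normal distribution for the sample mean. It assumes the standard form of the theorem for sample means, in which the variance is the population variance divided by the sample size n. The function names (`population_moments`, `clt_normal`) are hypothetical, and the number of binomial trials is passed separately from the sample size to avoid the clash between the two uses of n.

```python
import math
from statistics import NormalDist


def population_moments(family, a, b):
    """Return (mean, variance) of the population distribution."""
    if family == "normal":    # a = mu, b = sigma
        return a, b ** 2
    if family == "beta":      # a = alpha, b = beta
        return a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1))
    if family == "binomial":  # a = number of trials, b = success probability p
        return a * b, a * b * (1 - b)
    raise ValueError(f"unknown family: {family}")


def clt_normal(family, a, b, n):
    """Normal approximation to the mean of n draws: N(mean, variance / n)."""
    mean, var = population_moments(family, a, b)
    return NormalDist(mu=mean, sigma=math.sqrt(var / n))


# Binomial population with 10 trials and p = 0.5, sample size n = 25.
approx = clt_normal("binomial", 10, 0.5, n=25)
print(approx.mean, approx.stdev)
```

The returned `NormalDist` object exposes `pdf`, which could be evaluated over a grid to draw the asymptotic density curve.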

Q-Q Plot

A Q-Q plot is shown in the bottom-right corner of the main panel. The plot helps the user visually assess the normality of the simulated sample means. This plot is also updated each time the user clicks the Submit button.
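For readers curious about what a normal Q-Q plot actually plots, the following Python sketch pairs the sorted sample values with standard normal quantiles. The plotting positions (i - 0.5) / k for a sample of k values are one common convention and are an assumption here; the application's exact choice may differ. The function name `normal_qq_points` is made up for this example.

```python
from statistics import NormalDist


def normal_qq_points(xs):
    """Pair standard normal quantiles with the sorted sample values."""
    k = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / k: a common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i - 0.5) / k) for i in range(1, k + 1)]
    return list(zip(theoretical, sorted(xs)))


# Illustrative values standing in for simulated sample means.
points = normal_qq_points([0.31, 0.27, 0.29, 0.25, 0.33])
for theoretical_q, sample_q in points:
    print(round(theoretical_q, 3), sample_q)
```

If the sample means are close to normal, these points fall near a straight line; systematic curvature at the ends suggests skewness or heavy tails.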

Application

The application is embedded below. I hope it helps in studying the effect of the central limit theorem and of changes in the different parameters. It is best accessed in a separate window/tab.
