Central limit theorem

The central limit theorem is one of the most fundamental theorems in probability and statistics. It states that, under certain conditions, the sampling distribution of the mean of independent random variables approaches a normal distribution as the sample size increases. Below is a Shiny application I created to visualize the central limit theorem in action. Random samples are generated from a selected population distribution so that the distribution of their means can be compared visually against the theoretical asymptotic normal distribution.
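As a rough illustration of the idea (not the Shiny application itself, which is written in R), the sketch below simulates sample means in Python using only the standard library. It assumes an Exponential(1) population, chosen here purely for illustration because it is strongly skewed with mean 1 and standard deviation 1. The function names, the seed, and the sample sizes are all hypothetical choices.

```python
import math
import random
import statistics


def exponential_draw(rng):
    # Assumed population: Exponential(1), mean 1, standard deviation 1, right-skewed.
    return rng.expovariate(1.0)


def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from `n` independent draws."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def fraction_within_one_se(means, mu, sigma, n):
    """Share of sample means within one standard error of mu.

    For a normal distribution this share is about 0.683.
    """
    se = sigma / math.sqrt(n)
    return sum(abs(m - mu) <= se for m in means) / len(means)


rng = random.Random(2016)  # fixed seed so the run is repeatable
results = {}
for n in (2, 30, 200):
    means = sample_means(exponential_draw, n, 5000, rng)
    results[n] = (
        statistics.fmean(means),                     # should be near mu = 1
        statistics.stdev(means),                     # should be near sigma / sqrt(n)
        fraction_within_one_se(means, 1.0, 1.0, n),  # should approach 0.683
    )
    print(n, results[n])
```

As n grows, the standard deviation of the sample means should shrink toward 1/sqrt(n) and the share within one standard error should move toward the normal value of roughly 0.683.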

Creating a game of Go using R

A few months back, in March, an AI Go player developed by Google DeepMind surprised many when it won its first match against Sedol Lee, who holds the highest rank in Go. It went on to win four of the five matches, taking the series. After digging up articles on the background and the methods, I gave a presentation on AlphaGo for a reading course in data mining in my master's program. I then went on to program a Go simulator/game in R.

Predicting influenza outbreaks with Google Flu Trends: 4. Correction and comments

In previous posts, I described what my group presented at the 2016 Statistical Society of Canada Annual Meeting’s student case study poster competition. The most recent of these covered the group’s work on selecting a prediction model and our final conclusion, which turned out to be incorrect because of a mistake in our R code. In this post, I will describe the mistake as well as other considerations.

Predicting influenza outbreaks with Google Flu Trends: 3. Prediction

As part of the 2016 Statistical Society of Canada Annual Meeting’s student case study poster competition, my group looked at the strength and timing of the association between GFT estimates and reported influenza case counts, as described in a previous post. We then built and compared multiple models to predict the peak in the annual number of positive influenza tests.

Predicting influenza outbreaks with Google Flu Trends: 2. Associations

In a previous post, I introduced my group’s work for a student case study poster competition at the 2016 Statistical Society of Canada Annual Meeting. In this post, I will continue with a discussion of the association between GFT estimates and reported influenza case counts.

Predicting influenza outbreaks with Google Flu Trends: 1. Descriptives

At the 2016 Statistical Society of Canada Annual Meeting, I had an opportunity to compete with a group of classmates in a student case study poster competition in data analysis. The case study was on assessing the association between Google Flu Trends (GFT) estimates and reported influenza case counts, as well as GFT’s ability to predict the timing of influenza outbreaks.

Leave-one-out Cross-validation

I wrote an R script to illustrate leave-one-out cross-validation (LOOCV) and other methods of estimating prediction error when classifying bivariate data, for an in-class presentation during my master's program. I have also organized the demonstration into reusable functions, which are available on my GitHub page. Below is an animated demonstration, built with those functions, of the LOOCV method for LDA and KNN models on a simulated data set.
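For readers who want the mechanics in code, here is a minimal Python sketch of LOOCV for a KNN classifier only. It is not the R functions mentioned above, and LDA is left out. The simulated data (two bivariate normal classes with shifted means), the function names, the seed, and the choice of k = 5 are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = sum(label for _, label in nearest)  # labels are 0 or 1
    return 1 if 2 * votes > k else 0


def loocv_error(data, k):
    """Hold out each observation once, predict it from the rest, return the error rate."""
    errors = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every observation except the i-th
        errors += knn_predict(train, point, k) != label
    return errors / len(data)


rng = random.Random(1)  # fixed seed so the run is repeatable
# Hypothetical simulated data: two bivariate normal classes with shifted means.
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(50)]

error = loocv_error(data, k=5)
print(f"LOOCV misclassification rate for 5-NN: {error:.2f}")
```

An odd k avoids tied votes with two classes. Each observation is predicted by a model that never saw it, so the resulting rate is an estimate of out-of-sample error rather than training error.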
