DESIGN PORTFOLIO

R Statistics Tutorial Site

R Statistics Tutorial Proof-of-Concept [In Progress]

Project Genesis

I was running a number of statistical analyses for a few projects, and I wanted to perform them in R. There are dozens upon dozens of blogs that offer R tutorials, but none of the ones I found provided comprehensive help, and all of them seemed to presume that I, the reader, knew more than I actually did. Further, most of these tutorials covered only simple tasks, and rarely showed how to perform a particular analysis from beginning to end.

There are also statistics tutorial websites that explain how to perform statistical analyses, but these focus entirely on SPSS. In contrast, R is free to use and open source, and thus seems like a more useful tool for many purposes, but only if sufficient tutorial instruction exists. However, even the most detailed of the SPSS tutorial sites leaves the reader hanging at key moments. For example, if you use a Q-Q plot to check for normality, how "non-straight" can the line be before you conclude that your sample is not normally distributed? Experienced researchers might know intuitively, but beginning researchers might not. These, and a few other qualms with existing resources, made me want to develop my own tutorial resources.

Target Learners

This project is aimed at students like myself, as well as students taking statistics courses (graduate or undergraduate). I imagine that both students and instructors can use this resource to help students work in R when SPSS is less available. I presume that there are other students like myself who wish to perform basic analyses in R, but are struggling to find a comprehensive tutorial resource. So I decided to create a tutorial for at least a couple of statistical tests, and then to perform "market testing" to see whether these would actually be useful to instructors and learners.

Objectives

  • To provide tutorial instruction for how to perform basic statistical tests in R, targeted towards true beginners in R.
  • To provide tutorial instruction for how to perform and report basic statistical tests, including how to test for assumptions and what to do if those assumptions are violated.
  • To provide the above tutorial instruction in a way that highlights the human judgment required in statistical analyses, and to showcase statistics as a form of argumentation.

Philosophy/Approach

I think it is vital to teach students not just how to run a statistical test, but also how to recognize when the assumptions of a test have been violated. Statistical inquiry is more than just a series of math problems; it is the construction of an argument. And if we have not taken into account the assumptions of the statistical tests we are using, our resulting argument may very well be fallacious. In the social sciences, the argument is usually that two groups differ in some way, or that one group differs across time in some way, or that two sets of values vary similarly with each other, etc. But how do we come to these conclusions? This question should be at the front of the mind of anyone performing a statistical analysis. We should know the arguments we are making, and the premises they are based upon.

In addition, I believe that tutorial instruction should offer learners multiple modes: text instruction, video instruction, and demonstrations (where reasonable). Video tutorials should be short and to the point (I personally hate having to fast-forward through a long video to get help with a particular feature that I'm struggling with). I also find it helpful to be able to "check off" parts of a tutorial I've completed or no longer have trouble with, so that I can more easily highlight what I need help with. Thus, I have created this prototype with multi-modal instruction, a modular tutorial structure, and the ability to track your progress through the tutorial.

Tools Used

I wanted live, updating examples of normal and non-normal samples, to show visually what a violation of normality looks like under several different checks. To build them, I taught myself JavaScript and used D3.js to create random data sets, to visualize the data in histograms and Q-Q plots, and to run a Shapiro-Wilk test. I meant to do the same thing to illustrate tests for heterogeneity of variances, but D3.js did not have quite the tools I needed. I therefore used an outside web app to create box plots, and imported the plots as images. I used Camtasia to create tutorial videos. The website itself was created using a combination of Webflow and manual HTML, CSS, and jQuery coding.
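
To give a sense of what sits behind a live Q-Q plot like the ones described above, here is a minimal sketch in plain JavaScript. It is not the code used on the site: it leaves out D3.js entirely, and the function names (randomNormal, normalQuantile, qqPoints) are hypothetical names chosen for illustration. It assumes a Box-Muller transform for the random normal values, a standard polynomial approximation of the error function for the normal CDF, and plotting positions of (i + 0.5) / n, which is only one of several common conventions.

```javascript
// Draw one value from a normal distribution using the Box-Muller transform.
function randomNormal(mean = 0, sd = 1) {
  const u1 = 1 - Math.random(); // shift to (0, 1] so Math.log never sees 0
  const u2 = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Standard normal CDF, using the Abramowitz-Stegun approximation of erf.
function normalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
               t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Inverse of the standard normal CDF, found by simple bisection.
function normalQuantile(p) {
  let lo = -10;
  let hi = 10;
  for (let i = 0; i < 80; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// Pair each sorted observation with the normal quantile expected at its rank.
function qqPoints(sample) {
  const sorted = [...sample].sort((a, b) => a - b);
  const n = sorted.length;
  return sorted.map((value, i) => ({
    theoretical: normalQuantile((i + 0.5) / n),
    observed: value,
  }));
}

// Example: one roughly normal sample and one right-skewed (exponential) sample.
const normalSample = Array.from({ length: 50 }, () => randomNormal());
const skewedSample = Array.from({ length: 50 }, () => -Math.log(1 - Math.random()));
const normalQq = qqPoints(normalSample);
const skewedQq = qqPoints(skewedSample);
```

Plotting observed against theoretical for each set of points gives the Q-Q plot. The normal sample should fall close to a straight line, while the skewed sample should bend away from it at one end, which is the visual contrast the live examples are meant to make obvious.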

Progress

So far, I have created a prototype/draft of one statistical test: the independent-samples t-test. Before I perform testing with undergraduate statistics learners, I will need to create tutorial materials for at least 2-3 more statistical tests, and I will also need to create tutorial instruction for installing RStudio. Also, the tutorial instruction for the independent-samples t-test is still incomplete. I need to give more thought to how to explain the different kinds of t-tests, and under what conditions to perform a one-tailed or two-tailed t-test. However, what I have created can serve as a helpful prototype or "proof-of-concept," so that others can see what I'm working on.
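
As one illustration of how two kinds of independent-samples t-test differ, the sketch below computes the test statistic and degrees of freedom for Student's pooled-variance version and for Welch's unequal-variance version. It is written in plain JavaScript to match the live examples and is not taken from the tutorial itself. The function names (studentT, welchT) are hypothetical, and p-values are left out because they would need a t-distribution function.

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

// Student's t-test statistic: assumes both groups share one variance.
function studentT(a, b) {
  const na = a.length;
  const nb = b.length;
  const df = na + nb - 2;
  const pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df;
  const t = (mean(a) - mean(b)) / Math.sqrt(pooled * (1 / na + 1 / nb));
  return { t, df };
}

// Welch's t-test statistic: does not assume equal variances.
function welchT(a, b) {
  const va = variance(a) / a.length; // squared standard error of mean(a)
  const vb = variance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  // Welch-Satterthwaite approximation for the degrees of freedom.
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

// Example: equal group sizes, but the second group is far more spread out.
const groupA = [1, 2, 3, 4, 5];
const groupB = [2, 4, 6, 8, 10];
const student = studentT(groupA, groupB);
const welch = welchT(groupA, groupB);
```

With equal group sizes the two versions give the same t value but different degrees of freedom, so the same data can lead to different p-values depending on which assumptions the analyst is willing to make. The sign of t also matters for a one-tailed test, because it shows the direction of the difference. In R, t.test() runs the Welch version unless var.equal = TRUE is set.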