This tutorial continues on from Introduction to R - Part 1.
- TMS_data.txt (right click and choose save link as..)
Make sure your working directory in R is set to the folder containing these files.
Download the script for this tutorial:
It is possible to extend the functionality of R through the use of packages.
There are two ways to install a new package: you can either use the menu (Tools > Install Packages) or you can use the
install.packages() function. Luckily the package we need is already installed so we can skip this step.
Once you’ve installed a package you need to load it into the R workspace. You can do this by using the library() or require() function.
In order to import the SPSS data file we have we need a package called foreign. You can load it by running either of the following commands:
# Either
library(foreign)
# Or
require(foreign)
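As a small sketch of how the two loading commands differ: library() stops with an error if the package is missing, whereas require() returns FALSE with a warning, which makes it convenient inside a conditional. The install-if-missing pattern below is an illustration and an assumption about how you might want to set things up, not a step from the original tutorial.

```r
# Install 'foreign' only if it is not already available, then load it.
# (foreign is usually bundled with R, so the install step is normally skipped.)
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}
library(foreign)

# require() returns TRUE or FALSE instead of stopping with an error
loaded <- require(foreign)
loaded
```

For scripts that must not continue without the package, library() is generally the safer choice because the failure is immediate.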
This will make all of the functions from the package available to you. In this case we’re interested in the
read.spss() function, which loads the SPSS data into R.
You’ll see that the last command showed that the SPSS data we just imported is stored in a list variable. This is fine for some data but we want this variable to be in a data frame.
You can convert it using the data.frame() function:
dat2 <- data.frame(dat2)
str(dat2) # Check to make sure the structure type has changed
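To see the conversion in isolation, here is a minimal sketch using a made-up list in place of the imported SPSS data. The names toy_list and toy_df and the values are hypothetical. As an aside, read.spss() also accepts to.data.frame = TRUE, which returns a data frame directly.

```r
# A hypothetical list standing in for the imported SPSS data
toy_list <- list(ID = c(1, 2, 3),
                 HEIGHT = c(170, 165, 182))
class(toy_list)   # a list

# Each list element becomes one column of the data frame
toy_df <- data.frame(toy_list)
class(toy_df)     # a data.frame
str(toy_df)       # 3 observations of 2 variables
```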
Once it’s converted you can start looking at the data. Let’s start by checking that it looks reasonable. You can use head() or
tail() to look at the first or last few rows of data, and
names() to look at the names of the columns in the dataset. You can also use
View() to open up the full dataset in the script pane.
head(dat2) # Look at the first few rows
tail(dat2) # Look at the last few rows
names(dat2) # Check the column names
View(dat2) # Open dataset in script pane
That looks good, although the column names could be a bit more informative. Let’s rename the HEIGHT column to HeightCM so we know the units of measurement.
You can do this by assigning the new value to the 2nd position of
names(dat2), like so:
names(dat2)[2] <- "HeightCM"
names(dat2)
Actually, all of the column names could be improved, so let’s rename them all:
Hopefully, when you look at these examples you will see that we are calling a function on dat2, which uses the round brackets (), and also selecting data from a specific position, using the square brackets [].
The colon is used to indicate a sequence, so 1:4 is the same as 1, 2, 3, 4.
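The renaming steps can be sketched on a small stand-in data frame. The data frame toy, its original column names, and the replacement names are all hypothetical, chosen only to illustrate the pattern of combining names() with square brackets and the colon.

```r
# Hypothetical data frame; the column names are made up for illustration
toy <- data.frame(ID = 1:3,
                  HEIGHT = c(170, 165, 182),
                  WEIGHT = c(65, 58, 80),
                  AGE = c(23, 31, 27))

# Rename only the 2nd column
names(toy)[2] <- "HeightCM"
names(toy)

# Rename all four columns at once; 1:4 selects positions 1, 2, 3, 4
names(toy)[1:4] <- c("Participant", "HeightCM", "WeightKG", "AgeYears")
names(toy)
```

Without the square brackets, the assignment would apply to the whole vector of names rather than to the chosen positions.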
Let’s dive a little deeper into exploring data frames.
Start by making sure that the TMS_data.txt file is loaded into the workspace. Just in case you don’t remember, the code for that is:
dat1 <- read.delim("TMS_data.txt")
Let’s do some of the same checks we did on the last dataset:
str(dat1) # Show the structure of the variable, is it a data frame?
names(dat1) # Show the names of the columns, are they understandable?
head(dat1) # Show the first few lines of data, do they look sensible?
dim(dat1) # Show the dimensions of the data (Rows, Columns)
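If you want to try the import and the checks without the tutorial file, the following sketch writes a tiny tab-delimited file to a temporary location and reads it back. The column names echo those used in this tutorial, but the values and the object names toy and toy_in are made up.

```r
# Write a tiny tab-delimited file to a temporary location (made-up values)
tmp <- tempfile(fileext = ".txt")
toy <- data.frame(Subject = c(1, 1, 2, 2),
                  Congruence = c("Cong", "Incong", "Cong", "Incong"),
                  RT = c(0.48, 0.53, 0.51, 0.50))
write.table(toy, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() assumes a header row and tab-separated columns
toy_in <- read.delim(tmp)
str(toy_in)   # a data frame with 4 observations of 3 variables
dim(toy_in)   # rows, then columns
```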
You can also explore specific variables in the data but to do this we need to revisit indexing.
Previously we have seen that you can select a specific element in a variable by using square brackets to indicate its position.
temp <- c(1, 3, 5, 6, 17, 8)
temp[3] # Shows the 3rd element of temp
But a data frame is more complex than a vector. There are several ways to index in data frames. You can still use square brackets but you include 2 numbers: the first indicates the Row and the 2nd the Column.
dat1[2,6] # Shows the element in Row 2, Column 6
dat1[456,9] # Shows the element in Row 456, Column 9
# You can select multiple sequential elements using the colon symbol:
dat1[1:5,9] # Shows rows 1 to 5 in column 9
dat1[5,1:3] # Shows columns 1 to 3 in row 5
dat1[1:3,3:5] # Shows columns 3 to 5 in rows 1 to 3
# You can also leave one of the numbers out and R will show all rows or columns:
dat1[50,] # Shows all columns in Row 50
dat1[,6] # Shows all rows in column 6
dat1[1:5,] # Shows all columns for rows 1 to 5
If you have a little difficulty remembering the order for indexing, Roman Catholic (Row, then Column) works as a simple mnemonic.
Because a data frame has column names, you can also use them to indicate which columns from the data you’d like to select:
# Either using square brackets:
dat1[45,"RT"] # Show Row 45 in the column called RT
dat1[,"Axes"] # Show all rows in the column called "Axes"
# Or by using the $ symbol:
dat1$Hemisphere # Show all the rows in the column called "Hemisphere"
# You can even combine the two ways:
dat1$Congruence[57] # Show row number 57 in the column called "Congruence"
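The different indexing styles all reach the same data, which a small made-up data frame makes easy to verify. The object toy and its values are hypothetical.

```r
toy <- data.frame(Subject = c(1, 2, 3, 4),
                  Hemisphere = c("L", "R", "L", "R"),
                  RT = c(0.48, 0.53, 0.51, 0.50))

toy[2, 3]            # row 2, column 3
toy[2, "RT"]         # same element, column selected by name
toy$RT[2]            # same element, using $ and then a position
toy[1:2, ]           # all columns for rows 1 to 2
toy[, "Hemisphere"]  # all rows of one column
```

Selecting by name is usually more robust than selecting by position, because it keeps working if the column order changes.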
The mean can be calculated using the
mean() function. If you’re using it on a data frame be sure to select the column you’d like to run the function on:
mean(dat1) # This will not work: R returns NA with a warning
mean(dat1$RT) # This will calculate the mean RT
Some data will have missing entries (usually indicated by NA), and this can confuse some functions:
## [1] NA
To deal with this,
mean() has an optional argument called na.rm. If you set it to TRUE, the missing values are ignored.
## [1] 2.226583
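The effect of na.rm is easy to see on a short made-up vector (x is hypothetical, not part of the tutorial data):

```r
x <- c(2, 4, NA, 6)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # 4, the mean of 2, 4 and 6
sum(is.na(x))           # 1, the number of missing values
```

Counting the missing values first, as in the last line, is a useful check that you are not silently discarding a large share of the data.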
The standard deviation can be calculated using the sd() function:
sd(dat1$RT)
# As before, if a variable has missing values then 'na.rm' must be set to TRUE
sd(dat1$Twitches, na.rm = TRUE)
All of your favorite descriptive statistics are available in R, including:
- Interquartile range: IQR()
You can also use the
summary() function to calculate a number of these simultaneously.
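As a brief sketch, here are a few of the base R descriptive functions applied to the small vector used earlier for indexing. The choice of functions is illustrative rather than a reconstruction of the original list.

```r
x <- c(1, 3, 5, 6, 17, 8)

median(x)    # 5.5
range(x)     # 1 and 17 (minimum and maximum)
IQR(x)       # 4 (75th percentile minus 25th percentile)
var(x)       # variance
summary(x)   # minimum, quartiles, median, mean and maximum in one call
```

When run on a whole data frame, summary() reports these statistics column by column.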
Right, let’s get to some proper statistics!
For this section we’re going to look at the data in dat1 to see if there is a difference in reaction time between congruent and incongruent stimuli. So that calls for a t-test.
You can run a t-test using the
t.test() function, but how do we do that without all sorts of nonsense splitting up dat1 into the 2 conditions?
Introducing the tilde: ~. The tilde is used to generate formulas for statistical tests.
A simple rule of thumb is that the dependent variable should be placed on the left side of the tilde and the independent variable(s) should be placed on the right side of the tilde. In our analysis RT is the dependent variable and Congruence is the independent variable, so we should use the formula: RT ~ Congruence.
You’ll see this pop up more frequently when using more complex statistical tests, so it’s good to get your head around it now.
But if we just execute t.test(RT ~ Congruence), R will give us an error. It can’t find ‘RT’ or ‘Congruence’, so we need to tell it where to find them by using the optional argument ‘data =’.
t.test(RT ~ Congruence, data = dat1)
##
##  Welch Two Sample t-test
##
## data:  RT by Congruence
## t = -5.2736, df = 9581.3, p-value = 1.367e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.03221094 -0.01475402
## sample estimates:
##   mean in group Cong mean in group Incong
##            0.4944810            0.5179635
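To show what the formula is doing behind the scenes, this sketch compares the formula interface with splitting the data by hand. The data are simulated (the means, standard deviations, group sizes and the name toy are assumptions for illustration), so the numbers will not match the output above.

```r
set.seed(1)
# Simulated reaction times for two conditions (made-up parameters)
toy <- data.frame(
  Congruence = rep(c("Cong", "Incong"), each = 50),
  RT = c(rnorm(50, mean = 0.49, sd = 0.1),
         rnorm(50, mean = 0.52, sd = 0.1))
)

# Formula interface: R splits RT by the levels of Congruence for us
by_formula <- t.test(RT ~ Congruence, data = toy)

# The same test with the two groups extracted manually
by_hand <- t.test(toy$RT[toy$Congruence == "Cong"],
                  toy$RT[toy$Congruence == "Incong"])

by_formula$statistic   # identical to by_hand$statistic
by_formula$p.value     # individual results can be pulled out with $
```

The object returned by t.test() is a list, so pieces such as statistic, parameter, p.value and conf.int can be extracted for reporting.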
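As the output shows, t.test() runs a Welch two-sample t-test by default, which treats the two groups as independent and does not assume equal variances. Obviously that is not going to be ideal for every situation, so let’s look at some of the optional arguments that can change what
t.test() does by default: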
So let’s assume that we have equal variances:
t.test(RT ~ Congruence, data = dat1, var.equal = TRUE)
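The effect of var.equal can be sketched on simulated data. The data frame toy and its parameters are made up. With var.equal = TRUE the variances are pooled and the degrees of freedom are simply the total sample size minus 2, whereas the Welch version adjusts the degrees of freedom downwards.

```r
set.seed(2)
# Simulated reaction times for two conditions (made-up parameters)
toy <- data.frame(
  Congruence = rep(c("Cong", "Incong"), each = 50),
  RT = c(rnorm(50, mean = 0.49, sd = 0.08),
         rnorm(50, mean = 0.52, sd = 0.12))
)

welch  <- t.test(RT ~ Congruence, data = toy)                    # default
pooled <- t.test(RT ~ Congruence, data = toy, var.equal = TRUE)  # pooled variance

welch$method       # the Welch version of the test
pooled$method      # the classic two-sample version
welch$parameter    # adjusted degrees of freedom, at most 98
pooled$parameter   # exactly 50 + 50 - 2 = 98
```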