A list, a vector, and a data frame walk into a bar...
Just like data types, all of our data objects in R also belong to data structures. We're already familiar with values and vectors; now read on to meet the rest of the family!
Values
The simplest data structure in R is that of a single value, like what we get back from running:
frequently_used_number <- 1/40
For all intents and purposes, R treats these values as vectors, just with only a single element! With that said, let's move along to some real vectors (sorry frequently_used_number).
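A quick way to check this for yourself, sketched with base R functions only:

```r
frequently_used_number <- 1/40

# a single value is just a vector with one element
is.vector(frequently_used_number)  # TRUE
length(frequently_used_number)     # 1

# so it can be indexed like any other vector
frequently_used_number[1]          # 0.025
```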
Vectors
Vectors are the other data structure we have already run into. They are the simplest way of storing multiple data elements in R.
Let's go ahead and make some vectors using the concatenate function c().
# a logical vector
logicals <- c(TRUE, TRUE, FALSE)

# an integer vector
integers <- c(1:10) # the colon ":" gives us the sequence
                    # of integers between 1 and 10

# a numeric vector
doubles <- integers + 0.1
# here we're coercing our integer vector of
# 1 to 10 into a numeric vector of 1.1 to 10.1

# a character vector
strings <- c("a", "f", "c", "d", "e") # remember the quotation marks

# and a looooooong numeric vector, just because
long <- seq(1, 100, 0.1) # seq will generate a vector from 1 to 100,
                         # in 0.1 unit steps
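If you want to double-check what type each of these vectors ended up as, typeof() will tell you. A small sketch that rebuilds the same vectors so it runs on its own:

```r
logicals <- c(TRUE, TRUE, FALSE)
integers <- c(1:10)
doubles  <- integers + 0.1
strings  <- c("a", "f", "c", "d", "e")

typeof(logicals)  # "logical"
typeof(integers)  # "integer"
typeof(doubles)   # "double"
typeof(strings)   # "character"

# adding 0.1 changed the type but not the number of elements
length(doubles)   # 10
```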
At its core, R is a language built around vectors. As such, there are a lot of inbuilt functions we can use to manipulate them. We'll go over a few now.
Looking into a vector
The first thing we might want to do with our vector is look at it. Let's call our long vector and see what's in it:
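Calling a vector just means typing its name at the console. A small sketch, assuming long was created with seq(1, 100, 0.1) as above; length() is added here only to show how much gets printed:

```r
long <- seq(1, 100, 0.1)

# typing the name prints every single element to the console
long

# length() tells us how many elements that is
length(long)  # 991
```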
Jeez, that's a lot of data! Fortunately we have a couple of different ways to look at parts of the vector without looking at the whole thing.
We can look at just the first few elements with head()
> head(long)
[1] 1.0 1.1 1.2 1.3 1.4 1.5
# you can also specify how many elements you want to see
> head(long, 11)
 [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
or just the last few elements with tail()
> tail(long)
[1]  99.5  99.6  99.7  99.8  99.9 100.0
# you can also specify how many elements you want to see
> tail(long, 11)
 [1]  99.0  99.1  99.2  99.3  99.4  99.5  99.6  99.7  99.8  99.9 100.0
or pick and choose sections of the vector by subsetting using [] brackets to select the position, or index, of the elements we want.
# we can subset a single index
> long[21]
[1] 3
# or a group of indices
> long[40:50]
 [1] 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9
# we can also use the "-" sign to look at a vector without an index or indices
> strings
[1] "a" "f" "c" "d" "e"
> strings[-2]
[1] "a" "c" "d" "e"
> strings[-c(2:5)]
[1] "a"
Note that indexing in R starts at 1. That is, if you want the first element of sample_vector you just need to type sample_vector[1]. Intuitive, right?
Modifying vectors
In addition to chopping up vectors by subsetting, we can also modify them or add to them by reassigning in the same way we'd assign a variable.
# let's fix our strings vector by replacing that "f" with a "b"
> strings[2] <- "b"
> strings
[1] "a" "b" "c" "d" "e"
# much nicer! but i think i'd still like that "f" on the end
> strings[6] <- "f"
> strings
[1] "a" "b" "c" "d" "e" "f"
Vectors and coercion
One limitation of vectors is that they can only contain one type of data. Let's try to make a vector with some different data types.
# if we try for a mix of data types
> mixed_bag <- c(TRUE, 10, 5+3i, "far_too_many_ducks")
# everything will be coerced into the most complex type
> typeof(mixed_bag)
[1] "character"
> mixed_bag
[1] "TRUE" "10" "5+3i" "far_too_many_ducks"
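Coercion follows a fixed order, from least to most flexible type: logical, integer, double, complex, character. A short sketch walking up that ladder one step at a time (the example values here are made up for illustration):

```r
typeof(c(TRUE, 2L))       # "integer"
typeof(c(2L, 2.5))        # "double"
typeof(c(2.5, 1+2i))      # "complex"
typeof(c(1+2i, "ducks"))  # "character"

# logicals become 1 and 0 when coerced to numbers
c(TRUE, FALSE, 3)         # 1 0 3
```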
No worries! If we do want to group together objects with different data types, we just need to use a different data structure. This brings us to lists...
Challenges
Challenge 1.1
# What is the data type of each of the following objects:
c(1, 2, 3)
c('d', 'e', 'f')
c("d", "e", "f")
c(TRUE, 1L, 10)
c("11", 10, 12)
c("Sun", "night", FALSE)
# Subsetting vectors
# 1. Create a vector that ranges from 10 to 50 in steps of 3.
# 2. How many elements are in the vector?
# 3. What is the value of the 7th element in the vector?
# 4. Replace the 10th element in the vector with your favourite
#    number in the whole world.
# 5. How can you access the last 8 elements of the vector?
#    Try finding two alternative ways.
# 6. Advanced: what is the sum of all the elements except the
#    first 3 elements?
As you can see, calling a list looks quite different to calling a vector. They also show up in a new spot in the environment panel.
See that little blue circle with the white triangle on the left? Clicking on that will show us some more information about our list.
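The code that builds the list is not shown in this section. Here is a minimal sketch, assuming mixed_bag was recreated with list() from the same four values as the earlier vector; str() prints roughly the same summary you'd see in the environment panel:

```r
# assumption: the list holds the same four values as the earlier vector
mixed_bag <- list(TRUE, 10, 5+3i, "far_too_many_ducks")

# calling the list prints each element under its own [[index]]
mixed_bag

# a compact summary of what's inside
str(mixed_bag)

# unlike c(), list() lets every element keep its own type
length(mixed_bag)          # 4
sapply(mixed_bag, typeof)  # "logical" "double" "complex" "character"
```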
Looking into a list
Just like with vectors, we can look at the elements of a list with the head() and tail() functions and reassign them by using [] brackets and indices.
# you can subset lists
> mixed_bag[3:4]
[[1]]
[1] 5+3i

[[2]]
[1] "far_too_many_ducks"
However, see the double square brackets [[]] in the output? The result is still stored in a list! If you want to access an element of a list without the list wrapper, you need to use double square brackets [[]]:
# Use double square brackets to unwrap the output
> mixed_bag[[4]]
[1] "far_too_many_ducks"
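To see the difference between the two kinds of brackets, it helps to check the class of what comes back. A sketch, again assuming mixed_bag was built with list() from the four values used earlier (wrapped and unwrapped are names made up for this example):

```r
# assumption: same list contents as in the examples above
mixed_bag <- list(TRUE, 10, 5+3i, "far_too_many_ducks")

wrapped   <- mixed_bag[4]    # still a list, of length 1
unwrapped <- mixed_bag[[4]]  # the character value itself

class(wrapped)    # "list"
class(unwrapped)  # "character"

# arithmetic only works on the unwrapped value
mixed_bag[[2]] + 1  # 11
# mixed_bag[2] + 1 would fail: non-numeric argument to binary operator
```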
Finally, you can modify a list, as well as add elements to it, using [[]]:
# You can modify lists
> mixed_bag[[2]] <- 35
# and add to them
> mixed_bag[[5]] <- strings
> mixed_bag
[[1]]
[1] TRUE

[[2]]
[1] 35

[[3]]
[1] 5+3i

[[4]]
[1] "far_too_many_ducks"

[[5]]
[1] "a" "b" "c" "d" "e" "f"

# a vector in a list, freaky
Modifying lists
You might be saying: "Lists are great! Why would I ever want to use vectors?". Unfortunately, there are fewer calculations and manipulations you can do with lists. Fortunately, it's not at all hard to turn a list into a vector - just use the unlist() function.
# this works totally fine
> a_vector <- c(1, 2, 3, 4, 5)
> a_vector + 1
[1] 2 3 4 5 6
# but this does not
> a_list <- list(1, 2, 3, 4, 5) + 1
Error in list(1, 2, 3, 4, 5) + 1 : non-numeric argument to binary operator
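Here is what the unlist() route might look like in practice. This is a sketch using the same numbers as above; the second call uses made-up values to show that unlist() coerces mixed types just like c() does:

```r
a_list <- list(1, 2, 3, 4, 5)

# unlist() flattens the list into a plain numeric vector
a_vector <- unlist(a_list)
a_vector + 1  # 2 3 4 5 6

# elements of different types are coerced, as with c()
unlist(list(1, "two", TRUE))  # "1" "two" "TRUE"
```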
Challenges
Challenge 2.1
1. Create a list with the following 4 elements:
   I. "I love summer"
   II. TRUE
   III. "fun temperatures"
   IV. c(24,25,26)
2. Access (and unwrap) the third element.
3. Change the second element to FALSE
4. Advanced: Change the third fun temperature to 30 (that is, change 26 to 30)
# 1
> list_one = list("I love summer", TRUE, "fun temperatures", c(24,25,26))
# 2
> list_one[[3]]
[1] "fun temperatures"
# 3
> list_one[[2]] = FALSE
# 4 (Advanced)
> list_one[[4]][3] = 30
> list_one
[[1]]
[1] "I love summer"

[[2]]
[1] FALSE

[[3]]
[1] "fun temperatures"

[[4]]
[1] 24 25 30
Data Frames
The last data structure we're going to look at today is the data frame. Data frames are incredibly powerful as a means for representing tabular data.
Let's say we have a collection of different observations for a group of cats.
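The observations themselves are not shown here. A sketch, assuming they are stored as four separate vectors, with names and values taken from the data frame printed further down:

```r
# assumption: one vector per kind of observation, one element per cat
name          <- c("Otis", "Luna", "Puss", "Garfield")
colour        <- c("black", "calico", "tabby", "ginger")
weight        <- c(11, 8, 13, 42)
hates_mondays <- c(FALSE, FALSE, FALSE, TRUE)
```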
Rather than storing them in individual lists or vectors, we can combine them all into a data frame!
> cats <- data.frame(name, colour, weight, hates_mondays)
> cats
      name colour weight hates_mondays
1     Otis  black     11         FALSE
2     Luna calico      8         FALSE
3     Puss  tabby     13         FALSE
4 Garfield ginger     42          TRUE
Here you can see our vectors have been turned into the columns of our data frame, each named after the name of the vector that we used to create them.
Like lists, data frames appear in the data section of the environment, and clicking the blue circle will show us some more information about our data frame.
Additionally, clicking on the little table icon in the top right will open a new window next to our script showing us everything in our data frame in spreadsheet style!
Looking at and modifying data frames
Just like vectors and lists, we can access and edit parts of our data frame by using [] brackets and indices. Given that data frames are 2 dimensional, we need to specify both a row index and a column index for the entry we want to modify, separated by a comma. It looks something like this [row, column].
By leaving one of these entries blank, we can instead access the entire row [row,] or column [,column] at once.
# accessing a specific element
> cats[2, 1]
[1] "Luna"
# accessing a whole column
> cats[,1]
[1] "Otis" "Luna" "Puss" "Garfield"
# accessing a whole row
> cats[4,]
      name colour weight hates_mondays
4 Garfield ginger     42          TRUE
You can also access the first few rows and the last few rows of your data frame using head(cats) and tail(cats).
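A short sketch of both functions on the cats data frame. The data frame is rebuilt here so the block runs on its own; stringsAsFactors = FALSE is passed explicitly as an assumption, so that the text columns stay as character strings in older versions of R too:

```r
cats <- data.frame(
  name          = c("Otis", "Luna", "Puss", "Garfield"),
  colour        = c("black", "calico", "tabby", "ginger"),
  weight        = c(11, 8, 13, 42),
  hates_mondays = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# just like with vectors, the second argument sets how many rows to show
head(cats, 2)  # the rows for Otis and Luna
tail(cats, 1)  # the row for Garfield

# columns can also be picked by name inside the brackets
cats[2, "name"]  # "Luna"
```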
We will often want to access entire columns at a time to manipulate them. In addition to accessing them via index, we can access them by name using the data_frame$column_name syntax.
# garfield is looking a little chonky
> cats$weight
[1] 11  8 13 42
# let's put the cats on a diet
> cats$weight <- sqrt(cats$weight) + 5
> cats$weight
[1]  8.316625  7.828427  8.605551 11.480741
# unfortunately, his dislike of mondays has spread
# let's update the data frame to show this
> cats$hates_mondays <- TRUE
# assigning a single value to a column will
# change the entire column to that value
> cats$hates_mondays
[1] TRUE TRUE TRUE TRUE
Data frames as vectors and lists
You may have noticed that the operations we can do on data frames fall somewhere between what we can do with vectors and what we can do with lists. This is no coincidence: a data frame is really just a vector/list hybrid!
In a data frame, every row is a list of attributes for an object or event (in our example, an individual cat), while every column is a vector of a specific attribute for all our data. So basically, a row is an observation and a column is a variable.
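You can poke at this hybrid nature directly. A sketch using the original cats data (rebuilt here so the block is self-contained; stringsAsFactors = FALSE is an assumption to keep the text columns as characters):

```r
cats <- data.frame(
  name          = c("Otis", "Luna", "Puss", "Garfield"),
  colour        = c("black", "calico", "tabby", "ginger"),
  weight        = c(11, 8, 13, 42),
  hates_mondays = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# under the hood, a data frame is a list with one element per column
is.list(cats)  # TRUE
length(cats)   # 4 (columns, not rows)

# so list-style [[ ]] works, and gives back a plain vector
cats[["weight"]]  # 11 8 13 42

# and because each column is a vector, vector functions just work
mean(cats$weight)    # 18.5
sapply(cats, typeof) # "character" "character" "double" "logical"
```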
It is this vector property that makes data frames such a powerful tool for data analysis, but more on that next lesson when we jump into the tidyverse!
Challenges
R comes with some datasets. We are going to be using the mtcars dataset for these challenges:
data_f = mtcars
Challenge 3.1
# Have a look at the data frame in your environment.
# How many columns and rows does the data frame have?
# You can look directly at data_f in your environment
# or type the following command:
> dim(data_f)
[1] 32 11
# 32 observations (rows)
# 11 variables (columns)
Challenge 3.2
Access the first 3 rows of the data frame
# One option
> head(data_f, 3)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# Another option:
> data_f[1:3,]
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Challenge 3.3
# 1. Access the mpg column.
# 2. Change the third element of the mpg column to 33.3
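One possible way to tackle this, sketched with the $ syntax from earlier (there are other equally valid options, such as data_f[3, "mpg"]):

```r
data_f <- mtcars

# 1. access the mpg column by name
data_f$mpg

# 2. subset the column like any other vector and reassign
data_f$mpg[3] <- 33.3
data_f$mpg[3]  # 33.3
```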