Matrices and Data Frames in R
By: Karthik Janar in data-science Tutorials on 2018-05-01
In this tutorial, we'll cover matrices and data frames. Both represent "rectangular" data types, meaning that they are used to store tabular data, with rows and columns. The main difference, as you'll see, is that matrices can only contain a single class of data, while data frames can consist of many different classes of data.
Let's create a vector containing the numbers 1 through 20 using the : operator and store the result in a variable called my_vector. You learned about the : operator in the tutorial on sequences. If you wanted to create a vector containing the numbers 1, 2, and 3 (in that order), you could use either c(1, 2, 3) or 1:3. Also, remember that you don't need the c() function when using the : operator.
my_vector <- 1:20
my_vector
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
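As a side note, c(1, 2, 3) and 1:3 hold the same values but not the same storage type. The following is a minimal sketch; the variable names with_colon and with_c are illustrative and not part of the tutorial.

```r
# Two ways to build the "same" vector (names are illustrative).
with_colon <- 1:3          # the : operator produces integers
with_c     <- c(1, 2, 3)   # plain numeric literals are doubles

typeof(with_colon)               # "integer"
typeof(with_c)                   # "double"
all(with_colon == with_c)        # TRUE: the values match
identical(with_colon, with_c)    # FALSE: the storage types differ
```

This only matters when you compare objects strictly, for example with identical().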
The dim() function tells us the "dimensions" of an object. What happens if we do dim(my_vector)?
dim(my_vector)
## NULL
Clearly, that's not very helpful! Since my_vector is a vector, it doesn't have a dim attribute (so it's just NULL), but we can find its length using the length() function.
length(my_vector)
## [1] 20
What happens if we give my_vector a dim attribute? Let's give it a try.
dim(my_vector) <- c(4,5)
The dim() function allows you to get OR set the dim attribute for an R object. In this case, we assigned the value c(4, 5) to the dim attribute of my_vector.
Use dim(my_vector) to confirm that we've set the dim attribute correctly.
dim(my_vector)
## [1] 4 5
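Setting dim only works when the product of the dimensions equals the length of the vector. The sketch below illustrates this with a throwaway vector v (an illustrative name, not from the tutorial) and uses tryCatch() so that the failed assignment does not stop the script.

```r
# Illustrative vector; the name v is not from the tutorial.
v <- 1:20

# 3 * 5 = 15 does not match length(v) = 20, so R signals an error.
result <- tryCatch({
  dim(v) <- c(3, 5)
  "ok"
}, error = function(e) conditionMessage(e))
result     # an error message about mismatched dims

# 2 * 10 = 20 matches the length, so this assignment succeeds.
dim(v) <- c(2, 10)
dim(v)     # 2 10
```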
Another way to see this is by calling the attributes() function on my_vector. Try it now.
attributes(my_vector)
## $dim
## [1] 4 5
When dealing with a 2-dimensional object (think rectangular table), the first number is the number of rows and the second is the number of columns. Therefore, we just gave my_vector 4 rows and 5 columns.
But, wait! That doesn't sound like a vector any more. Well, it's not. Now it's a matrix. View the contents of my_vector now to see what it looks like.
my_vector
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
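Notice that the numbers run down each column before moving to the next one. A small sketch of that fill order, using a 2-row example and the byrow argument of matrix() (the names by_col and by_row are illustrative):

```r
# Default: values fill down each column first (column-major order).
by_col <- matrix(1:6, nrow = 2)
by_col[1, ]    # first row: 1 3 5

# byrow = TRUE fills across each row instead.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row[1, ]    # first row: 1 2 3
```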
Now, let's confirm it's actually a matrix by using the class() function.
class(my_vector)
## [1] "matrix"
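Sure enough, my_vector is now a matrix. (In R 4.0.0 and later, class() reports "matrix" "array" for a matrix, so your output may show both.) We should store it in a new variable that helps us remember what it is. Store the value of my_vector in a new variable called my_matrix.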
my_matrix <- my_vector
The example that we've used so far was meant to illustrate the point that a matrix is simply an atomic vector with a dimension attribute. A more direct method of creating the same matrix uses the matrix() function.
If you have RStudio open, bring up the help file for the matrix() function now using the ? operator.
#?matrix
Now, look at the documentation for the matrix function and see if you can figure out how to create a matrix containing the same numbers (1-20) and dimensions (4 rows, 5 columns) by calling the matrix() function. Store the result in a variable called my_matrix2.
my_matrix2 <- matrix(data=1:20, nrow=4, ncol=5)
Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The identical() function will tell us if its first two arguments are the same. Try it out.
identical(my_matrix,my_matrix2)
## [1] TRUE
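The idea that a matrix is an atomic vector plus a dim attribute can also be checked in the other direction: removing the attribute should give back a plain vector. A minimal sketch, using the illustrative names m and flat:

```r
# Build the same 4 x 5 matrix used in the tutorial.
m <- matrix(1:20, nrow = 4, ncol = 5)
is.matrix(m)             # TRUE
attributes(m)            # only $dim is present

# Removing the dim attribute leaves the underlying vector.
flat <- m
dim(flat) <- NULL
is.matrix(flat)          # FALSE
identical(flat, 1:20)    # TRUE
```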
Now, imagine that the numbers in our table represent some measurements from a clinical experiment, where each row represents one patient and each column represents one variable for which measurements were taken.
We may want to label the rows, so that we know which numbers belong to each patient in the experiment. One way to do this is to add a column to the matrix, which contains the names of all four people.
Let's start by creating a character vector containing the names of our patients - Andy, Bob, Charles, and Danny. Remember that double quotes tell R that something is a character string. Store the result in a variable called patients.
patients <- c("Andy", "Bob", "Charles", "Danny")
Now we'll use the cbind() function to "combine columns". Don't worry about storing the result in a new variable. Just call cbind() with two arguments - the patients vector and my_matrix.
cbind(patients, my_matrix)
## patients
## [1,] "Andy" "1" "5" "9" "13" "17"
## [2,] "Bob" "2" "6" "10" "14" "18"
## [3,] "Charles" "3" "7" "11" "15" "19"
## [4,] "Danny" "4" "8" "12" "16" "20"
Something is fishy about our result! It appears that combining the character vector with our matrix of numbers caused everything to be enclosed in double quotes. This means we're left with a matrix of character strings, which is no good.
If you remember back to the beginning of this tutorial, we saw that matrices can only contain ONE class of data. Therefore, when we tried to combine a character vector with a numeric matrix, R was forced to "coerce" the numbers to characters, hence the double quotes.
This is called "implicit coercion", because we didn't ask for it. It just happened.
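The same coercion happens any time character and numeric values share one atomic vector or matrix. The sketch below shows it on a smaller scale and then converts a column back with as.numeric(); the names mixed and combined are illustrative.

```r
# Mixing a number and a string in one vector coerces the number.
mixed <- c(1, "a")
class(mixed)                 # "character"
mixed                        # "1" "a"

# cbind() on a character vector and a numeric matrix does the same.
combined <- cbind(c("Andy", "Bob"), matrix(1:4, nrow = 2))
typeof(combined)             # "character"

# Explicit coercion turns the digit strings back into numbers.
as.numeric(combined[, 2])    # 1 2
```

Converting back column by column is possible but tedious, which is part of why a different structure is preferable for mixed data.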
So, we're still left with the question of how to include the names of our patients in the table without destroying the integrity of our numeric data. That's where the data frame comes into the picture.
my_data <- data.frame(patients, my_matrix)
my_data
## patients X1 X2 X3 X4 X5
## 1 Andy 1 5 9 13 17
## 2 Bob 2 6 10 14 18
## 3 Charles 3 7 11 15 19
## 4 Danny 4 8 12 16 20
It looks like the data.frame() function allowed us to store our character vector of names right alongside our matrix of numbers. That's exactly what we were hoping for!
Behind the scenes, the data.frame() function takes any number of arguments and returns a single object of class data.frame that is composed of the original objects.
Let's confirm this by calling the class() function on our newly created data frame.
class(my_data)
## [1] "data.frame"
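To see that each column of a data frame keeps its own class, you can apply class() to every column with sapply(). This is a sketch that rebuilds the objects so it runs on its own. Note one assumption: stringsAsFactors = FALSE is added to the tutorial's call so that the names stay character in R versions before 4.0.0, where the default converted strings to factors.

```r
# Rebuild the tutorial's inputs so this block is self-contained.
patients  <- c("Andy", "Bob", "Charles", "Danny")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)

# stringsAsFactors = FALSE is an addition to the tutorial's call; it keeps
# the names as character strings regardless of the R version in use.
my_data <- data.frame(patients, my_matrix, stringsAsFactors = FALSE)

# One class per column: character for the names, integer for the numbers.
col_classes <- sapply(my_data, class)
col_classes
```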
It's also possible to assign names to the individual rows and columns of a data frame, which presents another possible way of determining which row of values in our table belongs to each patient.
However, since we've already solved that problem, let's solve a different problem by assigning names to the columns of our data frame so that we know what type of measurement each column represents.
Since we have six columns (including patient names), we'll need to first create a vector containing one element for each column. Create a character vector called cnames that contains the following values (in order) - "patient", "age", "weight", "bp", "rating", "test".
cnames <- c("patient", "age", "weight", "bp", "rating", "test")
Now, use the colnames() function to set the column names of our data frame. This is similar to the way we used the dim() function earlier in this tutorial.
colnames(my_data) <- cnames
Print the contents of my_data.
my_data
## patient age weight bp rating test
## 1 Andy 1 5 9 13 17
## 2 Bob 2 6 10 14 18
## 3 Charles 3 7 11 15 19
## 4 Danny 4 8 12 16 20
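The row-name approach mentioned earlier can be sketched in the same way with rownames(). This variant is an illustration rather than the tutorial's method: it leaves out the patient column and uses the names as row labels instead, reusing the tutorial's placeholder numbers.

```r
# Build a data frame from the numeric matrix only.
my_data <- data.frame(matrix(1:20, nrow = 4, ncol = 5))
colnames(my_data) <- c("age", "weight", "bp", "rating", "test")

# Use the patient names as row labels instead of a separate column.
rownames(my_data) <- c("Andy", "Bob", "Charles", "Danny")

# Rows and columns can now be looked up by name.
my_data["Bob", "weight"]    # 6
my_data$bp                  # 9 10 11 12
```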
In this tutorial, you learned the basics of working with two very important and common data structures - matrices and data frames.