R Data Structures 3: List, Data Frames, and Factors
Discussing 3 common data structures useful for data professionals
Here is a link to the last 2 R data structures posts, so you can get caught up.
1 - List
In R, a list is an object that can hold a number of different elements inside it. The elements do not have to be related to each other, or even be of the same type. For example, you can have a list which has the following:
another list
data table
vector
function
a value
1.1 Basic Code
You can use the list() function to create a list. Simply pass in all of the elements you want the list to have, separated by commas.
For example, consider a list with 5 elements. The 1st is the value 5. The 2nd is the boolean TRUE. The 3rd is the vector: 4, 9, 7. The 4th is the function sin. The 5th is the string ‘hi there’.
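A minimal sketch of the list described above. The name my_list is just a placeholder I picked for illustration:

```r
# Build a list holding five unrelated kinds of elements.
# my_list is a placeholder name chosen for this sketch.
my_list <- list(5, TRUE, c(4, 9, 7), sin, "hi there")

# Print the whole list to see each element next to its position.
print(my_list)

# Double square brackets pull out a single element.
third <- my_list[[3]]
print(third)
```

Note the double square brackets: my_list[[3]] gives you the vector itself, while my_list[3] would give you a list of length 1 that contains the vector.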
You can also use the function typeof() in order to find out whether something is a list:
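Here is a small sketch of that check, again assuming a placeholder list called my_list. I've also thrown in is.list(), a base R helper that answers the same question with TRUE or FALSE:

```r
# my_list is a placeholder name chosen for this sketch.
my_list <- list(5, TRUE, c(4, 9, 7), sin, "hi there")

# typeof() reports the underlying storage type of an object.
list_type <- typeof(my_list)
print(list_type)            # "list"

# is.list() gives a direct TRUE/FALSE answer.
print(is.list(my_list))     # TRUE

# For comparison, a plain numeric vector is not a list.
print(typeof(c(4, 9, 7)))   # "double"
```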
If you ever want to find out the number of elements in your list, you can use the length() function on your list:
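A quick sketch, using the same placeholder list as before (my_list is my own name for it):

```r
# my_list is a placeholder name chosen for this sketch.
my_list <- list(5, TRUE, c(4, 9, 7), sin, "hi there")

# length() counts the top-level elements of the list.
n_elements <- length(my_list)
print(n_elements)             # 5

# The vector in position 3 counts as one element of the list,
# even though the vector itself has 3 values.
print(length(my_list[[3]]))   # 3
```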
Lastly, if you ever want to re-assign an element of your list to something else, you can use indexing.
Here is an example of changing the 4th element from the function sin to the string ‘watch’.
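A minimal sketch of that re-assignment, assuming the list is stored under the placeholder name my_list:

```r
# my_list is a placeholder name chosen for this sketch.
my_list <- list(5, TRUE, c(4, 9, 7), sin, "hi there")

# Before the change, the 4th element is the function sin.
print(is.function(my_list[[4]]))   # TRUE

# Overwrite the 4th element using double-bracket indexing.
my_list[[4]] <- "watch"

# The 4th element is now a string, and the list still has 5 elements.
print(my_list[[4]])                # "watch"
print(length(my_list))             # 5
```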
2 - Data Frames
Data frames are just tables. Each column can hold a different type of data: for example, the first column could be numeric, while the second one could hold strings, and so on.
2.1 Code
In order to make a data frame, you can use the data.frame() function. Here’s an example:
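A minimal sketch of building one by hand. The table name people and all of the values in it are made up for illustration:

```r
# Each argument to data.frame() becomes one column.
# The names and values here are invented for this sketch.
people <- data.frame(
  name     = c("Ann", "Bob", "Cara"),   # character column
  age      = c(28, 35, 41),             # numeric column
  employed = c(TRUE, FALSE, TRUE)       # logical column
)

# Print the table, then inspect the type of each column.
print(people)
str(people)
```

All columns need to have the same number of rows, which is why every vector above has exactly 3 values.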
When you load a table from a file on your computer into your R environment, you’ll typically use functions like read.csv(), read.table(), etc. These functions read the table in the file, convert it to a data.frame, and load it into memory.
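To make this concrete without depending on a file on your machine, here is a sketch that first writes a tiny CSV to a temporary location and then reads it back. The column names and values are invented for illustration:

```r
# Write a small made-up table to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, score = c(90, 85, 77)),
  csv_path,
  row.names = FALSE
)

# read.csv() reads the file and returns a data.frame.
scores <- read.csv(csv_path)
print(scores)
print(class(scores))   # "data.frame"
```

In practice you would skip the first step and pass the path of your own file straight to read.csv().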
If you want to access all of the data in a specific column, you can either use the $ sign, or use indexing.
Here’s an example:
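A short sketch of both approaches, using a made-up table called people (names and values are placeholders):

```r
# A small invented table for this sketch.
people <- data.frame(
  name = c("Ann", "Bob", "Cara"),
  age  = c(28, 35, 41)
)

# Option 1: the $ sign, followed by the column name.
ages_dollar <- people$age

# Option 2: indexing by name with double brackets.
ages_name <- people[["age"]]

# Option 3: indexing by position, leaving the row side empty.
ages_position <- people[, 2]

print(ages_dollar)

# Single brackets with a name keep the result as a one-column data frame.
age_table <- people["age"]
print(age_table)
```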
Now, R comes with several built-in datasets ready to go. One of these datasets is about trees. You can access this practice dataset simply by typing its name: trees
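Now, if you want a snapshot of the first few rows, you can use the head() function on your data.frame. By default head() returns the first 6 rows, so to get exactly the first 5 you pass 5 as the second argument.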
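A quick sketch using the built-in trees dataset. Nothing needs to be installed or loaded for this to work:

```r
# trees ships with base R, so it is available by name.
print(names(trees))       # "Girth" "Height" "Volume"

# head() with a second argument controls how many rows you get back.
first_five <- head(trees, 5)
print(first_five)

# Without the second argument, head() returns 6 rows.
print(nrow(head(trees)))  # 6
```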
Now, if you want R to access a specific cell, this is the format to use: [rows, columns]. Rows always come first. For example, if I wanted the value in the first row and first column, I’d do this: trees[1, 1]. Similarly, if I wanted the first 2 columns of the 3rd row, I can do this: trees[3, c(1, 2)].
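Here is a sketch of the [rows, columns] format on the built-in trees dataset:

```r
# First row, first column: a single value.
first_cell <- trees[1, 1]
print(first_cell)

# Third row, first two columns: a one-row data frame.
third_row <- trees[3, c(1, 2)]
print(third_row)

# Leaving one side empty means "all of them".
first_two_rows <- trees[1:2, ]   # rows 1 and 2, every column
all_heights <- trees[, "Height"] # every row, the Height column
print(first_two_rows)
```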
Now, if you want to do any subsetting, you just need to put your condition on the row side and leave the column side empty. For example, here is how I’d get the rows where Girth is less than 10:
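A minimal sketch of that filter on the built-in trees dataset. Note that the column is spelled Girth with a capital G, and small_trees is just a name I picked:

```r
# The condition produces TRUE/FALSE for every row;
# only the TRUE rows are kept. The empty column side keeps all columns.
small_trees <- trees[trees$Girth < 10, ]

print(small_trees)
print(nrow(small_trees))
```

You can combine conditions with & (and) or | (or), for example trees[trees$Girth < 10 & trees$Height > 64, ].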
If you ever want to drop an entire column, simply assign that column the value of NULL.
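A short sketch of dropping a column. I work on a copy (trees_copy, a placeholder name) so the original dataset stays untouched:

```r
# Copy the built-in dataset before modifying it.
trees_copy <- trees

# Assigning NULL to a column removes it from the data frame.
trees_copy$Volume <- NULL

print(names(trees_copy))   # "Girth" "Height"
```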
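Note, in this section, we discussed data.frames. In the real world, many people working with large tables don’t stick with the plain data.frame. This is because in R, we have a package called data.table, which you need to install separately. When you convert a table from a data.frame to a data.table, many data processing operations become much faster, especially on big datasets.
The good news is that a data.table is still a data.frame under the hood, so the basic syntax you learned here carries over, although data.table adds its own extensions inside the square brackets.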
3 - Factors
Factors are objects used for categorizing data and storing it under levels. Factors can be built from both strings and integers. They are most useful in columns which have a limited number of unique values, and they come up often in data analysis and statistical modelling.
3.1 Code
In order to create a factor, just use the factor() function, and throw in a vector as an input. You can also use the is.factor() function to check if a data object is a factor.
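A minimal sketch of both functions. The vector sizes and its values are made up for illustration:

```r
# A plain character vector with a few repeated categories.
sizes <- c("small", "large", "medium", "small", "large")

# factor() turns the vector into a factor with one level per unique value.
size_factor <- factor(sizes)

print(is.factor(size_factor))   # TRUE
print(is.factor(sizes))         # FALSE, it is still a character vector

# levels() lists the unique categories the factor knows about.
print(levels(size_factor))

# table() counts how many values fall under each level.
print(table(size_factor))
```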
That’s mostly it for factors. In the real world, the main time they come in handy is when you send a table to a modelling or Machine Learning function in R: many of these functions will automatically convert the levels in your factor column into dummy variables for you.
In other words, it saves you a minuscule amount of time when running an ML algorithm on your data. Besides that, people typically like to use the character data type (strings), instead of factors.
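To see the dummy variable conversion for yourself, here is a sketch using model.matrix() from base R, which is the function lm() and glm() rely on to build their inputs. The table paint and its values are invented for illustration, and other ML packages may handle factors differently:

```r
# A made-up table with one factor column and one numeric column.
paint <- data.frame(
  colour = factor(c("red", "blue", "green", "blue")),
  price  = c(1, 2, 3, 4)
)

# model.matrix() expands the factor into 0/1 dummy columns.
# The first level ("blue") becomes the baseline and gets no column of its own.
dummies <- model.matrix(price ~ colour, data = paint)

print(colnames(dummies))   # "(Intercept)" "colourgreen" "colourred"
print(dummies)
```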