anthe.sevenants

Keeping a list of data frames and combining them in R

2023-02-01

One of the things I often do in R is keep a list of several data frames. For example, I could be sampling under different conditions and storing the sampled data frames in a list. Once all data frames have been collected, I then want to combine the data frames stored in the list into a single data frame. Sounds complicated? Not at all!

First, create an empty list in which you will keep all your data frames. I will call mine samples:

samples <- list()

Then, write your for loop. In my case, I will loop over a vector called "components" and do some sampling operation to retrieve a sampled data frame. How exactly the sampling works is an explanation for another time, but just assume that the function sample_df returns a sample of the df_component data frame.

Then, to add the data frame to our list of samples, we use the append function. Its first argument is the list you want to append to; the second argument is what you want to append. Make sure that you assign the result of append back to the same list: it does not run "in place" but rather returns a new list!

Pay attention, however: you might have noticed that I do not append my sample data frame directly. Instead, I wrap it in list(). Why do we have to do this? A data frame is itself a list of columns, so if we leave the list() wrapper out, append adds each column to our list as a separate element. In short, this means that our list will contain loose columns rather than data frames once we are done looping, and we will not be able to combine them. Always wrap your data frame in list() when appending!

# Loop over each component
for (component in components) {
  # Filter the data frame for that component
  df_component <- df[df$component == component, ]
  # Sample the required rows
  df_component_sample <- sample_df(df_component)
  # Append to list of samples
  samples <- append(samples, list(df_component_sample))
}
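To see for yourself what the list() wrapper does, here is a minimal sketch. The df_toy data frame is invented purely for this illustration; it is not part of the sampling code above.

```r
# Toy data frame, invented for this illustration
df_toy <- data.frame(x = 1:3, y = c("a", "b", "c"))

# With list(): the data frame is added as one single element
wrapped <- append(list(), list(df_toy))
length(wrapped)                # 1
is.data.frame(wrapped[[1]])    # TRUE

# Without list(): every column becomes its own element
unwrapped <- append(list(), df_toy)
length(unwrapped)              # 2
is.data.frame(unwrapped[[1]])  # FALSE
names(unwrapped)               # "x" "y"
```

The unwrapped list holds the columns x and y as plain vectors, so there is no data frame left to combine afterwards.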

Finally, all that is left to do is to combine all the data frames in our samples list into a big, single data frame. This can be done using a single R command:

df_sample <- do.call("rbind", samples)

And that is all! Your list of data frames should now be combined into the df_sample object.
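If you would like to try the whole procedure from start to finish, the sketch below should run as is. Note that the toy data and the sample_df function in it are my own stand-ins for this example (here sample_df simply draws two random rows); your own sampling function will probably look different.

```r
set.seed(1)  # make the random sampling reproducible

# Toy data: three components with four rows each (invented for this example)
df <- data.frame(
  component = rep(c("a", "b", "c"), each = 4),
  value = 1:12
)
components <- unique(df$component)

# Hypothetical stand-in for sample_df: draw n random rows
sample_df <- function(data, n = 2) {
  data[sample(nrow(data), n), , drop = FALSE]
}

# Collect one sampled data frame per component
samples <- list()
for (component in components) {
  df_component <- df[df$component == component, ]
  samples <- append(samples, list(sample_df(df_component)))
}

# Combine everything into a single data frame
df_sample <- do.call("rbind", samples)
nrow(df_sample)  # 6: two rows from each of the three components
```

Which rows end up in df_sample depends on the random draw, but the combined data frame always has two rows per component.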

Bonus: combine only two data frames

If you only need to combine two data frames, and you already have both in memory, you can just use rbind directly:

df <- rbind(df_a, df_b)
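One thing to keep in mind: for data frames, rbind matches columns by name, so both data frames need to have the same column names, although the order may differ. A small sketch with made-up data (df_a and df_b are invented here):

```r
# Two made-up data frames with the same columns in a different order
df_a <- data.frame(id = 1:2, label = c("a", "b"))
df_b <- data.frame(label = c("c", "d"), id = 3:4)

df <- rbind(df_a, df_b)
names(df)  # "id" "label" (column order of the first data frame)
df$id      # 1 2 3 4
```

If the column names do not match, rbind stops with an error instead of guessing how to line the columns up.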