Question

I have a data set with millions of rows. The first row has an ID, though there are repeated IDs in the data set (all IDs are grouped and ordered). The data set has multiple columns. I would like to transform the data such that there is one row item per ID, and all the previous entries of the columns for the ID are put into a single row, in order.

See example snippet of the before data

What the data currently looks like

And example of what I would like the data to look like What the entries should look like

Here is an example of a very similar problem, however in this problem the data only has two columns (one column for the ID), but my data has over 5 columns (and one column for ID): Collapse mutiple rows of a dataframe into one row - based on a unique key

I would like to do this in either R or Excel :)

Answer 1

In R, we can do this with dcast from data.table

library(data.table)
dcast(setDT(df1), ID ~ rowid(ID), value.var = c("V1", "V2"), fill = "")
#   ID V1_1 V1_2 V1_3 V2_1 V2_2 V2_3
#1:  1    a    b    c   aa   bb   cc
#2:  2    d    e        dd   ee     
#3:  3    f             ff

data

df1 <- structure(list(ID = c(1, 1, 1, 2, 2, 3), V1 = c("a", "b", "c", 
 "d", "e", "f"), V2 = c("aa", "bb", "cc", "dd", "ee", "ff")), .Names = c("ID", 
"V1", "V2"), row.names = c(NA, -6L), class = "data.frame")

collapse multiple rows of a dataframe into single row based on unique id in r

1 个答案:

data