I have a data set with millions of rows. The first column holds an ID, and IDs are repeated across rows (all rows for an ID are grouped together and ordered). The data set has multiple columns. I would like to transform the data so that there is one row per ID, with all of that ID's entries for each column placed in that single row, in order.
See example snippet of the before data
And example of what I would like the data to look like
Here is an example of a very similar problem. However, in that problem the data has only two columns (one of which is the ID), whereas my data has more than 5 columns plus the ID column: Collapse mutiple rows of a dataframe into one row - based on a unique key
I would like to do this in either R or Excel :)
Answer 0 (score: -1)
In R, we can do this with dcast from data.table:
library(data.table)
dcast(setDT(df1), ID ~ rowid(ID), value.var = c("V1", "V2"), fill = "")
#   ID V1_1 V1_2 V1_3 V2_1 V2_2 V2_3
#1:  1    a    b    c   aa   bb   cc
#2:  2    d    e        dd   ee
#3:  3    f             ff
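Since the question mentions more than five value columns, here is a minimal sketch of the same idea that avoids typing every column name. It assumes the key column is called ID and that every other column should be spread out; df2, V3 and value_cols are made-up names for illustration only. It also leaves out fill = "", so missing positions come back as NA, which may be safer if the columns are not all character.

```r
library(data.table)

# Hypothetical example data: one ID column plus several value columns
df2 <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3),
  V1 = c("a", "b", "c", "d", "e", "f"),
  V2 = c("aa", "bb", "cc", "dd", "ee", "ff"),
  V3 = c("x", "y", "z", "u", "v", "w"),
  stringsAsFactors = FALSE
)

# Every column except the key is treated as a value column (an assumption)
value_cols <- setdiff(names(df2), "ID")

# rowid(ID) numbers the rows within each ID (1, 2, 3, ...), and that
# number becomes the suffix of the new column names (V1_1, V1_2, ...)
wide <- dcast(as.data.table(df2), ID ~ rowid(ID), value.var = value_cols)

print(wide)
```

as.data.table() makes a copy, so df2 stays a plain data.frame. On millions of rows, setDT(df2) as in the call above would convert in place and skip that copy.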
# Example data: the df1 passed to dcast(); define it before running that call
df1 <- structure(list(ID = c(1, 1, 1, 2, 2, 3), V1 = c("a", "b", "c",
"d", "e", "f"), V2 = c("aa", "bb", "cc", "dd", "ee", "ff")), .Names = c("ID",
"V1", "V2"), row.names = c(NA, -6L), class = "data.frame")