I have a dataset like this:
library(data.table)
Data01 <- data.table(
code=c("A111", "A111","A111","A111","A111", "A111","A111","A234", "A234","A234","A234","A234", "A234","A234"),
x=c("",126,126,"",836,843,843,126,126,"",127,836,843,843),
y=c("",76,76,"",456,465,465,76,76,"",77,456,465,465),
no1=c(028756, 028756,028756,057756, 057756, 057756, 057756,028756, 028756,057756,057756, 057756, 057756, 057756),
no2=c("","",034756,"","","",789165,"",034756,"","","","",789165)
)
Data01[, version := paste0("V", 1:.N), by = code]
Data01[, unique_version := paste(code, version, sep = "_")]
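What I want is a way to add a column that, for each unique code entry, states what differs between each row and the previous row (i.e. pastes together the names of the columns whose values are now different). Like this: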
Data01[, change := c("First_entry","New_x_and_y","New_no2","New_x_and_y_and_no1","New_x_and_y","New_x_and_y","New_no2","First_entry","New_no2","New_x_and_y_and_no1","New_x_and_y","New_x_and_y","New_x_and_y","New_no2")]
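My actual dataset has 5.5 million rows and about 2.6 million unique code entries, so I expect any solution to take a while to run. If possible, including some kind of progress indicator (as suggested here: Progress bar in data.table aggregate action) would therefore be very helpful.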
Answer 0: (score: 1)
You could try something like this:
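nm <- c("x", "y", "no1", "no2")  # the columns to compare
Data01[, change := c("First_entry",
  # for rows 2..N of each code, compare row n with row n-1
  vapply(seq_len(.N)[-1L], function(n) {
    paste(c("New",
      nm[which(unlist(.SD[n - 1L]) != unlist(.SD[n]))]),
      collapse = "_")
  }, character(1))),
  by = .(code), .SDcols = nm]  # restrict .SD to nm so positions line up with nm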
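With millions of groups, looping over rows inside each group may be slow. A possible alternative is to lag every tracked column once with shift() and compare whole columns at a time. The following is only a minimal sketch under stated assumptions: DT is a small made-up table rather than the full Data01, the prev_ column names and the "No_change" label are hypothetical choices that do not come from the question, and labels are joined with "_and_" to resemble the requested output.

```r
library(data.table)

# Small illustrative table (made up; fewer columns than the question's data)
DT <- data.table(
  code = c("A", "A", "A", "B", "B"),
  x    = c("", "126", "126", "126", "127"),
  y    = c("", "76", "76", "76", "77"),
  no2  = c("", "", "34756", "", "")
)
nm      <- c("x", "y", "no2")      # columns to track
prev_nm <- paste0("prev_", nm)     # hypothetical helper column names

# Lag each tracked column by one row within each code (NA on a code's first row)
DT[, (prev_nm) := shift(.SD), by = code, .SDcols = nm]

# Build the label column by column instead of row by row
label <- rep("", nrow(DT))
for (j in seq_along(nm)) {
  diff_j <- DT[[nm[j]]] != DT[[prev_nm[j]]]
  hit <- !is.na(diff_j) & diff_j   # TRUE where column j changed
  if (any(hit)) {
    sep <- ifelse(label[hit] == "", "", "_and_")
    label[hit] <- paste0(label[hit], sep, nm[j])
  }
}

# Assumption: rows with no change get "No_change"; first rows get "First_entry"
change_vec <- ifelse(label == "", "No_change", paste0("New_", label))
change_vec[!duplicated(DT$code)] <- "First_entry"

DT[, change := change_vec]
DT[, (prev_nm) := NULL]            # drop the helper columns

# DT$change:
# "First_entry" "New_x_and_y" "New_no2" "First_entry" "New_x_and_y"
```

Because each column is compared in its own type, nothing is coerced to character through unlist(). The work is a handful of whole-column passes rather than one function call per row, so a progress bar may matter less here, although that has not been timed on data of the size described in the question.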