My sample data is as follows:
library(data.table)
dt1 <- setDT(structure(list(V1 = c(301L, 301L, 301L, 301L, 301L), V2 = 1:5,
V3 = c(61950L, 61951L, 61953L, 155220L, 155218L), V4 = c("i",
"you", "you", "we", "they"), V5 = c("believe", "think", "are",
"laugh", "smile"), V6 = c("we", "they", "okay", "490", "490"
), V7 = c("can", "500", "with", "31", "31"), V8 = c("use",
"32", "that", "", ""), V9 = c("datatable", "", "500", "",
""), V10 = c("always", "", "32", "", ""), V11 = c("500",
"", "", "", ""), V12 = c("32", "", "", "", "")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12"), row.names = c(NA, -5L), class = "data.frame"))
    V1 V2     V3   V4      V5   V6   V7   V8        V9    V10 V11 V12
1: 301  1  61950    i believe   we  can  use datatable always 500  32
2: 301  2  61951  you   think they  500   32
3: 301  3  61953  you     are okay with that       500     32
4: 301  4 155220   we   laugh  490   31
5: 301  5 155218 they   smile  490   31
I would like to transform it into the following:
    V1 V2     V3                               newcol1 newcol2 newcol3
1: 301  1  61950 i believe we can use datatable always     500      32
2: 301  2  61951                        you think they     500      32
3: 301  3  61953                you are okay with that     500      32
4: 301  4 155220                              we laugh     490      31
5: 301  5 155218                            they smile     490      31
Mechanism:
Please suggest data.table solutions only.
Answer 0 (score: 4)
You can do:
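rowid_vars = c("V1","V2","V3")
# Reshape to long form, drop the empty cells, then rebuild each row from its values
melt(dt1, id.vars=rowid_vars)[value!="", .(
  nc1 = paste(value[-(.N-1:0)], collapse=" "),  # all but the last two values
  nc2 = as.integer(value[.N-1]),                # second-to-last value
  nc3 = as.integer(value[.N])                   # last value
), by=rowid_vars]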
    V1 V2     V3                                   nc1 nc2 nc3
1: 301  1  61950 i believe we can use datatable always 500  32
2: 301  2  61951                        you think they 500  32
3: 301  3  61953                you are okay with that 500  32
4: 301  4 155220                              we laugh 490  31
5: 301  5 155218                            they smile 490  31
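The indexing in the `j` expression is compact, so here is a minimal sketch of what it does for a single group. The vector `value` and the variable `n` are illustrative stand-ins (not part of the answer) for one row's non-empty values and for `.N`.

```r
# Stand-in for the non-empty values of row 2 after melting (illustrative only)
value <- c("you", "think", "they", "500", "32")
n <- length(value)            # plays the role of .N inside a by-group

# `:` binds tighter than binary `-`, so n - 1:0 means n - c(1, 0)
last_two <- n - 1:0           # 4 5

text_part <- paste(value[-last_two], collapse = " ")  # "you think they"
num_a <- as.integer(value[n - 1])                     # 500
num_b <- as.integer(value[n])                         # 32
```

This relies on each row having at least two non-empty values after the id columns; with fewer, `value[.N-1]` would come back empty.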
I suspect there is a way to read the data in that avoids this problem in the first place, but I don't know how.
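For what it's worth, one possible approach at read time is sketched below. It assumes the raw input is space-delimited text with a ragged number of fields per line; the question does not show the actual file, so `raw_lines`, `parts` and `dt2` are hypothetical names and the layout is a guess.

```r
library(data.table)

# Hypothetical raw input; in practice this might come from readLines("yourfile.txt")
raw_lines <- c(
  "301 1 61950 i believe we can use datatable always 500 32",
  "301 2 61951 you think they 500 32",
  "301 5 155218 they smile 490 31"
)

# Split each line into its fields (one character vector per line)
parts <- strsplit(raw_lines, " +")

# Build one row per line: first three fields are ids, last two are numbers,
# everything in between is pasted together.
# Assumes every line has at least six fields (three ids, one word, two numbers).
dt2 <- rbindlist(lapply(parts, function(p) {
  n <- length(p)
  data.table(
    V1 = as.integer(p[1]),
    V2 = as.integer(p[2]),
    V3 = as.integer(p[3]),
    newcol1 = paste(p[4:(n - 2)], collapse = " "),
    newcol2 = as.integer(p[n - 1]),
    newcol3 = as.integer(p[n])
  )
}))
```

Splitting the lines yourself sidesteps the wide table with trailing empty columns, at the cost of a per-line loop.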