我有data.table
看起来像:
beginTime endTime duration data splits
1: 1497902556 1497902618 62 data chunk 1 24,38
2: 1497902556 1497902618 92 data chunk 2 24,60, 8
3: 1497902556 1497902618 62 data chunk 3 24,38
4: 1497902556 1497902618 62 data chunk 4 24,38
5: 1497902556 1497902618 62 data chunk 5 24,38
6: 1497902556 1497902618 62 data chunk 6 24,38
7: 1497902556 1497902618 62 data chunk 7 24,38
8: 1497902556 1497902618 62 data chunk 8 24,38
9: 1497902556 1497902618 62 data chunk 9 24,38
10: 1497902556 1497902618 62 data chunk 10 24,38
我希望转换为以下内容:
beginTime endTime duration data splits
1: 1497902556 1497902580 62 data chunk 1 24
2: 1497902580 1497902618 62 data chunk 1 38
3: 1497902556 1497902580 62 data chunk 2 24
4: 1497902580 1497902640 62 data chunk 2 60
5: 1497902640 1497902648 62 data chunk 2 8
6: 1497902556 1497902580 62 data chunk 3 24
7: 1497902580 1497902618 62 data chunk 3 38
8: 1497902556 1497902580 62 data chunk 4 24
9: 1497902580 1497902618 62 data chunk 4 38
10: 1497902556 1497902580 62 data chunk 5 24
11: 1497902580 1497902618 62 data chunk 5 38
...
也就是说,我希望在行之间分配拆分的各个部分,并根据需要调整开始和结束时间。
对于给定的输入行,我需要做的是(伪代码):
function(row) {
beginTime = row$beginTime
for (split in row$splits) {
endTime = beginTime + split
emitRow(beginTime, endTime, row$duration, row$data, split)
beginTime = endTime
}
}
(emitRow
是一个只创建一个新行的函数 - 对于我的示例数据中的每一行,此函数将每次调用emitRow两次)
我认为我从melt
开始,但我似乎无法说服它拆分splits
列,现在我有点陷入困境。通过sum(unlist(Map(length,t$splits)))
分割后,我可以获得所需的总行数。
关于下一步的任何建议?
更新1:可能重复Split comma-separated column into separate rows
我可以使用unnest
将我的数据放入多行,但它并不像我想要的那样更新beginTime
和endTime
。
> t
beginTime endTime durationInSeconds splits measures
1: 1497902556 1497902618 62 24,38 1.000000,0.013000,0.013000,0.013000,0.013000,0.000169
2: 1497902556 1497902648 92 24,60, 8 1.000000e+00,5.632000e+06,5.632000e+06,5.632000e+06,5.632000e+06,3.171942e+13
3: 1497902556 1497902618 62 24,38 1, 697, 697, 697, 697,485809
4: 1497902556 1497902618 62 24,38 1,0,0,0,0,0
5: 1497902556 1497902618 62 24,38 1.000000e+00,9.165802e+11,9.165802e+11,9.165802e+11,9.165802e+11,8.401193e+23
6: 1497902556 1497902618 62 24,38 1.0000,0.3100,0.3100,0.3100,0.3100,0.0961
7: 1497902556 1497902618 62 24,38 1,0,0,0,0,0
8: 1497902556 1497902618 62 24,38 1,0,0,0,0,0
9: 1497902556 1497902618 62 24,38 1.0000000,0.8166667,0.8166667,0.8166667,0.8166667,0.6669444
10: 1497902556 1497902618 62 24,38 1.000000e+00,2.958333e-03,2.958333e-03,2.958333e-03,2.958333e-03,8.751736e-06
> t2 = data.table(t %>% unnest(splits, .drop=FALSE))
> t2
beginTime endTime durationInSeconds measures splits
1: 1497902556 1497902618 62 1.000000,0.013000,0.013000,0.013000,0.013000,0.000169 24
2: 1497902556 1497902618 62 1.000000,0.013000,0.013000,0.013000,0.013000,0.000169 38
3: 1497902556 1497902648 92 1.000000e+00,5.632000e+06,5.632000e+06,5.632000e+06,5.632000e+06,3.171942e+13 24
4: 1497902556 1497902648 92 1.000000e+00,5.632000e+06,5.632000e+06,5.632000e+06,5.632000e+06,3.171942e+13 60
5: 1497902556 1497902648 92 1.000000e+00,5.632000e+06,5.632000e+06,5.632000e+06,5.632000e+06,3.171942e+13 8
6: 1497902556 1497902618 62 1, 697, 697, 697, 697,485809 24
7: 1497902556 1497902618 62 1, 697, 697, 697, 697,485809 38
8: 1497902556 1497902618 62 1,0,0,0,0,0 24
9: 1497902556 1497902618 62 1,0,0,0,0,0 38
10: 1497902556 1497902618 62 1.000000e+00,9.165802e+11,9.165802e+11,9.165802e+11,9.165802e+11,8.401193e+23 24
11: 1497902556 1497902618 62 1.000000e+00,9.165802e+11,9.165802e+11,9.165802e+11,9.165802e+11,8.401193e+23 38
12: 1497902556 1497902618 62 1.0000,0.3100,0.3100,0.3100,0.3100,0.0961 24
13: 1497902556 1497902618 62 1.0000,0.3100,0.3100,0.3100,0.3100,0.0961 38
14: 1497902556 1497902618 62 1,0,0,0,0,0 24
15: 1497902556 1497902618 62 1,0,0,0,0,0 38
16: 1497902556 1497902618 62 1,0,0,0,0,0 24
17: 1497902556 1497902618 62 1,0,0,0,0,0 38
18: 1497902556 1497902618 62 1.0000000,0.8166667,0.8166667,0.8166667,0.8166667,0.6669444 24
19: 1497902556 1497902618 62 1.0000000,0.8166667,0.8166667,0.8166667,0.8166667,0.6669444 38
20: 1497902556 1497902618 62 1.000000e+00,2.958333e-03,2.958333e-03,2.958333e-03,2.958333e-03,8.751736e-06 24
21: 1497902556 1497902618 62 1.000000e+00,2.958333e-03,2.958333e-03,2.958333e-03,2.958333e-03,8.751736e-06 38
更新2:可重现的数据样本
我没有""之前的数据,只是之前的数据:
> dput(t)
structure(list(beginTime = c(1497902556L, 1497902556L, 1497902556L,
1497902556L, 1497902556L, 1497902556L, 1497902556L, 1497902556L,
1497902556L, 1497902556L), endTime = c(1497902618L, 1497902648L,
1497902618L, 1497902618L, 1497902618L, 1497902618L, 1497902618L,
1497902618L, 1497902618L, 1497902618L), durationInSeconds = c(62L,
92L, 62L, 62L, 62L, 62L, 62L, 62L, 62L, 62L), splits = list(c(24,
38), c(24, 60, 8), c(24, 38), c(24, 38), c(24, 38), c(24, 38),
c(24, 38), c(24, 38), c(24, 38), c(24, 38)), measures = list(
c(1, 0.013, 0.013, 0.013, 0.013, 0.000169), c(1, 5632000,
5632000, 5632000, 5632000, 3.1719424e+13), c(1, 697, 697,
697, 697, 485809), c(1, 0, 0, 0, 0, 0), c(1, 916580200000,
916580200000, 916580200000, 916580200000, 8.401193e+23),
c(1, 0.31, 0.31, 0.31, 0.31, 0.0961), c(1, 0, 0, 0, 0, 0),
c(1, 0, 0, 0, 0, 0), c(1, 0.81666666, 0.81666666, 0.81666666,
0.81666666, 0.66694444), c(1, 0.0029583334, 0.0029583334,
0.0029583334, 0.0029583334, 8.7517365e-06))), .Names = c("beginTime",
"endTime", "durationInSeconds", "splits", "measures"), class = c("data.table",
"data.frame"), row.names = c(NA, -10L), .internal.selfref = <pointer: 0x105008578>)
答案 0 :(得分:2)
一种选择是:
library(data.table)
dt2 <- dt[rep(1:nrow(dt), times = sapply(strsplit(dt$splits,","),length))][,splits2 := as.integer(unlist(strsplit(dt$splits,",")))]
dt3 <- dt2[,c("beginTime","endTime") := list(beginTime + shift(splits2,fill=0L), beginTime + splits2 + shift(splits2,fill=0L)), by = .(data)]
head(dt3)
# beginTime endTime duration data splits splits2
#1: 1497902556 1497902580 62 data chunk 1 24,38 24
#2: 1497902580 1497902618 62 data chunk 1 24,38 38
#3: 1497902556 1497902580 62 data chunk 2 24,38 24
#4: 1497902580 1497902618 62 data chunk 2 24,38 38
#5: 1497902556 1497902580 62 data chunk 3 24,38 24
#6: 1497902580 1497902618 62 data chunk 3 24,38 38