I have a data.table like this:
structure(list(level = c(1, 2, 1, 3, 1, 1), step_destination_step_1 = c(3105,
2689, 1610, 4897, 129, 161), step_destination_step_2 = c(2689,
3201, 6730, 3105, 2689, 673), step_destination_step_3 = c(2945,
NA, NA, 1057, 2945, NA), step_destination_step_4 = c(NA, NA,
NA, NA, 3201, NA)), row.names = c(NA, -6L), class = c("data.table",
"data.frame"))
It looks like this:
   level step_destination_step_1 step_destination_step_2 step_destination_step_3 step_destination_step_4
1:     1                    3105                    2689                    2945                      NA
2:     2                    2689                    3201                      NA                      NA
3:     1                    1610                    6730                      NA                      NA
4:     3                    4897                    3105                    1057                      NA
5:     1                     129                    2689                    2945                    3201
6:     1                     161                     673                      NA                      NA
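I want to shift the values in the step_destination_step_* columns to the right by level - 1 positions. This will require adding columns to the data.table. Whenever a right shift happens, I want to add NA values to the left of the numeric values.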
So the output could look like this:
   level_1 level_2 level_3 level_4 level_5 level_6
1:    3105    2689    2945      NA      NA      NA
2:      NA    2689    3201      NA      NA      NA
3:    1610    6730      NA      NA      NA      NA
4:      NA      NA    4897    3105    1057      NA
5:     129    2689    2945    3201      NA      NA
6:     161     673      NA      NA      NA      NA
I can get this result by writing a for loop, which is definitely not the right way to do it:
# create a placeholder data.table:
hold = data.table(
  level_1 = as.double(rep(NA, 6)), level_2 = as.double(rep(NA, 6)),
  level_3 = as.double(rep(NA, 6)), level_4 = as.double(rep(NA, 6)),
  level_5 = as.double(rep(NA, 6)), level_6 = as.double(rep(NA, 6))
)
# loop over every row of the table:
for (i in 1:6)
{
  hold[i, (test_out_2[i, level]):(test_out_2[i, level] + 3)] = test_out_2[i, 2:5]
}
test_out_2 is the name of the original data.table (just assign the dput output given at the top to it).
Answer 0 (score: 2)
One possible approach:
library(data.table)
#convert into long format
mDT <- melt(setDT(DT)[, rn:=.I], id.vars=c("rn", "level"))
#pivot into desired output
dcast(
    #pad the front with NA depending on level
    mDT[, .(lvl=c(rep(NA_integer_, level[1L]-1L), value)), by=.(rn)],
    rn ~ rowid(rn),
    value.var="lvl")[, -"rn"]
Output:
      1    2    3    4    5    6
1: 3105 2689 2945   NA   NA   NA
2:   NA 2689 3201   NA   NA   NA
3: 1610 6730   NA   NA   NA   NA
4:   NA   NA 4897 3105 1057   NA
5:  129 2689 2945 3201   NA   NA
6:  161  673   NA   NA   NA   NA
Data:
DT <- structure(list(level = c(1, 2, 1, 3, 1, 1), step_destination_step_1 = c(3105,
2689, 1610, 4897, 129, 161), step_destination_step_2 = c(2689,
3201, 6730, 3105, 2689, 673), step_destination_step_3 = c(2945,
NA, NA, 1057, 2945, NA), step_destination_step_4 = c(NA, NA,
NA, NA, 3201, NA)), row.names = c(NA, -6L), class = c("data.table",
"data.frame"))
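If the level_* column names from the question are wanted, a variation along the following lines might help. It is only a sketch of the same melt / pad / dcast idea: it works on a copy of DT so the input is left untouched, and it stores the target position in an explicit helper column. The names `pos`, `padded` and `res` are introduced here for illustration and are not part of the original code.

```r
library(data.table)

# Same values as the data above, built directly for a self-contained example
DT <- data.table(
  level = c(1, 2, 1, 3, 1, 1),
  step_destination_step_1 = c(3105, 2689, 1610, 4897, 129, 161),
  step_destination_step_2 = c(2689, 3201, 6730, 3105, 2689, 673),
  step_destination_step_3 = c(2945, NA, NA, 1057, 2945, NA),
  step_destination_step_4 = c(NA, NA, NA, NA, 3201, NA)
)

# Long format on a copy, so DT itself does not gain the rn column
mDT <- melt(copy(DT)[, rn := .I], id.vars = c("rn", "level"))

# Per original row: prepend (level - 1) NAs and record each value's position
padded <- mDT[, .(
  pos = seq_len(.N + level[1L] - 1L),
  lvl = c(rep(NA_real_, level[1L] - 1L), value)
), by = rn]

# Back to wide format; positions a row does not reach are filled with NA
res <- dcast(padded, rn ~ pos, value.var = "lvl")
res[, rn := NULL]

# Rename the position columns "1", "2", ... to level_1, level_2, ...
setnames(res, paste0("level_", names(res)))
print(res)
```

Renaming after the reshape, rather than building the names inside the dcast formula, keeps the columns in numeric position order.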
Answer 1 (score: 0)
You can do this in base R:
nlvls <- 6L
test <- t(apply(
  DT,
  1,
  function(x) {
    out <- rep(NA_real_, nlvls)
    input <- x[-1][!is.na(x[-1])]
    out[seq_along(input) + x[1] - 1L] <- input
    out
  }))
test
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3105 2689 2945   NA   NA   NA
[2,]   NA 2689 3201   NA   NA   NA
[3,] 1610 6730   NA   NA   NA   NA
[4,]   NA   NA 4897 3105 1057   NA
[5,]  129 2689 2945 3201   NA   NA
[6,]  161  673   NA   NA   NA   NA
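The same shift can also be written without a per-row function, by assigning through a two-column (row, column) index matrix. The following is a sketch of that alternative, not the code above: it uses a plain data.frame named `df`, and `steps`, `n_out`, `idx` and `out` are names chosen here for illustration. Instead of hard-coding 6, it derives the output width from the largest level.

```r
# Same values as in the question, as a plain data.frame
df <- data.frame(
  level = c(1, 2, 1, 3, 1, 1),
  step_destination_step_1 = c(3105, 2689, 1610, 4897, 129, 161),
  step_destination_step_2 = c(2689, 3201, 6730, 3105, 2689, 673),
  step_destination_step_3 = c(2945, NA, NA, 1057, 2945, NA),
  step_destination_step_4 = c(NA, NA, NA, NA, 3201, NA)
)

steps <- as.matrix(df[, -1])               # the step values only
n_out <- max(df$level) - 1 + ncol(steps)   # widest shifted row: 3 - 1 + 4 = 6
out   <- matrix(NA_real_, nrow(steps), n_out)

# For every cell of steps (in column-major order): same row,
# column moved right by (level - 1) of that row
idx <- cbind(
  as.vector(row(steps)),
  as.vector(col(steps)) + rep(df$level, times = ncol(steps)) - 1
)
out[idx] <- steps

colnames(out) <- paste0("level_", seq_len(n_out))
out
```

Unlike the apply() version, this keeps an NA that sits between two values in its shifted position rather than dropping it; for the data shown here both give the same matrix.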
And playing around with data.table:
DT[, c(rep(NA_real_, .SD[["level"]] - 1L), unlist(.SD)[-1]), by = .(row = seq_len(nrow(DT)))
][, dcast(.SD, row ~ paste0("level_", rowid(row)), value.var = "V1")]
   row level_1 level_2 level_3 level_4 level_5 level_6
1:   1    3105    2689    2945      NA      NA      NA
2:   2      NA    2689    3201      NA      NA      NA
3:   3    1610    6730      NA      NA      NA      NA
4:   4      NA      NA    4897    3105    1057      NA
5:   5     129    2689    2945    3201      NA      NA
6:   6     161     673      NA      NA      NA      NA