How to shift the values in a data.table row based on a value in the same row

Date: 2019-05-08 21:04:19

Tags: r data.table

I have a data.table like this:

structure(list(level = c(1, 2, 1, 3, 1, 1), step_destination_step_1 = c(3105, 
2689, 1610, 4897, 129, 161), step_destination_step_2 = c(2689, 
3201, 6730, 3105, 2689, 673), step_destination_step_3 = c(2945, 
NA, NA, 1057, 2945, NA), step_destination_step_4 = c(NA, NA, 
NA, NA, 3201, NA)), row.names = c(NA, -6L), class = c("data.table", 
"data.frame"))

which looks like this:

   level step_destination_step_1 step_destination_step_2 step_destination_step_3 step_destination_step_4
1:     1                    3105                    2689                    2945                      NA
2:     2                    2689                    3201                      NA                      NA
3:     1                    1610                    6730                      NA                      NA
4:     3                    4897                    3105                    1057                      NA
5:     1                     129                    2689                    2945                    3201
6:     1                     161                     673                      NA                      NA

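I want to shift the values in the step_destination_step_* columns to the right by level - 1. This will require adding at least a few new columns to the data.table.

Whenever a shift to the right happens, I want NA values to fill the cells to the left of the numeric values.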

So the output might look something like this:

   level_1 level_2 level_3 level_4 level_5 level_6
1:    3105    2689    2945      NA      NA      NA
2:      NA    2689    3201      NA      NA      NA
3:    1610    6730      NA      NA      NA      NA
4:      NA      NA    4897    3105    1057      NA
5:     129    2689    2945    3201      NA      NA
6:     161     673      NA      NA      NA      NA
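For a single row the rule is simple: the first step value lands in column `level`, the remaining values follow it, and every other cell is NA. A minimal base R sketch of that rule, using a hypothetical helper `shift_row()` and a width of 6 chosen to match the expected output above:

```r
# Hypothetical helper: place one row's step values so that the first value
# lands in column `level`, with NA padding on the left and right.
shift_row <- function(level, steps, width) {
  out <- rep(NA_real_, width)                 # start from an all-NA row
  out[seq_along(steps) + level - 1] <- steps  # shift right by level - 1
  out
}

# Row 4 of the question: level 3, so two leading NAs
shift_row(level = 3, steps = c(4897, 3105, 1057, NA), width = 6)
# returns NA NA 4897 3105 1057 NA
```

If `level - 1 + length(steps)` exceeds `width`, R silently lengthens `out` rather than raising an error, so the width has to be at least `max(level) - 1` plus the number of step columns.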

I can get this result by writing a for loop, which is definitely not the right way to do it:

library(data.table)

# create a placeholder data.table with six all-NA numeric columns:
hold <- data.table(
  level_1 = rep(NA_real_, 6), level_2 = rep(NA_real_, 6),
  level_3 = rep(NA_real_, 6), level_4 = rep(NA_real_, 6),
  level_5 = rep(NA_real_, 6), level_6 = rep(NA_real_, 6)
  )

# loop over every row of the table, copying the four step columns
# into columns level .. level + 3 of the placeholder:
for (i in seq_len(nrow(test_out_2)))
{
  hold[i, (test_out_2[i, level]):(test_out_2[i, level] + 3)] <- test_out_2[i, 2:5]
}

test_out_2 is the name of the original data.table (just assign the output of the dput shown at the top to it).
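For comparison, the loop can be avoided entirely in base R by building a two-column (row, column) index matrix and writing every value into its shifted cell in a single assignment. This is a sketch, not the asker's code: it rebuilds the question's data as a plain data.frame so that it runs without data.table, and `steps`, `lvl`, `width`, `rows` and `cols` are illustrative names.

```r
# The question's data as a plain data.frame (data.table not required here)
test_out_2 <- data.frame(
  level = c(1, 2, 1, 3, 1, 1),
  step_destination_step_1 = c(3105, 2689, 1610, 4897, 129, 161),
  step_destination_step_2 = c(2689, 3201, 6730, 3105, 2689, 673),
  step_destination_step_3 = c(2945, NA, NA, 1057, 2945, NA),
  step_destination_step_4 = c(NA, NA, NA, NA, 3201, NA)
)

steps <- as.matrix(test_out_2[, -1])  # 6 x 4 numeric matrix of step values
lvl   <- test_out_2$level
width <- max(lvl) + ncol(steps) - 1   # 3 + 4 - 1 = 6 output columns

# Source row and shifted target column for every cell of `steps`
# (row() and col() are read column-major, matching as.vector(steps))
rows <- as.vector(row(steps))
cols <- as.vector(col(steps)) + lvl[rows] - 1

hold <- matrix(NA_real_, nrow(steps), width,
               dimnames = list(NULL, paste0("level_", seq_len(width))))
hold[cbind(rows, cols)] <- as.vector(steps)  # one vectorised assignment
hold
```

With the original data.table the same lines should work unchanged, and `as.data.table(hold)` converts the resulting matrix back into a data.table if one is needed.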

2 answers:

Answer 0 (score: 2)

One possible approach:

library(data.table)
#convert into long format
mDT <- melt(setDT(DT)[, rn:=.I], id.vars=c("rn", "level"))

#pivot into desired output
dcast(
    #pad the front with NA depending on level
    mDT[, .(lvl=c(rep(NA_integer_, level[1L]-1L), value)), by=.(rn)],
    rn ~ rowid(rn),
    value.var="lvl")[, -"rn"]

Output:

      1    2    3    4    5  6
1: 3105 2689 2945   NA   NA NA
2:   NA 2689 3201   NA   NA NA
3: 1610 6730   NA   NA   NA NA
4:   NA   NA 4897 3105 1057 NA
5:  129 2689 2945 3201   NA NA
6:  161  673   NA   NA   NA NA
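The `rn ~ rowid(rn)` part of the formula is what lines the values up: `rowid(rn)` numbers the rows inside each `rn` group 1, 2, 3, ..., and because the NA padding was prepended first, that running number is exactly the target column. The base R sketch below illustrates only this numbering step. It uses `ave(..., FUN = seq_along)` as a stand-in for `rowid()`, and the padded long-format values for the first two rows are written out by hand (the variable names are just for illustration).

```r
# Padded long-format values for rows 1 and 2 of the question's data:
# row 1 has level 1 (no padding), row 2 has level 2 (one leading NA)
rn  <- c(1, 1, 1, 1, 2, 2, 2, 2, 2)
lvl <- c(3105, 2689, 2945, NA, NA, 2689, 3201, NA, NA)

# Running position within each rn group, a base R counterpart of rowid(rn)
pos <- ave(seq_along(rn), rn, FUN = seq_along)

# Each (rn, pos) pair is the cell its value occupies after the dcast
cbind(rn, pos, lvl)
```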

Data:

DT <- structure(list(level = c(1, 2, 1, 3, 1, 1), step_destination_step_1 = c(3105,
    2689, 1610, 4897, 129, 161), step_destination_step_2 = c(2689,
        3201, 6730, 3105, 2689, 673), step_destination_step_3 = c(2945,
            NA, NA, 1057, 2945, NA), step_destination_step_4 = c(NA, NA,
                NA, NA, 3201, NA)), row.names = c(NA, -6L), class = c("data.table",
                    "data.frame"))

Answer 1 (score: 0)

You can do it in base R:

nlvls <- 6L
test <- t(apply(
      DT, 
      1, 
      function(x) {
        out <- rep(NA_real_, nlvls)
        input <- x[-1][!is.na(x[-1])]
        out[seq_along(input) + x[1] - 1L] <- input
        out
      }))
test

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3105 2689 2945   NA   NA   NA
[2,]   NA 2689 3201   NA   NA   NA
[3,] 1610 6730   NA   NA   NA   NA
[4,]   NA   NA 4897 3105 1057   NA
[5,]  129 2689 2945 3201   NA   NA
[6,]  161  673   NA   NA   NA   NA

And playing with data.table:

DT[, c(rep(NA_real_, .SD[["level"]] - 1L), unlist(.SD)[-1]), by = .(row = seq_len(nrow(DT)))
   ][, dcast(.SD, row ~ paste0("level_", rowid(row)), value.var = "V1")]


   row level_1 level_2 level_3 level_4 level_5 level_6
1:   1    3105    2689    2945      NA      NA      NA
2:   2      NA    2689    3201      NA      NA      NA
3:   3    1610    6730      NA      NA      NA      NA
4:   4      NA      NA    4897    3105    1057      NA
5:   5     129    2689    2945    3201      NA      NA
6:   6     161     673      NA      NA      NA      NA
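One caveat with building the new column names as `paste0("level_", rowid(row))`: `dcast` orders the cast columns by sorting their labels, and character labels sort lexically, so with ten or more levels `level_10` would be placed before `level_2`. Zero-padding the number, or casting on the integer `rowid(row)` directly as the first answer does, avoids this. The base R snippet below only demonstrates the sort order of the two labelling schemes; the width `%02d` assumes fewer than 100 levels.

```r
n <- 11

# Unpadded labels sort lexically: "level_10" and "level_11" come before "level_2"
unpadded <- sort(paste0("level_", seq_len(n)))
unpadded

# Zero-padded labels sort in numeric order
padded <- sprintf("level_%02d", seq_len(n))
identical(sort(padded), padded)
```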