Indicating a jump from 0 to 1 using data.table

Time: 2017-03-09 11:33:46

Tags: r data.table

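I have the following data.table with a variable of interest, x. I want to create another variable that flags the jump of x from 0 to 1 (1 in the year of the jump, 0 otherwise), meaning that x is 0 up to a certain year and 1 in all years after it. This should be done by id_d.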

Is there a simple data.table way to do this?

Original data.table:

library(data.table)

fullDat <- data.table(id_d = rep(letters[1:3], each=12),
                      year = rep(1:12, 3),
                      x = c(rep(0, 5), rep(1, 7), 0,1,0,1,2,2,4, rep(5,5), 1, rep(0, 3), rep(1, 8)))

    id_d year x
  1:    a    1 0
  2:    a    2 0
  3:    a    3 0
  4:    a    4 0
  5:    a    5 0
  6:    a    6 1
  7:    a    7 1
  8:    a    8 1
  9:    a    9 1
 10:    a   10 1
 11:    a   11 1
 12:    a   12 1
 13:    b    1 0
 14:    b    2 1
 15:    b    3 0
 16:    b    4 1
 17:    b    5 2
 18:    b    6 2
 19:    b    7 4
 20:    b    8 5
 21:    b    9 5
 22:    b   10 5
 23:    b   11 5
 24:    b   12 5
 25:    c    1 1
 26:    c    2 0
 27:    c    3 0
 28:    c    4 0
 29:    c    5 1
 30:    c    6 1
 31:    c    7 1
 32:    c    8 1
 33:    c    9 1
 34:    c   10 1
 35:    c   11 1
 36:    c   12 1
id_d year x

Desired result:

    id_d year x jump
  1:    a    1 0    0
  2:    a    2 0    0
  3:    a    3 0    0
  4:    a    4 0    0
  5:    a    5 0    0
  6:    a    6 1    1
  7:    a    7 1    0
  8:    a    8 1    0
  9:    a    9 1    0
 10:    a   10 1    0
 11:    a   11 1    0
 12:    a   12 1    0
 13:    b    1 0    0
 14:    b    2 1    0
 15:    b    3 0    0
 16:    b    4 1    0
 17:    b    5 2    0
 18:    b    6 2    0
 19:    b    7 4    0
 20:    b    8 5    0
 21:    b    9 5    0
 22:    b   10 5    0
 23:    b   11 5    0
 24:    b   12 5    0
 25:    c    1 1    0
 26:    c    2 0    0
 27:    c    3 0    0
 28:    c    4 0    0
 29:    c    5 1    0
 30:    c    6 1    0
 31:    c    7 1    0
 32:    c    8 1    0
 33:    c    9 1    0
 34:    c   10 1    0
 35:    c   11 1    0
 36:    c   12 1    0
id_d year x jump

3 answers:

Answer 0 (score: 3)

  

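The requirement is that x is 0 before a certain year and 1 in all years after it. Per id_d, that means the run-length encoding of x is exactly one run of 0s followed by one run of 1s, and the jump year is the one right after the run of 0s: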
# find rows to assign one
wDT = fullDat[, .(year = year[with(rle(x), 
  if (identical(values, c(0, 1))) first(lengths) + 1L
  else 0L
)]), by=id_d]

# initialize to zero
fullDat[, jump := 0L ]
# update join to assign ones
fullDat[wDT, on=.(id_d, year), jump := 1L ]
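To see why the rle() test picks out only the jump year, here is a small sketch that applies the same index expression to the x values of groups a and c, copied by hand from fullDat. jump_index is a hypothetical helper introduced only for this illustration.

```r
library(data.table)  # for first()

# Run-length encodings of x for two groups of fullDat
rle_a <- rle(c(rep(0, 5), rep(1, 7)))       # id_d == "a"
rle_c <- rle(c(1, rep(0, 3), rep(1, 8)))    # id_d == "c"

rle_a$values    # 0 1   -> matches c(0, 1)
rle_a$lengths   # 5 7   -> first(lengths) + 1L = 6, the jump year
rle_c$values    # 1 0 1 -> does not match c(0, 1)

# Hypothetical helper: the index expression used inside j in the answer
jump_index <- function(r) {
  with(r, if (identical(values, c(0, 1))) first(lengths) + 1L else 0L)
}
jump_index(rle_a)  # 6
jump_index(rle_c)  # 0, and year[0L] selects no rows, so group c adds nothing to wDT
```

identical() also compares storage types, so this relies on x being stored as double, as it is in fullDat; an integer x would need c(0L, 1L) instead.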

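There is no need to create the intermediate table wDT; its code can be written directly into the final statement. In fact, the whole thing can go on one line if you like, following this pattern: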

fullDat[, jump := 0L][code_for_wDT, on=.(id_d, year), jump := 1L]
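For concreteness, here is a sketch of that pattern with the wDT expression substituted in for the placeholder. It rebuilds fullDat first so it runs on its own; the layout across several lines is only for readability.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained
fullDat <- data.table(id_d = rep(letters[1:3], each = 12),
                      year = rep(1:12, 3),
                      x = c(rep(0, 5), rep(1, 7), 0, 1, 0, 1, 2, 2, 4, rep(5, 5),
                            1, rep(0, 3), rep(1, 8)))

# Initialise jump to 0, then update-join on the (id_d, year) rows selected via rle()
fullDat[, jump := 0L][
  fullDat[, .(year = year[with(rle(x),
    if (identical(values, c(0, 1))) first(lengths) + 1L else 0L)]), by = id_d],
  on = .(id_d, year), jump := 1L]

fullDat[jump == 1L]   # the single flagged row: id_d "a", year 6
```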

Alternatively, skip the join and use the row numbers from .I directly:

# find rows to assign one
w = fullDat[, with(rle(x), .I[
  if (identical(values, c(0, 1))) first(lengths) + 1L
  else 0L
]), by=id_d]$V1

# initialize to zero
fullDat[, jump := 0L ]
# update to assign ones
fullDat[w, jump := 1L ]
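It may help to look at the grouped .I query before $V1 is extracted. In this sketch, groups whose index evaluates to 0L return integer(0) and so contribute no rows, which is why the result holds only the row numbers to set to 1.

```r
library(data.table)

fullDat <- data.table(id_d = rep(letters[1:3], each = 12),
                      year = rep(1:12, 3),
                      x = c(rep(0, 5), rep(1, 7), 0, 1, 0, 1, 2, 2, 4, rep(5, 5),
                            1, rep(0, 3), rep(1, 8)))

# Same grouped query as above, kept as a data.table instead of taking $V1
idx <- fullDat[, with(rle(x), .I[
  if (identical(values, c(0, 1))) first(lengths) + 1L else 0L
]), by = id_d]

print(idx)   # one row: id_d = "a", V1 = 6 (groups b and c return nothing)
idx$V1       # 6, the row number of (a, year 6) in fullDat
```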

Answer 1 (score: 2)

We can use

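fullDat[, jump := {
  i1 <- which.max(x)  # index of the first maximum of x within this id_d
  # flag row i1 only if x is 1 from there to the end of the group; otherwise all 0
  if (all(x[i1:.N] == 1)) replace(rep(0, .N), i1, 1) else 0
}, by = id_d]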
fullDat
#    id_d year x jump
# 1:    a    1 0    0
# 2:    a    2 0    0
# 3:    a    3 0    0
# 4:    a    4 0    0
# 5:    a    5 0    0
# 6:    a    6 1    1
# 7:    a    7 1    0
# 8:    a    8 1    0
# 9:    a    9 1    0
#10:    a   10 1    0
#11:    a   11 1    0
#12:    a   12 1    0
#13:    b    1 0    0
#14:    b    2 1    0
#15:    b    3 0    0
#16:    b    4 1    0
#17:    b    5 2    0
#18:    b    6 2    0
#19:    b    7 4    0
#20:    b    8 5    0
#21:    b    9 5    0
#22:    b   10 5    0
#23:    b   11 5    0
#24:    b   12 5    0
#25:    c    1 1    0
#26:    c    2 0    0
#27:    c    3 0    0
#28:    c    4 0    0
#29:    c    5 1    0
#30:    c    6 1    0
#31:    c    7 1    0
#32:    c    8 1    0
#33:    c    9 1    0
#34:    c   10 1    0
#35:    c   11 1    0
#36:    c   12 1    0
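As a sketch of how the which.max() approach behaves on single groups, the same logic can be pulled out into a plain function of x (jump_whichmax is a hypothetical helper, not part of the answer). Because the check only looks at rows from the first maximum onward, a group that is 1 in every year gets flagged in its first year; whether that should count as a jump depends on how the question is read.

```r
# Hypothetical helper: Answer 1's per-group logic, with length(x) in place of .N
jump_whichmax <- function(x) {
  i1 <- which.max(x)
  if (all(x[i1:length(x)] == 1)) replace(rep(0, length(x)), i1, 1) else 0
}

jump_whichmax(c(rep(0, 5), rep(1, 7)))     # group a: 1 at position 6, 0 elsewhere
jump_whichmax(c(1, rep(0, 3), rep(1, 8)))  # group c: 0, since the 0s after position 1 fail the check
jump_whichmax(rep(1, 4))                   # 1 0 0 0: an all-ones group is flagged in year 1
```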

Or, a slightly more compact option:

fullDat[, jump := if (all(cumsum(diff(x)) %in% c(0, 1))) c(0, diff(x)) else 0, by = id_d]
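A sketch of the intermediate values in the compact option, using the x values of group a from fullDat. It also includes a hypothetical group that goes 0, 1, 0: that group passes the %in% check too, so the compact option would assign it a jump value of -1. The sample data has no such group, so the output for fullDat is unaffected.

```r
# x for id_d == "a"
xa <- c(rep(0, 5), rep(1, 7))
cumsum(diff(xa))   # 0 0 0 0 1 1 1 1 1 1 1 -> every value is 0 or 1
c(0, diff(xa))     # 0 0 0 0 0 1 0 0 0 0 0 0 -> 1 in year 6

# Hypothetical group that rises and falls back
xd <- c(0, 1, 0)
all(cumsum(diff(xd)) %in% c(0, 1))   # TRUE, so the jump branch is taken
c(0, diff(xd))                        # 0 1 -1
```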

Answer 2 (score: 0)

fullDat[, jump := (cumsum(x==0)==(1:.N - 1L)) & (rev(cumsum(rev(x==1))) == .N:1), id_d]

How this works:

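  1. cumsum(x==0) == (1:.N - 1L) checks whether the number of zeros up to and including this row equals the number of rows before it
  2. rev(cumsum(rev(x==1))) == .N:1 checks whether the number of 1s, counted backward from the last row down to this row, equals the number of rows from here to the end

Condition 2 means this row and every later row are 1; given that, condition 1 means every earlier row is 0. Together they hold only in the row where every earlier row of the group is 0 and this row and every later row are 1, which is the year of the jump.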
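The two conditions can be inspected on group a of fullDat with a short sketch; n stands in for .N, which is only defined inside a data.table call.

```r
xa <- c(rep(0, 5), rep(1, 7))   # x for id_d == "a"
n  <- length(xa)                # plays the role of .N

cond1 <- cumsum(xa == 0) == (1:n - 1L)       # zeros so far == rows before this one
cond2 <- rev(cumsum(rev(xa == 1))) == n:1    # ones from here to the end == rows remaining
which(cond1)           # 6
which(cond2)           # 6 7 8 9 10 11 12
which(cond1 & cond2)   # 6
```

The answer's jump column is therefore logical (TRUE/FALSE); wrapping the right-hand side in as.integer() would give the 0/1 coding shown in the desired result.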