Counting values with a specific condition

Time: 2017-12-14 00:02:26

Tags: r data.table

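I am trying to create variables that count how often a value occurs in the previous rows. So for count_a in row 3, I need to count the number of "a"s in rows 1 to 3. In the same way I want to create count_a, count_b, count_c, count_d and count_e (if the unique values of var1 are c(a,b,c,d,e)).

Data: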

var1     count_a     count_b     count_c ...
  a          0          0          0
  a          1          0          0
  b          2          0          0
  b          2          1          0
  c          2          2          0
  a          2          2          1
  d          3          2          1
  e          3          2          1

Here is the code for the data

I would like to do this with data.table functions, after setDT(data).

3 Answers:

Answer 0: (score: 1)

A solution using cumsum:

# OP's data
foo <- c("a", "a", "b", "b", "c", "a", "d", "e")

# Use cumsum to get the cumulative count of each value.
# Prepending a dummy and dropping the last element lags the vector by one row,
# so each row counts only the rows before it and the first count is 0.
sapply(unique(foo), function(x) cumsum(c("dummy", head(foo, -1)) == x))
#      a b c d e
# [1,] 0 0 0 0 0
# [2,] 1 0 0 0 0
# [3,] 2 0 0 0 0
# [4,] 2 1 0 0 0
# [5,] 2 2 0 0 0
# [6,] 2 2 1 0 0
# [7,] 3 2 1 0 0
# [8,] 3 2 1 1 0

# Use data.table to combine everything (as requested by the OP)
library(data.table)
result <- data.table(foo, 
                     sapply(unique(foo), function(x) cumsum(c("dummy", head(foo, -1)) == x)))
setnames(result, c("var1", paste0("count_", unique(foo))))
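
The same lagged cumulative count can also be written as by-reference column assignments in data.table. The following is a minimal sketch, not part of the original answer: the names DT and vals are illustrative assumptions, and shift() is used to lag var1 by one row so that each count covers only the preceding rows.

```r
library(data.table)

# Input vector, taken from the var1 column of the question
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")
DT <- data.table(var1)

# One count column per distinct value (vals is a hypothetical helper name)
vals <- unique(DT$var1)

# shift() lags var1 by one row; cumsum() then counts matches in earlier rows only
DT[, paste0("count_", vals) := lapply(vals, function(v) {
  cumsum(shift(var1, fill = "") == v)
})]

DT$count_a
# 0 1 2 2 2 2 3 3
```

Because := adds the columns in place, no separate setnames() step is needed here.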

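Answer 1: (score: 1)

Since the OP explicitly asked for a data.table solution, here are two slightly different approaches. Note that these are alternative implementations of PoGibas' sapply() solution: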

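library(data.table)
# Input vector, taken from the var1 column of the question
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")
# Columns are named explicitly because recent data.table versions auto-name CJ() inputs
CJ(V1 = var1, V2 = unique(var1), sorted = FALSE)[
  , cnt := cumsum(shift(V1, fill = "") == V2), by = V2][
    , dcast(.SD, rowid(V2) ~ V2)][, V2 := var1][]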
   V2 a b c d e
1:  a 0 0 0 0 0
2:  a 1 0 0 0 0
3:  b 2 0 0 0 0
4:  b 2 1 0 0 0
5:  c 2 2 0 0 0
6:  a 2 2 1 0 0
7:  d 3 2 1 0 0
8:  e 3 2 1 1 0
# Variant with the arguments swapped, grouping by rleid(V1)
CJ(V1 = unique(var1), V2 = var1, sorted = FALSE)[
  , cnt := cumsum(V1 == shift(V2, fill = "")), by = rleid(V1)][
    , dcast(.SD, rowid(V1) ~ V1)][, V1 := var1][]
   V1 a b c d e
1:  a 0 0 0 0 0
2:  a 1 0 0 0 0
3:  b 2 0 0 0 0
4:  b 2 1 0 0 0
5:  c 2 2 0 0 0
6:  a 2 2 1 0 0
7:  d 3 2 1 0 0
8:  e 3 2 1 1 0

I also tried to apply the approach used in this answer to another question of the OP, but it needs a lot of polishing to get the desired result here:

DT <- data.table(var1)
DT[, rn := .I][DT, on = .(rn < rn), by = .EACHI, .SD[, .(N = .N), by = var1]][
  , dcast(.SD, rn ~ var1, fill = 0)][DT, on = "rn"]
   rn a b c d NA var1
1:  1 0 0 0 0  1    a
2:  2 1 0 0 0  0    a
3:  3 2 0 0 0  0    b
4:  4 2 1 0 0  0    b
5:  5 2 2 0 0  0    c
6:  6 2 2 1 0  0    a
7:  7 3 2 1 0  0    d
8:  8 3 2 1 1  0    e

Answer 2: (score: 0)

count_a = cumsum(var1 == "a")
count_a
  [1] 1 2 2 2 2 3 3 3

This matches your statement "for count_a in row 3, I need to count the 'a's in rows 1 to 3", but it differs from what your example shows.
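
The difference between the two readings is only whether the current row is included. As a hedged base R sketch (the names count_a_incl and count_a_prev are illustrative, and var1 is assumed to hold the values from the question's table), subtracting the current row's own match from the inclusive count reproduces the numbers in the example:

```r
# Input vector, taken from the var1 column of the question
var1 <- c("a", "a", "b", "b", "c", "a", "d", "e")

# Inclusive count: "a"s in rows 1..i
count_a_incl <- cumsum(var1 == "a")

# Exclusive count: "a"s in rows 1..(i - 1), as shown in the example table
count_a_prev <- count_a_incl - (var1 == "a")

count_a_incl
# 1 2 2 2 2 3 3 3
count_a_prev
# 0 1 2 2 2 2 3 3
```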