Efficient way to add rows holding the difference from a specific group in a data.table

Time: 2017-05-05 09:09:39

Tags: r data.table

I have the following data.table:

require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))

    ind cat counts
1:   1   A     10
2:   2   A      9
3:   3   A      8
4:   4   B      7
5:   5   B      6
6:   6   C      5
7:   7   C      4
8:   8   D      3

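What I want to achieve is to add one row for each cat, where counts is the difference between sum(counts) of cat A and sum(counts) of that cat. For these rows, ind should be 0. Basically, I want to rbind the following information: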

added_info <- cbind(ind =0, dt1[, .(counts = dt1[cat == "A", sum(counts)] - sum(counts)), by = cat])

> added_info
   ind cat counts
1:   0   A      0
2:   0   B     14
3:   0   C     18
4:   0   D     24
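
To make the arithmetic behind these numbers explicit, here is a small sketch that computes the per-cat totals first and then subtracts each from the total of cat A. The names totals, total_a and diff_from_a are introduced here for illustration only and are not part of the original code.

```r
library(data.table)

dt1 <- data.table(ind = 1:8,
                  cat = c("A", "A", "A", "B", "B", "C", "C", "D"),
                  counts = 10:3)

# Per-cat totals: A = 10 + 9 + 8, B = 7 + 6, C = 5 + 4, D = 3
totals <- dt1[, .(total = sum(counts)), by = cat]

# Total of the reference group "A" (hypothetical helper name)
total_a <- totals[cat == "A", total]

# Difference between the total of A and the total of each cat
totals[, diff_from_a := total_a - total]
print(totals)
```

The diff_from_a column should match the counts column of added_info above (0, 14, 18, 24).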

The final result would be:

dt1 <- rbind(dt1, added_info)[order(cat)]

> dt1
    ind cat counts
 1:   1   A     10
 2:   2   A      9
 3:   3   A      8
 4:   0   A      0
 5:   4   B      7
 6:   5   B      6
 7:   0   B     14
 8:   6   C      5
 9:   7   C      4
10:   0   C     18
11:   8   D      3
12:   0   D     24

My question is whether there is a better (shorter) way to achieve this using data.table (perhaps using .I or .N?).

3 Answers:

Answer 0: (score: 4)

You could do

require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
dt1[, c := sum(counts[cat == "A"])][, .(ind = c(ind, 0), counts = c(counts, c[.N] - sum(counts))), by = cat][]
#     cat ind counts
#  1:   A   1     10
#  2:   A   2      9
#  3:   A   3      8
#  4:   A   0      0
#  5:   B   4      7
#  6:   B   5      6
#  7:   B   0     14
#  8:   C   6      5
#  9:   C   7      4
# 10:   C   0     18
# 11:   D   8      3
# 12:   D   0     24
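
Note that the := step above adds the helper column c to dt1 by reference. A possible variant, sketched here under the assumption that you would rather leave dt1 untouched, computes the reference total once beforehand; total_a and res are hypothetical names used only in this sketch.

```r
library(data.table)

dt1 <- data.table(ind = 1:8,
                  cat = c("A", "A", "A", "B", "B", "C", "C", "D"),
                  counts = 10:3)

# Compute the total of cat "A" once, outside the grouped call,
# so that no helper column is added to dt1 by reference
total_a <- dt1[cat == "A", sum(counts)]

# For each cat, append one extra element to ind (0) and to counts
# (total of A minus the total of the current cat)
res <- dt1[, .(ind = c(ind, 0L),
               counts = c(counts, total_a - sum(counts))),
           by = cat]
print(res)
```

Using 0L instead of 0 keeps ind an integer column; with a plain 0 it would be coerced to double.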

Answer 1: (score: 1)

This could be a solution in a single data.table call:

dt1[, rbind(.SD, 
            data.table(ind = 0, 
                       counts = dt1[cat == 'A', sum(counts)] - sum(.SD$counts))), 
    by = cat]

Output:

   cat ind counts
 1:   A   1     10
 2:   A   2      9
 3:   A   3      8
 4:   A   0      0
 5:   B   4      7
 6:   B   5      6
 7:   B   0     14
 8:   C   6      5
 9:   C   7      4
10:   C   0     18
11:   D   8      3
12:   D   0     24
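
In a grouped result the by column comes first, so cat ends up before ind here, unlike in the question's expected output. If the original column order matters, a sketch along these lines should restore it with setcolorder(); res is a hypothetical name used only for illustration.

```r
library(data.table)

dt1 <- data.table(ind = 1:8,
                  cat = c("A", "A", "A", "B", "B", "C", "C", "D"),
                  counts = 10:3)

# Same grouped rbind as above, stored in a new object
res <- dt1[, rbind(.SD,
                   data.table(ind = 0,
                              counts = dt1[cat == "A", sum(counts)] - sum(.SD$counts))),
           by = cat]

# Move the columns back to the order used in the question (modifies res by reference)
setcolorder(res, c("ind", "cat", "counts"))
print(res)
```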

Answer 2: (score: 0)

You said efficient, so... This uses by twice; the one in unique is probably vectorized, and data.table's sum should compile down to a C for loop.

dt1[, .SD
    ][, ca := sum(.SD[cat == 'A', counts])
    ][, cc := sum(counts), cat
    ][, cd := ca - cc
    ][, rbind(.SD, unique(.SD, by=c('cat'))[, `:=`(ind=0)])
    ][ind == 0, counts := cd
    ][, .(cat, ind, counts)
    ][order(cat, ind)
    ]

    cat ind counts
 1:   A   0      0
 2:   A   1     10
 3:   A   2      9
 4:   A   3      8
 5:   B   0     14
 6:   B   4      7
 7:   B   5      6
 8:   C   0     18
 9:   C   6      5
10:   C   7      4
11:   D   0     24
12:   D   8      3
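
Whether a grouped sum is actually optimized internally can be checked rather than assumed. As a rough sketch, passing verbose = TRUE to a grouped query asks data.table to report how it evaluated j; the exact wording of the messages, and whether the optimization also applies to := with by, may depend on the installed data.table version. The name totals is hypothetical.

```r
library(data.table)

dt1 <- data.table(ind = 1:8,
                  cat = c("A", "A", "A", "B", "B", "C", "C", "D"),
                  counts = 10:3)

# verbose = TRUE makes data.table print details about how the query
# was evaluated, including any internal optimization of j
totals <- dt1[, .(total = sum(counts)), by = cat, verbose = TRUE]
print(totals)
```

On larger data, timing the three answers with system.time() would be a more direct way to compare them.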