I have the following data.table:
require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
ind cat counts
1: 1 A 10
2: 2 A 9
3: 3 A 8
4: 4 B 7
5: 5 B 6
6: 6 C 5
7: 7 C 4
8: 8 D 3
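What I want to achieve is to add one row for every cat, where counts is the difference between sum(counts) for cat A and sum(counts) for that cat. For these rows, ind should be 0. Basically, I want to rbind the following information: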
added_info <- cbind(ind =0, dt1[, .(counts = dt1[cat == "A", sum(counts)] - sum(counts)), by = cat])
> added_info
ind cat counts
1: 0 A 0
2: 0 B 14
3: 0 C 18
4: 0 D 24
The final result would be:
dt1 <- rbind(dt1, added_info)[order(cat)]
> dt1
ind cat counts
1: 1 A 10
2: 2 A 9
3: 3 A 8
4: 0 A 0
5: 4 B 7
6: 5 B 6
7: 0 B 14
8: 6 C 5
9: 7 C 4
10: 0 C 18
11: 8 D 3
12: 0 D 24
My question is whether there is a better (shorter) way to achieve this using data.table (perhaps using .I or .N?).
Answer 0 (score: 4)
You could do:
require(data.table)
dt1 <- data.table(ind = 1:8, cat = c("A", "A", "A", "B", "B", "C", "C", "D"), counts = (10:3))
dt1[,c:=sum(counts[cat=="A"])][,.(ind=c(ind,0), counts=c(counts,c[.N]-sum(counts))),cat][]
# cat ind counts
# 1: A 1 10
# 2: A 2 9
# 3: A 3 8
# 4: A 0 0
# 5: B 4 7
# 6: B 5 6
# 7: B 0 14
# 8: C 6 5
# 9: C 7 4
# 10: C 0 18
# 11: D 8 3
# 12: D 0 24
Answer 1 (score: 1)
This might be a solution in a single data.table call:
dt1[, rbind(.SD,
data.table(ind = 0,
counts = dt1[cat == 'A', sum(counts)] - sum(.SD$counts))),
by = cat]
Output:
cat ind counts
1: A 1 10
2: A 2 9
3: A 3 8
4: A 0 0
5: B 4 7
6: B 5 6
7: B 0 14
8: C 6 5
9: C 7 4
10: C 0 18
11: D 8 3
12: D 0 24
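Answer 2 (score: 0)
You mentioned efficiency, so... this uses two `by` operations: the one in unique is probably vectorized, and data.table should compile the grouped sum down to a C for loop.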
dt1[, .SD
][, ca := sum(.SD[cat == 'A', counts])
][, cc := sum(counts), cat
][, cd := ca - cc
][, rbind(.SD, unique(.SD, by=c('cat'))[, `:=`(ind=0)])
][ind == 0, counts := cd
][, .(cat, ind, counts)
][order(cat, ind)
]
cat ind counts
1: A 0 0
2: A 1 10
3: A 2 9
4: A 3 8
5: B 0 14
6: B 4 7
7: B 5 6
8: C 0 18
9: C 6 5
10: C 7 4
11: D 0 24
12: D 8 3
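The output above puts the `ind == 0` row first in each group, whereas the result in the question has it last. A minimal sketch of one way to get the question's row order, assuming the extra rows are built separately and then sorted on `ind == 0` (FALSE sorts before TRUE); `total_a`, `added`, and `res2` are hypothetical names for this illustration:

```r
library(data.table)

dt1 <- data.table(ind = 1:8,
                  cat = c("A", "A", "A", "B", "B", "C", "C", "D"),
                  counts = 10:3)

# A's total, then one summary row per category with ind = 0.
total_a <- dt1[cat == "A", sum(counts)]
added   <- dt1[, .(ind = 0L, counts = total_a - sum(counts)), by = cat]

# Bind by column name, then sort: category first, summary row last
# within each category, original rows in ind order.
res2 <- rbind(dt1, added, use.names = TRUE)[order(cat, ind == 0L, ind)]

print(res2)
```

Sorting on the logical `ind == 0L` makes the placement of the summary row explicit instead of relying on the order in which the rows were bound.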