This is a follow-up to the question:
R data table: compare row value to group values
I now have:
library(data.table)
x = data.table(id = c(1, 1, 1, 1, 1, 1, 1, 1),
               price = c(10, 10, 12, 12, 12, 15, 8, 11),
               subgroup = c(1, 1, 1, 1, 1, 1, 2, 2))
   id price subgroup
1:  1    10        1
2:  1    10        1
3:  1    12        1
4:  1    12        1
5:  1    12        1
6:  1    15        1
7:  1     8        2
8:  1    11        2
For each id, I want to count the number of rows with a lower price, but counting only the rows in subgroup 1.
If I use:
x[,cheaper := rank(price, ties.method="min")-1, by=id]
the result is:
> x
   id price subgroup cheaper
1:  1    10        1       1 # only 1 is cheaper (row 7)
2:  1    10        1       1 # only 1 is cheaper (row 7)
3:  1    12        1       4 # 4 rows are cheaper (rows 1, 2, 7, 8)
4:  1    12        1       4 # etc.
5:  1    12        1       4
6:  1    15        1       7
7:  1     8        2       0
8:  1    11        2       3
But I would like the result to be:
> x
   id price subgroup cheaper_in_subgroup_1
1:  1    10        1                     0 # nobody in subgroup 1 is cheaper
2:  1    10        1                     0 # nobody in subgroup 1 is cheaper
3:  1    12        1                     2 # only rows 1 and 2 are cheaper in subgroup 1
4:  1    12        1                     2
5:  1    12        1                     2
6:  1    15        1                     5
7:  1     8        2                     0 # nobody in subgroup 1 is cheaper
8:  1    11        2                     2 # only rows 1 and 2 are cheaper in subgroup 1
Answer 0 (score: 2)
There may be a more idiomatic data.table way to do this, but here is an attempt using vapply within each id:
# For every price p in the id group, count the subgroup-1 prices strictly below p.
x[, cheaper := vapply(price,
                      function(p) sum(price[subgroup == 1L] < p),
                      FUN.VALUE = integer(1L)),
  by = id]
x
#    id price subgroup cheaper
# 1:  1    10        1       0
# 2:  1    10        1       0
# 3:  1    12        1       2
# 4:  1    12        1       2
# 5:  1    12        1       2
# 6:  1    15        1       5
# 7:  1     8        2       0
# 8:  1    11        2       2
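The vapply call compares every price against every subgroup-1 price, so the work grows quadratically with group size. As a possible alternative (not part of the original answer), the same count can be sketched with base R's findInterval on the sorted subgroup-1 prices. The use of left.open = TRUE here is my assumption for getting a strictly-less-than count; the data is the example from the question.

```r
library(data.table)

# Example data from the question.
x <- data.table(id = c(1, 1, 1, 1, 1, 1, 1, 1),
                price = c(10, 10, 12, 12, 12, 15, 8, 11),
                subgroup = c(1, 1, 1, 1, 1, 1, 2, 2))

# With left.open = TRUE, findInterval returns, for each price, the number of
# sorted reference values that are strictly smaller than it.
x[, cheaper := findInterval(price,
                            sort(price[subgroup == 1]),
                            left.open = TRUE),
  by = id]

x$cheaper
# 0 0 2 2 2 5 0 2
```

Since the reference prices are sorted once per id and each lookup is a binary search, this should scale better than the pairwise comparison on large groups, though I have not benchmarked it.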
Answer 1 (score: 2)
Here is another way, using a small trick with a rolling join:
y = x[subgroup==1L, .N, keyby=.(id, price+1L)][, N := cumsum(N), by=id][]
#    id price N
# 1:  1    11 2
# 2:  1    13 5
# 3:  1    16 6
x[, cheaper := y[x, N, roll=TRUE, rollends=FALSE, on=c("id", "price")]]
#    id price subgroup cheaper
# 1:  1    10        1      NA
# 2:  1    10        1      NA
# 3:  1    12        1       2
# 4:  1    12        1       2
# 5:  1    12        1       2
# 6:  1    15        1       5
# 7:  1     8        2      NA
# 8:  1    11        2       2
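The idea is to compute the cumulative count of subgroup-1 rows for each id and price, but to store it under price+1L. During the rolling join, each value in x then picks up the count from the last observation at or below its own price, which, because of the shift, is the number of subgroup-1 rows that are strictly cheaper. Rows with nothing cheaper in subgroup 1 have no earlier observation to roll from, so they come back as NA rather than 0.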
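To make the trick easier to follow, here is a step-by-step sketch of the same rolling join, with the NA results replaced by 0 at the end to match the output requested in the question. The intermediate name counts and the final NA fill are my additions, not part of the original answer, and the running total is taken per id on the assumption that counts should not carry over between ids.

```r
library(data.table)

# Example data from the question.
x <- data.table(id = c(1, 1, 1, 1, 1, 1, 1, 1),
                price = c(10, 10, 12, 12, 12, 15, 8, 11),
                subgroup = c(1, 1, 1, 1, 1, 1, 2, 2))

# Step 1: number of subgroup-1 rows at each (id, price), sorted by id and price.
counts <- x[subgroup == 1, .N, keyby = .(id, price)]

# Step 2: running total within each id = rows priced at or below this price.
counts[, N := cumsum(N), by = id]

# Step 3: shift the lookup price up by one, so that a price p in x rolls back
# to the total for prices strictly below p (valid for whole-number prices).
counts[, price := price + 1]

# Step 4: rolling join; each row of x takes N from the last shifted price <= its own.
x[, cheaper := counts[x, N, roll = TRUE, rollends = FALSE, on = c("id", "price")]]

# Step 5: no earlier observation means nothing is cheaper, so record 0.
x[is.na(cheaper), cheaper := 0L]

x$cheaper
# 0 0 2 2 2 5 0 2
```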
PS: If price is not of integer type, then it would be price * (1 + eps) instead of price + 1L when building y.