Question

Suppose I have:

library(data.table)
x = data.table( id=c(1,1,1,2,2,2), price=c(100,110,120,200,200,220) )
> x
   id price
1:  1   100
2:  1   110
3:  1   120
4:  2   200
5:  2   200
6:  2   220

and for each row I want to find the cheapest price within its group (by = id), leaving the current row out. So the result should look like this:

> x
   id price   cheapest_in_this_id_omitting_current_row
1:  1   100   110       # if I take this row out the cheapest is the next row
2:  1   110   100       # row 1
3:  1   120   100       # row 1
4:  2   200   200       # row 5
5:  2   200   200       # row 4 (or 5)
6:  2   220   200       # row 4 (or 5)

So it is like using:

x[, cheapest_by_id := min(price), id]

but with the current row removed from each calculation.

If I had a variable referring to the current row within the group, such as .row_nb, I would use:

x[, min(price[-.row_nb]), id]

but this .row_nb does not seem to exist...?

Answer 1

Here is another way:

x[order(price), min_other_p := c(price[2], rep(price[1], .N-1)), by = id]
# or
x[order(price), min_other_p := replace( rep(price[1], .N), 1, price[2] ), by = id]


   id price min_other_p
1:  1   100         110
2:  1   110         100
3:  1   120         100
4:  2   200         200
5:  2   200         200
6:  2   220         200

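In the OP's example the order(price) in i is not necessary, because prices are already increasing within each id, but it is needed in general.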
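To see why the ordering matters, here is a small base-R sketch (no data.table needed) for a single group whose prices are not already sorted. The helper name leave_one_out_sorted is hypothetical. It reproduces the j-expression above, using length(price) in place of .N. Sorting first and then writing each result back to its original position is meant to mirror what x[order(price), ... := ...] does.

```r
# One group's prices, deliberately not in increasing order
p <- c(120, 100, 110)

# Hypothetical helper: the answer's j-expression, which assumes `price` is sorted
leave_one_out_sorted <- function(price) {
  c(price[2], rep(price[1], length(price) - 1))
}

# On unsorted input, position 1 is wrongly treated as the group minimum
wrong <- leave_one_out_sorted(p)

# Sort, apply, then write each value back to its original row
o <- order(p)
right <- numeric(length(p))
right[o] <- leave_one_out_sorted(p[o])

print(wrong)  # 100 120 120
print(right)  # 100 110 100
```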

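How it works: we use order to sort the price vector in increasing order, so that price[1] and price[2] are the two lowest prices in each group. In the result, we want price[1] (the lowest price overall) everywhere except in position 1, where we want the next-lowest price.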

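More explicitly: suppose we have sorted so that i==1 is the row with the lowest price in a group, i==2 the second lowest, and so on. Then price[1] is the first order statistic of the group's price vector and price[2] is the second order statistic. Clearly: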

# pseudocode
min(price[-i]) == price[2] if i==1, since price[2] == min(price[2:.N])
min(price[-i]) == price[1] otherwise, since price[1] belongs to price[-i] and is smallest
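The two cases above translate directly into a helper that works on one group's price vector. This is a minimal base-R sketch that assumes numeric prices. The name min_excluding_self is hypothetical. Returning NA for groups with fewer than two rows is an added assumption, because the answer does not cover that case. The helper calls order() itself, so its input does not need to be pre-sorted. ave() then applies it per id without data.table.

```r
# Hypothetical helper: for each element of p, the minimum of the other elements
min_excluding_self <- function(p) {
  n <- length(p)
  if (n < 2) return(rep(NA_real_, n))  # assumption: no other rows, so no minimum
  o <- order(p)            # o[1] = position of the smallest, o[2] = second smallest
  out <- rep(p[o[1]], n)   # every row except the minimum sees the 1st order statistic
  out[o[1]] <- p[o[2]]     # the minimum row sees the 2nd order statistic
  out
}

# The question's data, grouped by id with base R's ave()
id    <- c(1, 1, 1, 2, 2, 2)
price <- c(100, 110, 120, 200, 200, 220)
cheapest_other <- ave(price, id, FUN = min_excluding_self)
print(cheapest_other)  # 110 100 100 200 200 200
```

Because this needs only one sort per group, it avoids the repeated scans of the leave-one-out approaches.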

Answer 2

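We group by 'id' and call combn on the sequence of row positions, setting the number of elements to choose ('m') to one less than the number of rows (.N-1). Each combination is used as a numeric index to subset 'price', we take its min, and we assign (:=) the output as a new column.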

# column k of combn(.N:1, .N-1) holds every row position except k
x[, cheapest_in_this_id_omitting_current_row :=
      combn(.N:1, .N-1, FUN = function(i) min(price[i])), by = id]
x
#   id price cheapest_in_this_id_omitting_current_row
#1:  1   100                                      110
#2:  1   110                                      100
#3:  1   120                                      100
#4:  2   200                                      200
#5:  2   200                                      200
#6:  2   220                                      200
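Why .N:1 rather than 1:.N? combn() enumerates combinations in a fixed order. With the sequence reversed, column k of combn(.N:1, .N - 1) contains every row position except k, so the k-th result lines up with row k. Here is a quick base-R check for a group of three rows:

```r
n <- 3
combos <- combn(n:1, n - 1)   # each column is one choice of n - 1 row positions
print(combos)

# For each column, find the single row position it leaves out
left_out <- apply(combos, 2, function(cols) setdiff(seq_len(n), cols))
print(left_out)  # 1 2 3: column k omits row k

# With 1:n instead, the omitted rows come out in reverse order
left_out_fwd <- apply(combn(seq_len(n), n - 1), 2,
                      function(cols) setdiff(seq_len(n), cols))
print(left_out_fwd)  # 3 2 1
```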

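Or, instead of using combn, we can loop over the row positions, use each one to drop that row from 'price', and take the min. I think this would be faster, since it skips generating the combinations. Both versions still do O(.N^2) work per group.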

 x[,cheapest_in_this_id_omitting_current_row:=
          unlist(lapply(1:.N, function(i) min(price[-i]))) , id]
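For comparison outside data.table, the same leave-one-out loop can be sketched in base R with ave() and vapply(). The helper name loo_min is hypothetical. vapply() fixes the return type to one number per row and takes the place of unlist(lapply(...)).

```r
# Hypothetical helper: drop each position i in turn and take the min of the rest
# (a one-row group would give Inf with a warning, as min() of an empty vector does)
loo_min <- function(p) {
  vapply(seq_along(p), function(i) min(p[-i]), numeric(1))
}

id    <- c(1, 1, 1, 2, 2, 2)
price <- c(100, 110, 120, 200, 200, 220)

# ave() splits price by id, applies loo_min to each group, and reassembles in row order
result <- ave(price, id, FUN = loo_min)
print(result)  # 110 100 100 200 200 200
```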

R data.table: use all rows except the current one
