Question

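I found that inside data.table(), the order function enumerates the rows by group, whereas what I originally wanted was the rank of each observation within its group.

Here is a reproducible example: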

require(data.table)
N <- 10

set.seed(1)

test <- data.table(
  a = round(rnorm(N,mean=0, sd = 30),0),
  b = c(rep('group_1', N/2 ),rep('group_2', N/2))
)
test <- test[, item_position := order(a, decreasing = TRUE), by=list(b)]
setkey(test, b, item_position)
View(test)

The result I get:

test
      a       b item_position
 1:  48 group_1             1
 2: -25 group_1             2
 3:  10 group_1             3
 4: -19 group_1             4
 5:   6 group_1             5
 6:  -9 group_2             1
 7:  22 group_2             2
 8: -25 group_2             3
 9:  15 group_2             4
10:  17 group_2             5

This is obviously wrong. What am I doing wrong, and how should order() be used within data.table?

Thanks!

Answer 1

I think you have slightly misunderstood what order does. From everything you describe, what you are actually looking for is rank:

test[, B_S := rank(-a, ties.method="first"), by = b][] ## Big to Small
#       a       b B_S
#  1: -19 group_1   4
#  2:   6 group_1   3
# .. SNIP ..
#  9:  17 group_2   2
# 10:  -9 group_2   4
test[, S_B := rank(a, ties.method="first"), by = b][]  ## Small to big
#       a       b B_S S_B
#  1: -19 group_1   4   2
#  2:   6 group_1   3   3
# .. SNIP ..
#  9:  17 group_2   2   4
# 10:  -9 group_2   4   2
setkey(test, b, S_B)
test
#       a       b B_S S_B
#  1: -25 group_1   5   1
#  2: -19 group_1   4   2
#  3:   6 group_1   3   3
#  4:  10 group_1   2   4
#  5:  48 group_1   1   5
#  6: -25 group_2   5   1
#  7:  -9 group_2   4   2
#  8:  15 group_2   3   3
#  9:  17 group_2   2   4
# 10:  22 group_2   1   5
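The calls above pass ties.method="first", which only matters when a group contains equal values. As a small illustration (the vector x below is made up for this purpose and is not from the question), this sketch compares a few of the tie-handling options that base R's rank offers:

```r
x <- c(3, 1, 3, 2)  # hypothetical data: the two 3s are tied

r_average <- rank(x)                         # default: ties share the mean rank
r_first   <- rank(x, ties.method = "first")  # ties broken by order of appearance
r_min     <- rank(x, ties.method = "min")    # ties share the smallest rank

print(rbind(x, r_average, r_first, r_min))
```

With "first", every row in a group gets a distinct position, which is usually what you want for something like item_position.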

There is nothing wrong with the output of order (other than that it is not what you expected). Consider the following:

x <- c(-19, 6, -25, 48, 10)
order(x, decreasing=TRUE)
# [1] 4 5 2 1 3
cbind(x, order(x, decreasing=TRUE))
#        x  
# [1,] -19 4
# [2,]   6 5
# [3,] -25 2
# [4,]  48 1
# [5,]  10 3

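This is exactly what you got in your data.table result. order returns, for each position in the sorted output, the index of the element that belongs there (here the largest value, 48, is element 4), not the rank of each element. To learn more about the order function, see this Q&A: Understanding the order() function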
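To make the difference concrete, here is a minimal base R sketch using the same vector. The variable names (idx, sorted, pos, pos_via_order) are just for illustration:

```r
x <- c(-19, 6, -25, 48, 10)

# order() gives the indices that would sort x (largest first here).
idx <- order(x, decreasing = TRUE)

# Those indices are meant to be used for subsetting.
sorted <- x[idx]

# rank() gives the position of each element; negating x makes the largest = 1.
pos <- rank(-x)

# When there are no ties, applying order() to the ordering recovers the ranks.
pos_via_order <- order(idx)

print(cbind(x, order = idx, rank = pos))
```

So order answers "which element goes first?", while rank answers "where does this element go?".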

Answer 2

Ananda's solution is the way to go for smaller datasets. For larger ones, where speed becomes an issue, you may want to use data.table's setkey instead.
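The code that originally accompanied this answer is not available here, so the following is only a sketch of one way a setkey-based approach could look, not necessarily what the author wrote. It assumes the data.table package is installed and hard-codes the values from the question's example rather than regenerating them:

```r
library(data.table)

# Values from the question's example, written out explicitly.
test <- data.table(
  a = c(-19, 6, -25, 48, 10, -25, 15, 22, 17, -9),
  b = rep(c("group_1", "group_2"), each = 5)
)

# Sort once by group and then by value; setkey reorders the rows by reference.
setkey(test, b, a)

# Within each group the rows are now ascending in a, so counting down
# from the group size gives the descending position (largest = 1).
test[, item_position := rev(seq_len(.N)), by = b]

print(test)
```

The idea is that the sort is done once for the whole table, and the per-group step is then just a sequence. Tied values within a group get consecutive positions in whatever order the sort leaves them.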
