Question

My data looks like this:

    score        temp
1 a.score  0.05502011
2 b.score  0.02484594
3 c.score -0.07183767
4 d.score -0.06932274
5 e.score -0.15512460

I want to sort this data by value, from most negative to most positive, and take the top 4. I tried:

> topfour.values <- apply(temp.df, 2, function(xx)head(sort(xx), 4, na.rm = TRUE, decreasing = FALSE))
> topfour.names  <- apply(temp.df, 2, function(xx)head(names(sort(xx)), 4, na.rm = TRUE))
> topfour        <- rbind(topfour.names, topfour.values)

I get:

> topfour.values
                        temp[, 1]                           
    d.score              "-0.06932274"            
    c.score              "-0.0718376680"          
    e.score              "-0.1551246"             
    b.score              " 0.02484594"

What order is this? What am I doing wrong, and how do I sort this correctly?

I also tried method = "quick" and method = "shell" as options, but the order still makes no sense.

Answer 1

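I believe your data has the wrong type. It would help to know how you imported the data into R. In the example above you are working with a character vector rather than a numeric vector. With a numeric temp column, ordering the data frame directly gives the expected result: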

head(with(df, df[order(temp), ]), 4)
    score        temp
5 e.score -0.15512460
3 c.score -0.07183767
4 d.score -0.06932274
2 b.score  0.02484594
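If the temp column really did arrive as character (or factor), a check and conversion along these lines might help. This is only a sketch: the data frame is rebuilt by hand from the question, and storing temp as strings is an assumption made to reproduce the symptom, not something the question confirms.

```r
# Rebuild the question's data with temp stored as text (assumption for illustration)
temp.df <- data.frame(
  score = c("a.score", "b.score", "c.score", "d.score", "e.score"),
  temp  = c("0.05502011", "0.02484594", "-0.07183767",
            "-0.06932274", "-0.15512460"),
  stringsAsFactors = FALSE
)

# Inspect the column types; temp shows up as chr here
str(temp.df)

# as.character() first, so the conversion is also safe if temp is a factor
temp.df$temp <- as.numeric(as.character(temp.df$temp))

# Order rows by the numeric value, most negative first, and keep four rows
top4 <- head(temp.df[order(temp.df$temp), ], 4)
print(top4)
```

Running str() right after importing is a cheap way to catch this kind of problem before any sorting is attempted.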

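Taking the approach suggested by Greg Snow, and considering that you are only interested in the vector of top values and that the partial argument cannot be used in this case, a simple speed test comparing order and sort.list shows that the difference is probably irrelevant, even for a vector of size 1e7.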

# 1e7 rows: a numeric column to sort on and a random letter as the label
df1 <- data.frame(temp = rnorm(1e+7),
                  score = sample(letters, 1e+7, replace = TRUE))

library(microbenchmark)
microbenchmark(
  head(with(df1, df1[order(temp), 1]), 4),
  head(with(df1, df1[sort.list(temp), 1]), 4),
  head(df1[order(df1$temp), 1], 4),
  head(df1[sort.list(df1$temp), 1], 4),
  times = 1L
  )

Unit: seconds
                                        expr      min       lq   median       uq      max neval
     head(with(df1, df1[order(temp), 1]), 4) 13.42581 13.42581 13.42581 13.42581 13.42581     1
 head(with(df1, df1[sort.list(temp), 1]), 4) 13.80256 13.80256 13.80256 13.80256 13.80256     1
            head(df1[order(df1$temp), 1], 4) 13.88580 13.88580 13.88580 13.88580 13.88580     1
        head(df1[sort.list(df1$temp), 1], 4) 13.13579 13.13579 13.13579 13.13579 13.13579     1

Answer 2

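There are a few problems here, some of which have already been discussed in the comments, but a big one that I have not seen mentioned is that the apply function works on matrices, so it converts the data frame to a matrix before doing anything else. Since your data has both factor and numeric variables, the numbers are converted to character strings, and the sort is done on the string representation rather than the numeric value. Using tools that work directly with data frames (and lists) prevents this, as does using order and avoiding apply entirely.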
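The coercion can be seen directly with a small reconstruction of the question's data. This is a sketch, and treating score as a factor is an assumption based on the description above.

```r
# Rebuild the question's data: a factor column next to a numeric column
temp.df <- data.frame(
  score = factor(c("a.score", "b.score", "c.score", "d.score", "e.score")),
  temp  = c(0.05502011, 0.02484594, -0.07183767, -0.06932274, -0.15512460)
)

# apply() converts a data frame with as.matrix(); mixed columns become character
m <- as.matrix(temp.df)
print(typeof(m))

# Inside apply(), every column is therefore seen as a character vector
col_types <- apply(temp.df, 2, class)
print(col_types)

# Ordering the data frame directly keeps temp numeric
top4 <- head(temp.df[order(temp.df$temp), ], 4)
print(top4)
```

Indexing with order() also keeps each score next to its value, so there is no need to sort names and values separately and bind them back together.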

Also, if you only want the $n$ largest or smallest values, you can use sort.list instead of order and specify the partial argument to speed things up.
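A note on partial: as far as I can tell, the documentation for sort.list says that non-NULL values of partial are not implemented, which matches the remark in Answer 1. Partial sorting is available through sort itself, which returns values rather than indices. The following is a minimal sketch on simulated data; x, n, smallest and idx are illustrative names, not part of the original question.

```r
set.seed(1)
x <- rnorm(1e5)  # simulated values standing in for the temp column
n <- 4

# partial = 1:n puts the correct values in positions 1..n of the result;
# the rest of the vector is left only partially sorted
smallest <- sort(x, partial = seq_len(n))[seq_len(n)]
print(smallest)

# Recover the positions of those values in x (assumes no ties at the cutoff)
idx <- which(x <= smallest[n])
print(idx)
```

Whether this is noticeably faster than a full order() depends on the data size, and the timings in Answer 1 suggest the gain may not matter much in practice.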

Sort is not sorting numbers correctly in R
