Question

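I have a data frame containing statistical measures for several thousand classifications. The data covers many different classes. I output the data frame sorted first by class and then by the measures: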

statia[order(statia$class, -statia$MCC, -statia$ROC_Area, -statia$F.Measure),]

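Is there a simple way to modify the command so that I don't get all rows, but only the first n rows per class, i.e. the n rows with the highest MCC values?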

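Edit

As criticized in the comments, I tried to come up with an example. I hope it helps to prevent future confusion. However, @beginneR's answer is what I was looking for. I will try to make my questions clearer in the future.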

    file           class     MCC ROC_Area F.measure
run1.txt     Iris-setosa    0.98        0         1
run1.txt Iris-versicolor    0.92     0.06     0.885
run1.txt  Iris-virginica     0.9     0.04     0.918
run1.txt   Weighted_Avg.   0.933    0.033     0.934
run3.txt     Iris-setosa       1        0         1
run3.txt Iris-versicolor       1        0         1
run3.txt  Iris-virginica       1        0         1
run3.txt   Weighted_Avg.       1        0         1
[...]

What I want (better shown in the answer) is, for example, the 3 rows with the highest MCC values in each class:

    file           class     MCC ROC_Area F.measure
run3.txt     Iris-setosa       1        0         1
run1.txt     Iris-setosa    0.98        0         1
run5.txt     Iris-setosa    0.60        0         1
run3.txt Iris-versicolor       1        0         1
run1.txt Iris-versicolor    0.92     0.06     0.885
[...]

Answer 1

You can do this with dplyr:

library(dplyr)
statiaNew <- statia %>%
    group_by(class) %>%
    arrange(class, desc(MCC), desc(ROC_Area), desc(F.Measure)) %>%
    do(head(., 10))    # to show the first 10 rows per class
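For readers on a recent dplyr (1.0.0 or later), the same idea can be sketched with `slice_head()` instead of `do(head(...))`. This is only an illustration: `statia_demo` and its values are made up here to stand in for the asker's `statia`, and only the MCC column is used for sorting.

```r
library(dplyr)

# Hypothetical toy data standing in for `statia`; the values are invented.
statia_demo <- data.frame(
  file  = c("a.txt", "b.txt", "c.txt", "a.txt", "b.txt", "c.txt"),
  class = c("x", "x", "x", "y", "y", "y"),
  MCC   = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.8),
  stringsAsFactors = FALSE
)

top2 <- statia_demo %>%
  arrange(class, desc(MCC)) %>%  # sort: class ascending, MCC descending
  group_by(class) %>%
  slice_head(n = 2) %>%          # keep the first 2 rows of each class
  ungroup()

print(top2)
```

Sorting before grouping keeps the ordering explicit, so `slice_head()` only has to take the first n rows of each class.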

Update

If you prefer a base R alternative:

statiaNew <- do.call(rbind, lapply(split(statia, statia$class), function(x){
  head(x[with(x, order(class, -MCC, -ROC_Area, -F.Measure)),], 10) #return first 10 rows
}))

Or

statia <- statia[order(statia$class, -statia$MCC, -statia$ROC_Area, -statia$F.Measure),]
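# number the rows within each class; an integer first argument keeps the result numeric even if class is character or factor
statiaNew  <- statia[ave(seq_along(statia$class), statia$class, FUN = seq_along) <= 10, ]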
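The `ave(..., FUN = seq_along)` idiom may be easier to follow on a small vector. The sketch below uses a hypothetical label vector `g` (not from the question) to show how it numbers rows within each group, which is what makes the `<= n` filter work on data that is already sorted.

```r
# Hypothetical group labels, already sorted by group.
g <- c("a", "a", "a", "b", "b", "c")

# ave() applies FUN to the first argument separately within each group.
# Using an integer vector as the first argument keeps the result numeric
# whether `g` is character or factor.
pos <- ave(seq_along(g), g, FUN = seq_along)
print(pos)

# Keep at most the first 2 entries of every group.
keep <- pos <= 2
print(g[keep])
```

Because `pos` restarts at 1 for every group, comparing it against n selects the first n rows of each group in the current row order.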

Update 2: here is how these approaches perform on the iris dataset:

library(dplyr)
iris %>%
  group_by(Species) %>%
  arrange(Species, desc(Sepal.Length), desc(Sepal.Width), desc(Petal.Length)) %>%
  do(head(., 3))    

#Source: local data frame [9 x 5]
#Groups: Species
#
#  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#1          5.8         4.0          1.2         0.2     setosa
#2          5.7         4.4          1.5         0.4     setosa
#3          5.7         3.8          1.7         0.3     setosa
#4          7.0         3.2          4.7         1.4 versicolor
#5          6.9         3.1          4.9         1.5 versicolor
#6          6.8         2.8          4.8         1.4 versicolor
#7          7.9         3.8          6.4         2.0  virginica
#8          7.7         3.8          6.7         2.2  virginica
#9          7.7         3.0          6.1         2.3  virginica

do.call(rbind, lapply(split(iris, iris$Species), function(x){
  head(x[with(x, order(Species, -Sepal.Length, -Sepal.Width, -Petal.Length)),], 3)
}))

#              Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#setosa.15              5.8         4.0          1.2         0.2     setosa
#setosa.16              5.7         4.4          1.5         0.4     setosa
#setosa.19              5.7         3.8          1.7         0.3     setosa
#versicolor.51          7.0         3.2          4.7         1.4 versicolor
#versicolor.53          6.9         3.1          4.9         1.5 versicolor
#versicolor.77          6.8         2.8          4.8         1.4 versicolor
#virginica.132          7.9         3.8          6.4         2.0  virginica
#virginica.118          7.7         3.8          6.7         2.2  virginica
#virginica.136          7.7         3.0          6.1         2.3  virginica

iris <- iris[with(iris, order(Species, -Sepal.Length, -Sepal.Width, -Petal.Length)),]
iris[ave(as.numeric(iris$Species), iris$Species, FUN = seq_along) <= 3, ]

#note that I used `as.numeric(iris$Species)` because it's stored as `factor`s and would cause an error otherwise.

#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#15           5.8         4.0          1.2         0.2     setosa
#16           5.7         4.4          1.5         0.4     setosa
#19           5.7         3.8          1.7         0.3     setosa
#51           7.0         3.2          4.7         1.4 versicolor
#53           6.9         3.1          4.9         1.5 versicolor
#77           6.8         2.8          4.8         1.4 versicolor
#132          7.9         3.8          6.4         2.0  virginica
#118          7.7         3.8          6.7         2.2  virginica
#136          7.7         3.0          6.1         2.3  virginica
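As a quick sanity check, the two base R approaches can be compared directly. This is a sketch on a fresh copy of the built-in dataset (named `ir` here, an assumption made so the sorted `iris` above is left alone); row names are dropped before comparing because the `split`/`rbind` version prefixes them with the group name.

```r
ir <- datasets::iris  # fresh, unsorted copy of the built-in data

# Approach 1: split by group, sort each piece, take the first 3 rows.
by_split <- do.call(rbind, lapply(split(ir, ir$Species), function(x) {
  head(x[order(-x$Sepal.Length, -x$Sepal.Width, -x$Petal.Length), ], 3)
}))

# Approach 2: sort the whole data frame, then keep within-group positions <= 3.
sorted <- ir[order(ir$Species, -ir$Sepal.Length, -ir$Sepal.Width, -ir$Petal.Length), ]
by_ave <- sorted[ave(seq_len(nrow(sorted)), sorted$Species, FUN = seq_along) <= 3, ]

# Row names differ ("setosa.15" vs "15"), so reset them before comparing.
rownames(by_split) <- NULL
rownames(by_ave) <- NULL
print(isTRUE(all.equal(by_split, by_ave)))
```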

Sort data by group and output only the top n values
