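I have a data frame containing statistical measures for several thousand classifications. The data has many different classes. I output the data frame sorted first by class and then by the measures: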
statia[order(statia$class, -statia$MCC, -statia$ROC_Area, -statia$F.Measure),]
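Is there a simple way to modify the command so that I get not all rows, but only the top n rows per class, i.e. the n rows with the highest MCC values?
Edit
As criticized in the comments, I tried to come up with an example. I hope it helps prevent future confusion. However, @beginneR's answer is what I was looking for. I will try to make my questions clearer in the future.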
file      class            MCC    ROC_Area  F.measure
run1.txt  Iris-setosa      0.98   0         1
run1.txt  Iris-versicolor  0.92   0.06      0.885
run1.txt  Iris-virginica   0.9    0.04      0.918
run1.txt  Weighted_Avg.    0.933  0.033     0.934
run3.txt  Iris-setosa      1      0         1
run3.txt  Iris-versicolor  1      0         1
run3.txt  Iris-virginica   1      0         1
run3.txt  Weighted_Avg.    1      0         1
[...]
What I want (shown better in the solution) is, for example, the 3 samples with the highest MCC value in each class:
file      class            MCC    ROC_Area  F.measure
run3.txt  Iris-setosa      1      0         1
run1.txt  Iris-setosa      0.98   0         1
run5.txt  Iris-setosa      0.60   0         1
run3.txt  Iris-versicolor  1      0         1
run1.txt  Iris-versicolor  0.92   0.06      0.885
[...]
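For anyone who wants to try the ordering command, here is a minimal reproducible sketch. It contains only the eight run1.txt and run3.txt rows shown in the first sample, not the full data, and it assumes the column is named `F.Measure` as in the ordering command (the printed header shows `F.measure`).

```r
# Rows copied from the first sample above; only a small subset of the real data.
# Assumption: the measure column is called F.Measure, as in the order() call.
statia <- data.frame(
  file      = rep(c("run1.txt", "run3.txt"), each = 4),
  class     = rep(c("Iris-setosa", "Iris-versicolor",
                    "Iris-virginica", "Weighted_Avg."), times = 2),
  MCC       = c(0.98, 0.92, 0.90, 0.933, 1, 1, 1, 1),
  ROC_Area  = c(0, 0.06, 0.04, 0.033, 0, 0, 0, 0),
  F.Measure = c(1, 0.885, 0.918, 0.934, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# The ordering command from the question, applied to the sample:
# sorted by class, then by descending MCC, ROC_Area and F.Measure
sorted <- statia[order(statia$class, -statia$MCC,
                       -statia$ROC_Area, -statia$F.Measure), ]
print(sorted)
```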
Answer 0 (score: 3)
You can do this with dplyr:
library(dplyr)
statiaNew <- statia %>%
  group_by(class) %>%
  arrange(class, desc(MCC), desc(ROC_Area), desc(F.Measure)) %>%
  do(head(., 10)) # to show the first 10 rows per class
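On newer dplyr releases, `slice_head()` can probably stand in for `do(head(., n))`. The sketch below assumes dplyr 1.0.0 or later is installed and uses the built-in `iris` data in place of `statia`, so the column names are not the ones from the question.

```r
library(dplyr)

# Sort first, then keep the first 3 rows of each group in that sorted order.
top3 <- iris %>%
  arrange(Species, desc(Sepal.Length), desc(Sepal.Width), desc(Petal.Length)) %>%
  group_by(Species) %>%
  slice_head(n = 3) %>%
  ungroup()

print(top3)
```

With `statia`, the same pattern would group by `class` and sort by the three measures.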
Update
If you prefer a base R alternative:
statiaNew <- do.call(rbind, lapply(split(statia, statia$class), function(x){
  head(x[with(x, order(class, -MCC, -ROC_Area, -F.Measure)),], 10) #return first 10 rows
}))
Or
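statia <- statia[order(statia$class, -statia$MCC, -statia$ROC_Area, -statia$F.Measure),]
# count rows within each class; a numeric first argument keeps the counter numeric
# even when `class` is stored as character or factor
statiaNew <- statia[ave(seq_len(nrow(statia)), statia$class, FUN = seq_along) <= 10, ]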
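The `ave(..., FUN = seq_along)` idiom works by building a running counter within each group, which is then compared against n. Here is a minimal sketch on a made-up grouping vector (`g` is hypothetical, not part of the data). Passing a numeric vector as the first argument matters because `ave()` returns a vector of the same type as its first argument; a character or factor column there would not give a numeric counter.

```r
# Hypothetical grouping vector, already sorted by group
g <- c("a", "a", "a", "b", "b")

# Running counter within each group; the first argument is integer,
# so the result is integer and `<= n` is a numeric comparison
idx <- ave(seq_along(g), g, FUN = seq_along)
print(idx)

# Keep at most the first 2 entries of each group
keep <- idx <= 2
print(g[keep])
```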
Update 2: Here is how these approaches perform on the iris dataset:
library(dplyr)
iris %>%
  group_by(Species) %>%
  arrange(Species, desc(Sepal.Length), desc(Sepal.Width), desc(Petal.Length)) %>%
  do(head(., 3))
#Source: local data frame [9 x 5]
#Groups: Species
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.8 4.0 1.2 0.2 setosa
#2 5.7 4.4 1.5 0.4 setosa
#3 5.7 3.8 1.7 0.3 setosa
#4 7.0 3.2 4.7 1.4 versicolor
#5 6.9 3.1 4.9 1.5 versicolor
#6 6.8 2.8 4.8 1.4 versicolor
#7 7.9 3.8 6.4 2.0 virginica
#8 7.7 3.8 6.7 2.2 virginica
#9 7.7 3.0 6.1 2.3 virginica
do.call(rbind, lapply(split(iris, iris$Species), function(x){
  head(x[with(x, order(Species, -Sepal.Length, -Sepal.Width, -Petal.Length)),], 3)
}))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#setosa.15 5.8 4.0 1.2 0.2 setosa
#setosa.16 5.7 4.4 1.5 0.4 setosa
#setosa.19 5.7 3.8 1.7 0.3 setosa
#versicolor.51 7.0 3.2 4.7 1.4 versicolor
#versicolor.53 6.9 3.1 4.9 1.5 versicolor
#versicolor.77 6.8 2.8 4.8 1.4 versicolor
#virginica.132 7.9 3.8 6.4 2.0 virginica
#virginica.118 7.7 3.8 6.7 2.2 virginica
#virginica.136 7.7 3.0 6.1 2.3 virginica
iris <- iris[with(iris, order(Species, -Sepal.Length, -Sepal.Width, -Petal.Length)),]
iris[ave(as.numeric(iris$Species), iris$Species, FUN = seq_along) <= 3, ]
#note that I used `as.numeric(iris$Species)` because it's stored as `factor`s and would cause an error otherwise.
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#15 5.8 4.0 1.2 0.2 setosa
#16 5.7 4.4 1.5 0.4 setosa
#19 5.7 3.8 1.7 0.3 setosa
#51 7.0 3.2 4.7 1.4 versicolor
#53 6.9 3.1 4.9 1.5 versicolor
#77 6.8 2.8 4.8 1.4 versicolor
#132 7.9 3.8 6.4 2.0 virginica
#118 7.7 3.8 6.7 2.2 virginica
#136 7.7 3.0 6.1 2.3 virginica
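If you need this more than once, the sort-then-count approach can be wrapped in a small helper. This is only a sketch: `top_n_by` and its arguments are hypothetical names, it sorts on a single numeric column rather than three, and it makes no promise about which rows win when values are tied.

```r
# Hypothetical helper: top n rows per group by one numeric column (base R only)
top_n_by <- function(df, group_col, sort_col, n) {
  # sort by group, then by descending value of the sort column
  df <- df[order(df[[group_col]], -df[[sort_col]]), ]
  # running counter within each group (numeric first argument to ave)
  rank_in_group <- ave(seq_len(nrow(df)), df[[group_col]], FUN = seq_along)
  df[rank_in_group <= n, ]
}

# Usage on a fresh copy of the built-in iris data
res <- top_n_by(datasets::iris, "Species", "Sepal.Length", 3)
print(res)
```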