How can I use a loop in R to return the maximum value for each category?

Date: 2020-10-15 19:10:22

Tags: r loops for-loop while-loop

Table 1 is a sample of the dataset I have. As you can see, John Doe has the highest frequency for A, Mark Twain for B, and Ally Mahoney for C.

Table 1

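Table 2 is what I want to achieve (a table containing only the person with the highest frequency in each classification, along with that classification):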

Table 2

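Is it possible to do this with a loop? For example, go through each unique classification, determine which frequency is the highest, and print the associated name? How would this work in code?

Thanks!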

2 answers:

Answer 0 (score: 1)

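One nice thing about R is that a result like this comes easily from sorting: order the rows, then take the first one in each group.
I usually use the data.table package, which cuts down noticeably on boilerplate.
Here is the code using it: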

install.packages("data.table")  # only needed once, if the package is not installed yet
library(data.table)

# Your data
CF <- c(2,1,4,3,16,13,3)
Person <- c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law", "Ally Mahoney", "Ellie Davies", "Bob Knight")
Classification <- c("A", "A", "B", "B", "C", "C", "C")

# The data table
DT <- data.table(Classification, Person, "Classification Frequency" = CF)

# ordering it by classification frequency:
setorderv(DT, "Classification Frequency", -1)

# now, group by classification letter, and in each group, select the first 
# occurrence of Person. This will correspond to the Person with highest 
# classification frequency, as we just sorted:

DT[ , .(Person = Person[1]), keyby = Classification]

#
#   Classification       Person
# 1:              A     John Doe
# 2:              B   Mark Twain
# 3:              C Ally Mahoney

Once the data is entered and sorted, it is a one-liner.
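As a variation on the same idea, the sorting step can probably be skipped by asking each group for its own maximum row. This is a minimal sketch, not part of the original answer; it rebuilds the example data so it runs on its own, and `top` is just an illustrative name.

```r
library(data.table)

# Example data from the question, rebuilt so the snippet is self-contained
DT <- data.table(
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  "Classification Frequency" = c(2, 1, 4, 3, 16, 13, 3)
)

# For each Classification, keep the row with the largest frequency.
# which.max() returns the position of the first maximum, so a tie
# resolves to whichever tied row comes first in the group.
top <- DT[, .SD[which.max(`Classification Frequency`)], by = Classification]
print(top)
```

Unlike the sorted version, this leaves `DT` in its original row order and keeps the frequency column in the result, which is closer to Table 2.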

Answer 1 (score: 0)

Here is a solution with a loop:

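CF <- c(2,1,4,3,16,13,3)
Person <- c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law", "Ally Mahoney", "Ellie Davies", "Bob Knight")
Classification <- c("A", "A", "B", "B", "C", "C", "C")

# stringsAsFactors = FALSE keeps Person as plain text on R versions before 4.0
data <- data.frame(
  CF = CF,
  Person = Person,
  Classification = Classification,
  stringsAsFactors = FALSE
)

# Visit each distinct classification once
for (i in unique(data$Classification)) {
  # Rows belonging to the current classification
  temp <- data[data$Classification == i, ]
  # Position of the highest frequency within those rows
  index <- which.max(temp$CF)
  print(temp$Person[index])
}

[1] "John Doe"
[1] "Mark Twain"
[1] "Ally Mahoney"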
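
The loop above only prints the names. If the goal is a table like Table 2, one possible sketch collects the winning row of each classification in a list and binds the rows together at the end. The names `df`, `classes`, `rows`, and `table2` are illustrative and not from the original answer.

```r
# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  CF = c(2, 1, 4, 3, 16, 13, 3),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

classes <- unique(df$Classification)
# One list slot per classification, filled in by the loop
rows <- vector("list", length(classes))

for (k in seq_along(classes)) {
  # Rows for the k-th classification
  group <- df[df$Classification == classes[k], ]
  # Keep only the row with the highest frequency
  rows[[k]] <- group[which.max(group$CF), c("Classification", "Person", "CF")]
}

# Stack the per-classification rows into a single data frame
table2 <- do.call(rbind, rows)
print(table2)
```

Filling a pre-sized list and calling `rbind` once avoids growing a data frame inside the loop, which tends to get slow as the number of groups increases.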

There are better options, though; see for example @user12030145's answer.
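
For readers who prefer to stay in base R, the sort-then-take-first idea can also be sketched without any package. This is an illustration under the assumption that the data looks like the example in the question; `df`, `sorted`, and `top` are made-up names.

```r
# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  CF = c(2, 1, 4, 3, 16, 13, 3),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Sort by classification, and within each classification by descending frequency
sorted <- df[order(df$Classification, -df$CF), ]

# duplicated() flags every repeat of a classification, so negating it
# keeps only the first (highest-frequency) row of each group
top <- sorted[!duplicated(sorted$Classification), c("Classification", "Person", "CF")]
print(top)
```

No explicit loop is needed here: `order()` and `duplicated()` are vectorized, so the whole operation is two lines once the data is in place.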