Question

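Table 1 is a sample of the dataset I have. As you can see, John Doe has the highest frequency for A, Mark Twain has the highest frequency for B, and Ally Mahoney has the highest frequency for C.

Table 2 is what I want to achieve (a table containing only the person with the highest frequency for each classification):

Is it possible to do this with a loop? For example, go through each unique classification, determine which person has the highest frequency, and print the associated name? How would this work in code?

Thanks!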

Answer 1

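The nice thing about R is that you can get a result like this easily by sorting the data and then taking the first row of each group.
I usually use the data.table package, which cuts down on boilerplate considerably.
Here is the code using that package: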

install.packages("data.table")  # only needed once, if the package is not installed yet
library(data.table)

# Your data
CF <- c(2,1,4,3,16,13,3)
Person <- c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law", "Ally Mahoney", "Ellie Davies", "Bob Knight")
Classification <- c("A", "A", "B", "B", "C", "C", "C")

# The data table
DT <- data.table(Classification, Person, "Classification Frequency" = CF)

# ordering it by classification frequency:
setorderv(DT, "Classification Frequency", -1)

# now, group by classification letter, and in each group, select the first 
# occurrence of Person. This will correspond to the Person with highest 
# classification frequency, as we just sorted:

DT[ , .(Person = Person[1]), keyby = Classification]

#
#   Classification       Person
# 1:              A     John Doe
# 2:              B   Mark Twain
# 3:              C Ally Mahoney

Once the data is entered and sorted, it is just a one-liner.
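
If you would rather not reorder the table first, the same result can also be obtained by picking the maximum inside each group. The following is only a minimal sketch on the same example data; the short column name CF and the variable name top are assumptions made for this illustration and are not part of the original table.

```r
library(data.table)

# Same example data as above; CF is a shorthand column name assumed for this sketch
DT <- data.table(
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  CF = c(2, 1, 4, 3, 16, 13, 3)
)

# Within each classification, keep the row whose CF is largest.
# which.max() returns the first maximum, so ties resolve to the earliest row.
top <- DT[, .SD[which.max(CF)], keyby = Classification]
print(top)
```

Unlike the sorted version, this leaves the row order of DT untouched and also keeps the winning frequency in the result.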

Answer 2

Here is a solution with a loop:

CF <- c(2,1,4,3,16,13,3)
Person <- c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law", "Ally Mahoney", "Ellie Davies", "Bob Knight")
Classification <- c("A", "A", "B", "B", "C", "C", "C")

data <- data.frame(
  CF = CF,
  Person = Person,
  Classification = Classification,
  stringsAsFactors = FALSE  # keep Person as character (before R 4.0 the default was TRUE)
)

# For each classification, subset its rows and print the person with the largest CF
for (i in unique(data$Classification)) {
  temp <- data[data$Classification %in% i, ]
  index <- which.max(temp$CF)
  print(temp$Person[index])
}

[1] "John Doe"
[1] "Mark Twain"
[1] "Ally Mahoney"

There are better options, though; see for example @user12030145's answer.
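
The loop above only prints the names. If you want an actual table like Table 2, one possible base R approach is to split the data by classification and stack the winning rows back together. This is a rough sketch, assuming the same example data; the names pieces, top_rows and result are made up for this illustration.

```r
# Same example data as in the loop solution
data <- data.frame(
  CF = c(2, 1, 4, 3, 16, 13, 3),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Split the rows into one data frame per classification
pieces <- split(data, data$Classification)

# From each piece, keep only the row with the largest CF
top_rows <- lapply(pieces, function(d) {
  d[which.max(d$CF), c("Classification", "Person")]
})

# Stack the per-group rows back into a single data frame
result <- do.call(rbind, top_rows)
rownames(result) <- NULL
print(result)
```

This needs no extra packages, and result can be used like any other data frame afterwards.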

How can I use a loop in R to return the maximum value for each category?
