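Table 1 is a sample of the dataset I have. As you can see, John Doe has the highest frequency for A, Mark Twain for B, and Ally Mahoney for C.
Table 2 is what I want to achieve (a table containing only the person with the highest frequency in each classification, along with that classification):
Can this be done with a loop? For example, by going through each unique classification, determining which person has the highest frequency, and printing the associated name? How would this work in code?
Thanks!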
Answer 0 (score: 1)
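The nice thing about R is that you can get a result like this easily by sorting and then grouping.
I usually use the data.table framework, which cuts down on boilerplate considerably.
Here is the code using that framework: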
# Install the package once if it is not already available
install.packages("data.table")
library(data.table)
# Your data
CF <- c(2,1,4,3,16,13,3)
Person <- c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law", "Ally Mahoney", "Ellie Davies", "Bob Knight")
Classification <- c("A", "A", "B", "B", "C", "C", "C")
# The data table
DT <- data.table(Classification, Person, "Classification Frequency" = CF)
# ordering it by classification frequency:
setorderv(DT, "Classification Frequency", -1)
# now, group by classification letter, and in each group, select the first
# occurrence of Person. This will correspond to the Person with highest
# classification frequency, as we just sorted:
DT[ , .(Person = Person[1]), keyby = Classification]
#
#    Classification       Person
# 1:              A     John Doe
# 2:              B   Mark Twain
# 3:              C Ally Mahoney
Once the data is entered and sorted, it is just a one-liner.
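The sort step can also be skipped by picking the maximum inside each group. The following is a minimal sketch of that variant, not part of the original answer; it assumes the frequency column is named CF (a shorter name chosen here for illustration) instead of "Classification Frequency".

```r
library(data.table)

# Same sample data as above; the column name CF is an assumption
# made for this sketch.
DT <- data.table(
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  CF = c(2, 1, 4, 3, 16, 13, 3)
)

# Within each classification, keep the row whose CF is largest.
# which.max() returns the first maximum, so a tie goes to the
# row that appears first in the group.
top <- DT[, .SD[which.max(CF)], by = Classification]
print(top)
```

This keeps the original row order of DT untouched, which may matter if the table is reused afterwards.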
Answer 1 (score: 0)
Here is a solution with a loop:
CF <- c(2, 1, 4, 3, 16, 13, 3)
Person <- c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law", "Ally Mahoney", "Ellie Davies", "Bob Knight")
Classification <- c("A", "A", "B", "B", "C", "C", "C")
data <- data.frame(
  CF = CF,
  Person = Person,
  Classification = Classification
)
# Go through each unique classification in turn
for (i in unique(data$Classification)) {
  # Rows belonging to the current classification
  temp <- data[data$Classification %in% i, ]
  # Position of the highest frequency within that subset
  index <- which.max(temp$CF)
  print(temp$Person[index])
}
[1] "John Doe"
[1] "Mark Twain"
[1] "Ally Mahoney"
There are better options, though; see, for example, @user12030145's answer.
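The loop above only prints the names. If the goal is the table described in the question, one possible base R approach is to split the data by classification and bind the top row of each group back together. This is a sketch under that assumption, not the method from either answer; the names groups, top_rows, and result are chosen here for illustration.

```r
data <- data.frame(
  CF = c(2, 1, 4, 3, 16, 13, 3),
  Person = c("John Doe", "Emily Bronte", "Mark Twain", "Jake Law",
             "Ally Mahoney", "Ellie Davies", "Bob Knight"),
  Classification = c("A", "A", "B", "B", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# One data frame per classification
groups <- split(data, data$Classification)

# From each group, keep the row with the highest CF
# (which.max() picks the first row in case of a tie)
top_rows <- lapply(groups, function(g) {
  g[which.max(g$CF), c("Classification", "Person")]
})

# Stack the per-group rows into a single table
result <- do.call(rbind, top_rows)
rownames(result) <- NULL
print(result)
```

The result is an ordinary data frame with one row per classification, so it can be passed on to further processing instead of only being printed.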