So this gives me the entire list I need:
max <- function(x) {
    n <- data.frame(x)
    factored <- n[sapply(n, is.factor)]
    dt_res = data.frame()
    for (i in 1:ncol(factored)) {
        dt_temp = data.frame(t(table(factored[, i])))
        dt_temp$Var1 = names(factored)[i]
        dt_res = rbind(dt_res, dt_temp)
    }
    names(dt_res) = c("Factors", "Categories", "Frequency")
    return(dt_res)
}
How can I get the maximum frequency for each factor? With the diamonds dataset, I get:
Factors  Categories  Frequency
cut      Fair             1610
cut      Good             4906
cut      Very Good       12082
cut      Premium         13791
cut      Ideal           21551
color    D                6775
color    E                9797
color    F                9542
color    G               11292
color    H                8304
color    I                5422
color    J                2808
(and similar rows for clarity), but I want it to return:
Factors  Categories  Frequency
cut      Ideal           21551
color    G               11292
clarity  SI1             13065
Thanks
Answer 0 (score: 1)
Using a combination of dplyr and tidyr verbs
Data
library(ggplot2) # provides the diamonds dataset
data <- diamonds
Solution
library(dplyr)
library(tidyr)
select(data, cut, color, clarity) %>%  # dplyr - select relevant columns
    gather(key, value) %>%             # tidyr - gather into long format
    group_by(key) %>%                  # dplyr - group by column name
    count(value) %>%                   # dplyr - table-like function
    top_n(1)                           # dplyr - filter for top row by group
Output
# A tibble: 3 x 3
# Groups:   key [3]
#   key     value     n
#   <chr>   <chr> <int>
# 1 clarity SI1   13065
# 2 color   G     11292
# 3 cut     Ideal 21551
Edit: selecting other columns
To select other columns, change the line select(data, cut, color, clarity). For example, use select(data, depth, table, price).
To use all columns in diamonds, replace select(data, cut, color, clarity) %>% with data %>%
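As a minimal sketch of another way to choose the columns, the selection can pick every factor column automatically instead of naming them. This assumes dplyr 1.0.0 or later (for `where()`); the `toy` data frame and the name `top_levels` are made up for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two factor columns and one numeric column
toy <- data.frame(
  size  = factor(c("S", "M", "M", "L", "M")),
  shade = factor(c("red", "blue", "red", "red", "blue")),
  price = c(1.5, 2.0, 2.5, 3.0, 3.5)
)

top_levels <- toy %>%
  select(where(is.factor)) %>%  # keep only the factor columns
  gather(key, value) %>%        # long format: one row per (column, level)
  group_by(key) %>%             # group by original column name
  count(value) %>%              # frequency of each level within a column
  top_n(1, n)                   # most frequent level per column

print(top_levels)
```

`gather()` warns that attributes are dropped because the factor columns have different levels; the values are converted to character, which is harmless for counting.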
Answer 1 (score: 1)
If you want to keep using your function, you need to make a change to it. (After all, it is your solution.)
max <- function(x) {
    # [... your code ...]
    # [... then, between 'names' and 'return' ...]
    names(dt_res) = c("Factors", "Categories", "Frequency")
    # keep only the row with the highest frequency within each factor
    dt_res <- lapply(split(dt_res, dt_res$Factors), function(x) x[which.max(x$Frequency), ])
    dt_res <- do.call(rbind, dt_res)
    row.names(dt_res) <- NULL
    return(dt_res)
}
max(diamonds)
#   Factors Categories Frequency
# 1 clarity        SI1     13065
# 2   color          G     11292
# 3     cut      Ideal     21551
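For reference, here is a self-contained sketch of the question's function with the change above spliced in, using base R only. The name `max_freq` is a hypothetical rename (so that base::max is not masked), and the `toy` data frame is made up so the example runs without ggplot2; neither appears in the original posts.

```r
# Hypothetical name; the original thread calls this function `max`
max_freq <- function(x) {
  n <- data.frame(x)
  factored <- n[sapply(n, is.factor)]          # keep factor columns only
  dt_res <- data.frame()
  for (i in seq_len(ncol(factored))) {
    # one row per level: columns Var1, Var2 (level), Freq (count)
    dt_temp <- data.frame(t(table(factored[, i])))
    dt_temp$Var1 <- names(factored)[i]         # overwrite Var1 with the column name
    dt_res <- rbind(dt_res, dt_temp)
  }
  names(dt_res) <- c("Factors", "Categories", "Frequency")
  # within each factor, keep the row with the highest frequency
  # (which.max returns the first maximum, so ties keep the first level)
  dt_res <- lapply(split(dt_res, dt_res$Factors),
                   function(d) d[which.max(d$Frequency), ])
  dt_res <- do.call(rbind, dt_res)
  row.names(dt_res) <- NULL
  dt_res
}

# Made-up example data: two factor columns and one numeric column
toy <- data.frame(
  size  = factor(c("S", "M", "M", "L", "M")),
  shade = factor(c("red", "blue", "red", "red", "blue")),
  price = c(1.5, 2.0, 2.5, 3.0, 3.5)
)

res <- max_freq(toy)
print(res)  # one row per factor column: shade -> red (3), size -> M (3)
```

Because `split()` orders the groups by factor name, the rows come back sorted alphabetically by column name rather than in the original column order.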