Max frequency only, for each factor

Time: 2017-10-21 12:07:41

Tags: r for-loop dataframe

So this gives me the whole list of what I need:

max <- function(x) {
    n <- data.frame(x)
    factored <- n[sapply(n, is.factor)]
    dt_res = data.frame()

    for (i in 1:ncol(factored)) {

        dt_temp = data.frame(t(table(factored[, i])))
        dt_temp$Var1 = names(factored)[i]
        dt_res = rbind(dt_res, dt_temp)

    }

    names(dt_res) = c("Factors", "Categories", "Frequency")

    return(dt_res)
}

How do I get only the maximum frequency for each factor? For the diamonds dataset, I get

Factors Categories Frequency
cut       Fair      1610 
cut       Good      4906 
cut    Very Good    12082 
cut      Premium    13791 
cut      Ideal      21551 
color      D         6775 
color      E         9797 
color      F         9542 
color      G        11292 
color      H        8304 
color      I        5422 
color      J        2808 

and likewise the categories for clarity, but I would like it to return:

Factors Categories Frequency
cut      Ideal      21551 
color      G        11292 
clarity    SI1      13065 

Thanks

2 answers:

Answer 0: (score: 1)

Use a combination of dplyr and tidyr verbs

Data

library(ggplot2)  # provides the diamonds dataset
data <- diamonds

Solution

library(dplyr)
library(tidyr)
select(data, cut, color, clarity) %>%   # dplyr - select relevant columns
  gather(key, value) %>%                # tidyr - gather into long format
  group_by(key) %>%                     # dplyr - group by column name
  count(value) %>%                      # dplyr - table-like function
  top_n(1)                              # dplyr - filter for top row by group

Output

# A tibble: 3 x 3
# Groups:   key [3]
#       key value     n
#     <chr> <chr> <int>
# 1 clarity   SI1 13065
# 2   color     G 11292
# 3     cut Ideal 21551

Edit: selecting other columns

To select other columns, change this line: select(data, cut, color, clarity). For example, select(data, depth, table, price)

To use all columns in diamonds, replace select(data, cut, color, clarity) %>% with data %>%
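If the goal is only the factor columns, they can probably be picked by type instead of by name. The following is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later and tidyr 1.1.0 or later, swaps in pivot_longer() and slice_max() for gather() and top_n(), and uses a small made-up data frame (toy) in place of diamonds. The values are converted to character before stacking because factor columns with different levels may not combine cleanly.

```r
library(dplyr)
library(tidyr)

# Toy data standing in for diamonds (an assumption, for illustration only)
toy <- data.frame(
  cut   = factor(c("Ideal", "Ideal", "Good", "Fair")),
  color = factor(c("G", "E", "G", "G")),
  price = c(326, 327, 334, 335)
)

top_levels <- toy %>%
  select(where(is.factor)) %>%                  # keep factor columns only
  pivot_longer(everything(),
               names_to = "key", values_to = "value",
               values_transform = list(value = as.character)) %>%
  count(key, value) %>%                         # frequency of each level per column
  group_by(key) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%    # one top row per column
  ungroup()

print(top_levels)
```

With with_ties = FALSE a tie returns a single row per column; dropping that argument keeps all tied levels, which is closer to what top_n(1) does.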

Answer 1: (score: 1)

If you want to keep using your function, you need to make a change to it. (After all, it's your solution.)

max <- function(x){

    [... your code ...]
    [... then, between 'names' and 'return' ...]

    names(dt_res) = c("Factors", "Categories", "Frequency")
    dt_res <- lapply(split(dt_res, dt_res$Factors), function(x) x[which.max(x$Frequency), ])
    dt_res <- do.call(rbind, dt_res)
    row.names(dt_res) <- NULL
    return(dt_res)
}

max(diamonds)
#  Factors Categories Frequency
#1 clarity        SI1     13065
#2   color          G     11292
#3     cut      Ideal     21551
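For comparison, the same idea can be sketched without the intermediate full table, by taking which.max() of each factor's table() directly. This is only an illustrative variant under stated assumptions: the helper name top_category is hypothetical (chosen so that it does not mask base::max, which a function called max does), and toy is a made-up data frame standing in for diamonds.

```r
# Toy data standing in for diamonds (an assumption, for illustration only)
toy <- data.frame(
  cut   = factor(c("Ideal", "Ideal", "Good", "Fair")),
  color = factor(c("G", "E", "G", "G")),
  price = c(326, 327, 334, 335)
)

# Hypothetical helper: one row per factor column, holding its most frequent level
top_category <- function(x) {
  df <- data.frame(x)
  factored <- df[sapply(df, is.factor)]         # keep factor columns only
  rows <- lapply(names(factored), function(nm) {
    counts <- table(factored[[nm]])             # frequency of each level
    best <- which.max(counts)                   # first maximum if there are ties
    data.frame(Factors    = nm,
               Categories = names(counts)[best],
               Frequency  = as.integer(counts[best]))
  })
  do.call(rbind, rows)
}

res <- top_category(toy)
print(res)
```

Rows come back in column order rather than alphabetically, and a tie is resolved in favour of the first level, the same behaviour as the which.max() call in the answer above.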