Question

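How can I find the most frequent value in a given column of a data.frame when a single cell can contain multiple comma-separated values?

Sample data: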

df <- structure(list(time = c("act1_1", "act1_10", "act1_11", "act1_12", 
"act1_13", "act1_14", "act1_15", "act1_16", "act1_17", "act1_18", 
"act1_19", "act1_2", "act1_20", "act1_21", "act1_22", "act1_23", 
"act1_24", "act1_3", "act1_4", "act1_5", "act1_6", "act1_7", 
"act1_8", "act1_9"), `Most frequent` = c("110", "310,110,1110", 
"310,110,1110", "310,110,111,1110", "110,310,9120,111,1110", 
"110,310,111,3110,1110", "9120,110,310,210,111,1110", "9120,110,1110,210,310,111,3110", 
"1110,9120,110,310,111,210", "1110,111,110,310,210", "1110,310,110,111,3110,210,9120", 
"110", "1110,210,110,310,3110,9120", "1110,110,111,310,210,9120,3110,3210", 
"1110,9120,110,3110,310,111,3210,210,3819", "1110,9120,110,111,310,3110,210", 
"1110,9120,110,310,210,3110,8210,111", "110", "110", "110,1110", 
"110,111,1110", "110,310,1110", "110,1110", "110,210,1110")), row.names = c(NA, 
-24L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

After splitting the Most frequent column with strsplit, you can use table to count the occurrences of each value (here x is the data frame).

names(sort(-table(unlist(strsplit(x$"Most frequent", ",")))))[1]
#[1] "110"
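
The one-liner above can be unpacked into its individual steps. The following is a minimal sketch on a small made-up vector (cells, parts, values, counts and top are hypothetical names used only for illustration); it also uses sort(..., decreasing = TRUE) instead of negating the table, which should give the same first element.

```r
# Hypothetical toy column: each element holds comma-separated codes
cells <- c("110", "310,110,1110", "110,1110")

# 1. Split every cell on commas; the result is a list of character vectors
parts <- strsplit(cells, ",", fixed = TRUE)

# 2. Flatten the list into a single character vector
values <- unlist(parts)

# 3. Count how often each distinct value occurs
counts <- table(values)

# 4. Sort the counts from high to low and take the name of the first entry
top <- names(sort(counts, decreasing = TRUE))[1]
top
#[1] "110"
```

Inspecting counts before taking the first name is a quick way to see whether several values share the highest count.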

Answer 2

Using the following Mode function (borrowed from elsewhere):

Mode <- function(x) {
    ux <- unique(x)
    ux[which.max(tabulate(match(x, ux)))]
}

We can split the strings on commas and pass the unlisted vector to the Mode function to get the most frequent value.

Mode(unlist(strsplit(df$`Most frequent`, ',')))
#[1] "110"
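
Note that Mode returns only one value: because which.max picks the first maximum, a tie is resolved in favour of the value that appears first. If all tied values are wanted, something like the sketch below might work; all_modes is a hypothetical helper that is not part of the original answer, and the input is made-up toy data.

```r
# Hypothetical helper (an assumption, not from the answer above):
# return every value that shares the highest count
all_modes <- function(x) {
  counts <- table(x)
  names(counts)[as.vector(counts) == max(counts)]
}

# Toy input in which "110" and "310" both occur twice
toy <- unlist(strsplit(c("110,310", "310,110,1110"), ",", fixed = TRUE))
all_modes(toy)
#[1] "110" "310"
```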

Answer 3

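Using dplyr and tidyr (separate_rows comes from tidyr):

library(dplyr)
library(tidyr)

df %>% 
  separate_rows(`Most frequent`) %>%   # one row per comma-separated value
  group_by(`Most frequent`) %>% 
  summarise(Freq = n()) %>%            # count occurrences of each value
  arrange(desc(Freq)) %>% 
  slice(1)                             # keep the most frequent one
# `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 1 x 2
#   `Most frequent`  Freq
#   <chr>           <int>
# 1 110                24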

Answer 4

Or you can use a nested list column:

library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  dplyr::mutate(X = stringr::str_split(`Most frequent`, ",")) %>% 
  tidyr::unnest(X) %>%  
  dplyr::count(X) %>% 
  dplyr::slice_max(order_by = n)
