How do I find the most frequent value in a given data.frame column when a single cell can contain multiple values?
Sample data:
structure(list(time = c("act1_1", "act1_10", "act1_11", "act1_12",
"act1_13", "act1_14", "act1_15", "act1_16", "act1_17", "act1_18",
"act1_19", "act1_2", "act1_20", "act1_21", "act1_22", "act1_23",
"act1_24", "act1_3", "act1_4", "act1_5", "act1_6", "act1_7",
"act1_8", "act1_9"), `Most frequent` = c("110", "310,110,1110",
"310,110,1110", "310,110,111,1110", "110,310,9120,111,1110",
"110,310,111,3110,1110", "9120,110,310,210,111,1110", "9120,110,1110,210,310,111,3110",
"1110,9120,110,310,111,210", "1110,111,110,310,210", "1110,310,110,111,3110,210,9120",
"110", "1110,210,110,310,3110,9120", "1110,110,111,310,210,9120,3110,3210",
"1110,9120,110,3110,310,111,3210,210,3819", "1110,9120,110,111,310,3110,210",
"1110,9120,110,310,210,3110,8210,111", "110", "110", "110,1110",
"110,111,1110", "110,310,1110", "110,1110", "110,210,1110")), row.names = c(NA,
-24L), class = c("tbl_df", "tbl", "data.frame"))
Answer 0 (score: 2)
After splitting the Most frequent column with strsplit, you can use table to count the occurrences of each value.
names(sort(-table(unlist(strsplit(x$"Most frequent", ",")))))[1]
#[1] "110"
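The one-liner above can be unpacked into its individual steps. The following is only a sketch on a small, made-up vector (`cells` is hypothetical, not the question's data); it also shows one way to list every value tied for the highest count, which the one-liner does not report.

```r
# Hypothetical toy input: each element may hold several comma-separated values
cells <- c("110", "310,110,1110", "110,1110")

# Step 1: split every cell on commas and flatten the result into one vector
tokens <- unlist(strsplit(cells, ",", fixed = TRUE))

# Step 2: count how often each distinct value occurs
counts <- table(tokens)

# Step 3: sort the counts in decreasing order and take the first name
sorted <- sort(counts, decreasing = TRUE)
top <- names(sorted)[1]

# Optional: keep all values that share the maximum count
ties <- names(counts)[counts == max(counts)]

print(top)
print(ties)
```

Using `sort(..., decreasing = TRUE)` is equivalent to sorting the negated table here, and `fixed = TRUE` treats the comma as a literal character rather than a regular expression.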
Answer 1 (score: 1)
Using the Mode function from here:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
We can split the strings on commas and pass the unlisted vector to the Mode function to get the most frequent value.
Mode(unlist(strsplit(df$`Most frequent`, ',')))
#[1] "110"
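Two details of this approach are easy to miss: when several values are tied, `which.max` picks the first maximum, so `Mode` returns whichever tied value appears first in the input; and stray spaces after commas would make `"110"` and `" 110"` count as different values. A minimal sketch, assuming a made-up input vector `cells` and using `trimws` as one possible way to strip whitespace:

```r
# Same Mode helper as in the answer, repeated so the example is self-contained
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical input with inconsistent spacing and a two-way tie
cells <- c("b, a", "a,b")

# Split on commas, flatten, then strip surrounding whitespace from each token
tokens <- trimws(unlist(strsplit(cells, ",", fixed = TRUE)))

# "a" and "b" both occur twice; Mode returns the one seen first
result <- Mode(tokens)
print(result)
```

If all tied values are needed rather than just the first, a `table`-based count is probably the simpler route.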
Answer 2 (score: 1)
Using dplyr (separate_rows comes from tidyr):
library(dplyr)
library(tidyr)
df %>% separate_rows(`Most frequent`) %>% group_by(`Most frequent`) %>%
  summarise(Freq = n()) %>% arrange(desc(Freq)) %>% slice(1)
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 1 x 2
`Most frequent` Freq
<chr> <int>
1 110 24
Answer 3 (score: 1)
Alternatively, you can use a nested list column:
library(dplyr)
library(tidyr)
library(stringr)
df %>%
dplyr::mutate(X = stringr::str_split(`Most frequent`, ",")) %>%
tidyr::unnest(X) %>%
dplyr::count(X) %>%
dplyr::slice_max(order_by = n)
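The tidyverse answers can be condensed a little, and it is worth knowing that `slice_max` keeps ties by default, so it may return more than one row. The sketch below assumes dplyr (version 1.0.0 or later) and tidyr are installed, and uses a made-up data frame `toy` with a hypothetical column `vals` rather than the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one column whose cells hold comma-separated values
toy <- data.frame(
  vals = c("110", "310,110,1110", "110,1110"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  separate_rows(vals, sep = ",") %>%  # one row per individual value
  count(vals, sort = TRUE)            # tally each value, largest count first

# with_ties = FALSE forces exactly one row even when counts are tied
top <- counts %>% slice_max(order_by = n, n = 1, with_ties = FALSE)

print(counts)
print(top)
```

`count(vals, sort = TRUE)` stands in for the `group_by`, `summarise`, and `arrange` chain; whether to keep or drop ties depends on what "most frequent" should mean for the data at hand.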