I want to find the column that occurs most often in a data frame. For example, take the following data frame:
as.data.frame(cbind(c(1,4,6,9,20),c(2,4,7,7,3),c(4,7,6,4,2),c(1,4,6,9,20),c(4,7,6,4,2),c(7,4,6,4,2)))
My first idea is to sort the values within each column of the data frame, which gives:
as.data.frame(cbind(c(1,4,6,9,20),c(2,3,4,7,7),c(2,4,4,6,7),c(1,4,6,9,20),c(2,4,4,6,7),c(2,4,4,6,7)))
and then find the column that occurs most often in this data frame, which would return c(2,4,4,6,7). How can I do this in R?
Answer 0 (score: 2)
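Basically, you can paste the values of each column into a single string, count those strings with table, and then pick the most frequent one with which.max.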
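key <- sapply(d1, paste, collapse="")
# table() sorts its entries by key, so map the winning key back to a column position
d1[, match(names(which.max(table(key))), key)]
# [1] 2 4 4 6 7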
Data
d1 <- structure(list(X1 = c(1, 4, 6, 9, 20), X2 = c(2, 3, 4, 7, 7),
X3 = c(2, 4, 4, 6, 7), X4 = c(1, 4, 6, 9, 20), X5 = c(2,
4, 4, 6, 7), X6 = c(2, 4, 4, 6, 7)), class = "data.frame", row.names = c(NA, -5L))
Answer 1 (score: 2)
Basically the same solution as @jay.sf's, but using the tidyverse:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dataset <- as.data.frame(x = cbind(c(1, 4, 6, 9, 20),
c(2, 4, 7, 7, 3),
c(4, 7, 6, 4, 2),
c(1, 4, 6, 9, 20),
c(4, 7, 6, 4, 2),
c(7, 4, 6, 4, 2)))
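# One key per column: its sorted values pasted into a single string
keys <- dataset %>%
summarise_all(.funs = ~ paste0(sort(.), collapse = "")) %>%
unlist()
# table() reorders its entries, so look the most frequent key up in keys
dataset[match(names(which.max(table(keys))), keys)]
#> V3
#> 1 4
#> 2 7
#> 3 6
#> 4 4
#> 5 2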
Created on 2019-06-15 by the reprex package (v0.3.0)
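One thing to watch for with this kind of approach: which.max() applied to a table() returns a position inside the table, and table() orders its entries by key. That position does not have to match the column position in the data frame. A small base R sketch with made-up keys (table_pos and col_pos are illustrative names):

```r
# Made-up keys for four columns; "b" is the most frequent key
keys <- c(V1 = "b", V2 = "a", V3 = "b", V4 = "b")

counts <- table(keys)                    # entries are ordered "a", "b"
table_pos <- unname(which.max(counts))   # 2: position of "b" inside the table
col_pos <- match(names(counts)[table_pos], keys)  # 1: first column holding "b"

keys[[table_pos]]  # "a": indexing columns by the table position is wrong here
keys[[col_pos]]    # "b"
```

Looking the winning key up with match() is one way to get from the table back to a column.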
Answer 2 (score: 1)
If you want to match multiple columns:
# df1 is assumed to be the question's original (unsorted) data frame
# Creating a table of pasted & sorted column values
counts_df1 <- table(do.call(paste, data.frame(t(sapply(df1, sort)))))
# If you want the sorted order returned as a single element vector:
names(counts_df1[counts_df1 == max(counts_df1)])
[1] "2 4 4 6 7"
Alternatively, you can do this to index the columns in the data frame:
# Creating collapsed strings from columns
df1_vec <- sapply(df1, function(x) paste0(sort(x), collapse = ""))
# Counting the frequency of each collapsed strings
df1_colsum <- colSums(outer(df1_vec, df1_vec, `==`))
# Subsetting the dataframe based on the most frequent columns that are not duplicates
df1[, df1_colsum == max(df1_colsum) & !duplicated(df1_vec)]
[1] 4 7 6 4 2
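To illustrate what happens when two patterns are equally frequent, here is a small sketch of the same outer() counting on made-up data (tie_df and tied are illustrative names). The drop = FALSE argument is an addition not in the code above; it keeps the result a data frame even when only one column is selected.

```r
# Made-up data with a tie: the sorted patterns "1 2" and "3 4" each occur twice
tie_df <- data.frame(a = c(1, 2), b = c(2, 1),
                     c = c(3, 4), d = c(4, 3),
                     e = c(5, 6))

# One key per column: its sorted values pasted with a separator
keys <- sapply(tie_df, function(x) paste(sort(x), collapse = " "))

# For each column, count how many columns share its key
counts <- colSums(outer(keys, keys, `==`))

# Keep the first column of every pattern that reaches the maximum count
tied <- tie_df[, counts == max(counts) & !duplicated(keys), drop = FALSE]
names(tied)  # "a" "c"
```

With a tie, one representative column per tied pattern is returned rather than a single vector.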