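I ran a simulation and created 10,000 lineups. I would like to count how many times each distinct lineup was created. For example, here are 5 lineups...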
col_1 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_2 <- c("Jack", "Malik", "Brett", "Demetrius", "Jalen","David")
col_3 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_4 <- c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire")
col_5 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
df <- data.frame(col_1, col_2, col_3,col_4,col_5)
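The output I am after is roughly...
Lineup A = col_1, col_3, col_5 = 3
Lineup B = col_2 = 1
Lineup C = col_4 = 1
I hit a wall while looking for a solution in the dplyr package. Any help would be much appreciated. Thanks.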
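Answer 0: (score: 2)
Here is a tidyverse-only solution in which we arrange all columns, collapse each one into a single string, take the unique values, transpose, and group to get the counts. This approach also gives you the team members.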
library(tidyverse)
df2 <- df %>%
  arrange_all() %>%
  mutate_all(~ paste0(., collapse = ",")) %>%  # lambda replaces the deprecated funs()
  distinct() %>%
  t() %>%
  as.data.frame() %>%
  mutate(col = colnames(df)) %>%
  group_by(team = V1) %>%
  summarise(count = n(),
            lineup = paste(col, collapse = ","))
print(df2)
# A tibble: 3 x 3
  team                                   count lineup
  <fct>                                  <int> <chr>
1 Ebony,Jada,Jane,Latoya,Mary,Sandra         3 col_1,col_3,col_5
2 Jalen,David,Malik,Brett,Jack,Demetrius     1 col_2
3 Molly,Claire,Emily,Tara,Katie,Imani        1 col_4
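One caveat worth sketching: `arrange_all()` sorts whole rows (by `col_1`, then the remaining columns), not each column on its own, which is why teams 2 and 3 in the output are not in alphabetical order. With the question's data this does not matter, because identical lineups list their members in the same row order. If the 10,000 simulated lineups can contain the same people in a different order, a per-column sort is probably needed. The following is a minimal base R sketch under that assumption; the reversed `col_3` and the names `key`, `groups` and `result` are illustrative and not part of the original post.

```r
# Data from the question (stringsAsFactors = FALSE keeps the names as character)
df <- data.frame(
  col_1 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_2 = c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David"),
  col_3 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_4 = c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire"),
  col_5 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  stringsAsFactors = FALSE
)

# Assumption for illustration only: col_3 holds the same people in another order
df$col_3 <- rev(df$col_3)

# One order-independent key per column: sort the names, then join them
key <- vapply(df, function(x) paste(sort(x), collapse = ","), character(1))

# Group the column names by key and count the columns in each group
groups <- split(names(key), key)
result <- data.frame(
  team   = names(groups),
  count  = lengths(groups),
  lineup = vapply(groups, paste, character(1), collapse = ","),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(result)
```

Because each column is sorted before the key is built, the reversed `col_3` still lands in the same group as `col_1` and `col_5`.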
Answer 1: (score: 1)
This would be my solution:
library(tidyverse)  # gather() and spread() come from tidyr
df_t <- df %>%
  # Transpose the dataset, make sure people are sorted alphabetically
  gather(lineup_number, person_name) %>%                  # Lineup/Person level
  arrange(lineup_number, person_name) %>%                 # Arrange alphabetically
  group_by(lineup_number) %>%
  mutate(person_order = paste0("person", row_number())) %>%
  ungroup() %>%
  spread(person_order, person_name)                       # Row: lineup. Column: person
df_t %>%
  select(starts_with("person")) %>%
  group_by_all() %>%
  summarise(num_lineups = n())
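The tidyr documentation marks `gather()` and `spread()` as superseded by `pivot_longer()` and `pivot_wider()`. As a hedged sketch, the same reshaping could be written with the newer functions as below; it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later, and the name `lineup_counts` is illustrative rather than taken from the answer.

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(
  col_1 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_2 = c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David"),
  col_3 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_4 = c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire"),
  col_5 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  stringsAsFactors = FALSE
)

df_t <- df %>%
  # Long format: one row per (lineup, person)
  pivot_longer(everything(),
               names_to = "lineup_number",
               values_to = "person_name") %>%
  # Sort people alphabetically within each lineup
  arrange(lineup_number, person_name) %>%
  group_by(lineup_number) %>%
  mutate(person_order = paste0("person", row_number())) %>%
  ungroup() %>%
  # Wide format: one row per lineup, one column per sorted position
  pivot_wider(names_from = person_order, values_from = person_name)

# Count how many lineups share the same sorted set of people
lineup_counts <- df_t %>%
  group_by(across(starts_with("person"))) %>%
  summarise(num_lineups = n(), .groups = "drop")

print(lineup_counts)
```

Since people are sorted within each lineup before widening, two lineups with the same members in a different order end up in the same group.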
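Answer 2: (score: 1)
First, we want to make sure that the levels in all columns of the data frame really match, and strip them off to get numerics.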
(d2 <- sapply(d, function(x) as.numeric(factor(x, levels=sort(unique(unlist(d)))))))
#      col_1 col_2 col_3 col_4 col_5
# [1,]     5    10     5    16     5
# [2,]     3    12     3    14     3
# [3,]     4     7     4    18     4
# [4,]     6     9     6    15     6
# [5,]     1    11     1    17     1
# [6,]     2     8     2    13     2
Then we can apply toString over the columns, factorize the results, and split on the factor levels; we only want the names.
n <- lapply(split(m <- factor(apply(d2, 2, toString)), m), names)
What we actually want is the length of each group, and to rbind the results.
res <- do.call(rbind, lapply(n, function(x) cbind(toString(x), length(x))))
res
#      [,1]                  [,2]
# [1,] "col_2"               "1"
# [2,] "col_4"               "1"
# [3,] "col_1, col_3, col_5" "3"
Finally, we may want to give the matrix some meaningful dimnames.
dimnames(res) <- list(paste("Lineup", LETTERS[1:nrow(res)]), c("col", "n"))
res
#          col                   n
# Lineup A "col_2"               "1"
# Lineup B "col_4"               "1"
# Lineup C "col_1, col_3, col_5" "3"
Note: if you have more than 26 lineups, you may just want to use 1:nrow(res) instead of LETTERS[1:nrow(res)].
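For the case of more than 26 lineups, the steps of this answer can be put together with numeric labels roughly as follows. This is a sketch, not the answer's exact code: it assumes `d` is the question's data frame, the helper name `lv` is illustrative, and `seq_len()` stands in for `1:nrow(res)`. The numeric codes shown in the answer appear to come from factor columns (the default before R 4.0), so with character columns the codes and the row order of `res` may differ while the grouping stays the same.

```r
# Assumption: d is the data frame from the question
d <- data.frame(
  col_1 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_2 = c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David"),
  col_3 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  col_4 = c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire"),
  col_5 = c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada"),
  stringsAsFactors = FALSE
)

# Shared set of levels across all columns, then encode every name as a number
lv <- sort(unique(unlist(d)))
d2 <- sapply(d, function(x) as.numeric(factor(x, levels = lv)))

# One string key per column; split the column names by key
m <- factor(apply(d2, 2, toString))
n <- lapply(split(m, m), names)

# One row per distinct lineup: its columns and how many there are
res <- do.call(rbind, lapply(n, function(x) cbind(toString(x), length(x))))

# Numeric labels avoid running out of LETTERS after 26 lineups
dimnames(res) <- list(paste("Lineup", seq_len(nrow(res))), c("col", "n"))
res
```

Note that `res` is a character matrix, so the counts in column `n` are strings; `as.integer(res[, "n"])` converts them if numeric counts are needed.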