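Counting duplicate lineups in R

Time: 2019-04-27 19:34:53

Tags: r

I ran a simulation and created 10,000 lineups. I would like to count how many times each lineup was created. For example, here are 5 lineups...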

col_1 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_2 <- c("Jack", "Malik", "Brett", "Demetrius", "Jalen","David")
col_3 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_4 <- c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire")
col_5 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
df <- data.frame(col_1, col_2, col_3,col_4,col_5)

The output I want is roughly...

Lineup A = col_1, col_3, col_5 = 3

Lineup B = col_2 = 1

Lineup C = col_4 = 1

I hit a wall while looking into the dplyr package for a solution. Any help would be appreciated. Thanks.

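3 answers:

Answer 0: (score: 2)

Here is a tidyverse-only solution: we arrange the rows by all columns, collapse each column into a single string, take the distinct values, transpose, and group to get the counts. This approach has the added benefit of listing the team members as well.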

library(tidyverse)

df2 <- df %>%
  arrange_all() %>%
  mutate_all(funs(paste0(., collapse = ","))) %>% 
  distinct() %>% 
  t() %>%
  as.data.frame %>%
  mutate(col       = colnames(df)) %>% 
  group_by(team    = V1) %>% 
  summarise(count  = n(), 
            lineup = paste(col, collapse = ","))


print(df2)
# A tibble: 3 x 3
  team                                   count lineup           
  <fct>                                  <int> <chr>            
1 Ebony,Jada,Jane,Latoya,Mary,Sandra         3 col_1,col_3,col_5
2 Jalen,David,Malik,Brett,Jack,Demetrius     1 col_2            
3 Molly,Claire,Emily,Tara,Katie,Imani        1 col_4    
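
One caveat worth illustrating: arrange_all() sorts whole rows, so each collapsed string still depends on the order in which the names appear within a column. If two columns holding the same names in a different order should count as the same lineup, each column can be sorted on its own before collapsing. The following is a minimal base R sketch of that idea, not the answerer's code; col_6 is a hypothetical extra column added only for illustration.

```r
# Example data from the question, plus a hypothetical col_6 holding the
# same names as col_1 in reverse order.
col_1 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_2 <- c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David")
col_4 <- c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire")
df <- data.frame(col_1, col_2, col_3 = col_1, col_4, col_5 = col_1,
                 col_6 = rev(col_1), stringsAsFactors = FALSE)

# Build one key per column: sort the names, then collapse them to one string.
keys <- vapply(df, function(x) paste(sort(x), collapse = ","), character(1))

# Count how many columns share each key and list the columns involved.
counts  <- table(keys)
lineups <- tapply(names(keys), keys, paste, collapse = ",")
result  <- data.frame(team   = names(counts),
                      count  = as.vector(counts),
                      lineup = as.vector(lineups[names(counts)]),
                      stringsAsFactors = FALSE)
print(result)
```

Because every key is built from sorted names, col_6 ends up in the same group as col_1, col_3 and col_5 even though its rows are in a different order.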

Answer 1: (score: 1)

This would be my solution:

df_t <- df %>% 
  # Transpose the dataset, make sure people are sorted alphabetically
  gather(lineup_number, person_name) %>% # Lineup/Person Level
  arrange(lineup_number, person_name) %>% # Arrange alphabetically
  group_by(lineup_number) %>% 
  mutate(person_order = paste0("person", row_number())) %>%  
  ungroup() %>% 
  spread(person_order, person_name) # Row: Lineup. Column: Person

df_t %>% 
  select(starts_with("person")) %>% 
  group_by_all() %>% 
  summarise(num_lineups = n())
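
The code above relies on gather() and spread(), which tidyr has since superseded in favour of pivot_longer() and pivot_wider(). As a hedged sketch of the same idea with the newer verb (an alternative formulation, not the answerer's code), the names in each lineup can be sorted and collapsed into one key string, and the keys then counted. The names lineup_counts and team are chosen here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Example data from the question, with strings kept as character vectors.
col_1 <- c("Mary", "Jane", "Latoya", "Sandra", "Ebony", "Jada")
col_2 <- c("Jack", "Malik", "Brett", "Demetrius", "Jalen", "David")
col_4 <- c("Katie", "Emily", "Tara", "Imani", "Molly", "Claire")
df <- data.frame(col_1, col_2, col_3 = col_1, col_4, col_5 = col_1,
                 stringsAsFactors = FALSE)

lineup_counts <- df %>%
  # Reshape to one row per lineup/person pair
  pivot_longer(everything(),
               names_to = "lineup_number", values_to = "person_name") %>%
  # Build one sorted, comma-separated key per lineup
  group_by(lineup_number) %>%
  summarise(team = paste(sort(person_name), collapse = ",")) %>%
  # Count how many lineups share each key
  count(team, name = "num_lineups")

print(lineup_counts)
```

Collapsing to a single key column avoids the wide person1, person2, ... layout, at the cost of storing the members as one string.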

Answer 2: (score: 1)

First, we want to make sure that the levels in all columns of the data frame really match, and unfactorize them to get numerics.

d <- df  # assumption: this answer refers to the question's data frame as d
(d2 <- sapply(d, function(x) as.numeric(factor(x, levels=sort(unique(unlist(d)))))))
# (codes below are from the original answer, where the columns were factors)
#      col_1 col_2 col_3 col_4 col_5
# [1,]     5    10     5    16     5
# [2,]     3    12     3    14     3
# [3,]     4     7     4    18     4
# [4,]     6     9     6    15     6
# [5,]     1    11     1    17     1
# [6,]     2     8     2    13     2

Then we can apply toString over the columns, factorize the resulting strings, and split on the factor levels; we only want the names.

n <- lapply(split(m <- factor(apply(d2, 2, toString)), m), names)

At this point we actually have our result; we rbind it together with its length.

res <- do.call(rbind, lapply(n, function(x) cbind(toString(x), length(x))))
res
#      [,1]                  [,2]
# [1,] "col_2"               "1"
# [2,] "col_4"               "1"
# [3,] "col_1, col_3, col_5" "3"

Finally, we may want to give the matrix some meaningful dimnames.

dimnames(res) <- list(paste("Lineup", LETTERS[1:nrow(res)]), c("col", "n"))
res
#          col                   n
# Lineup A "col_2"               "1"
# Lineup B "col_4"               "1"
# Lineup C "col_1, col_3, col_5" "3"

Note: if you have more than 26 lineups, you may want to just use 1:nrow(res) instead of LETTERS[1:nrow(res)].
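
To illustrate the note about more than 26 lineups: LETTERS contains only the 26 upper-case letters, so indexing past its end yields NA and the row labels stop being useful. A small sketch with numeric labels follows; the res matrix here is a stand-in rebuilt from the values shown in the answer, and past_z is a name introduced only for this example.

```r
# Stand-in for the res matrix from the answer (values copied from its output).
res <- cbind(c("col_2", "col_4", "col_1, col_3, col_5"), c("1", "1", "3"))

# Numeric labels work for any number of lineups.
dimnames(res) <- list(paste("Lineup", seq_len(nrow(res))), c("col", "n"))
print(res)

# LETTERS has only 26 entries, so indexing beyond that returns NA.
past_z <- LETTERS[27]
print(paste("Lineup", past_z))
```

seq_len(nrow(res)) is used in place of 1:nrow(res) because it also behaves sensibly when the matrix has zero rows.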