Concatenate strings by group in dplyr

Asked: 2018-03-05 09:42:26

Tags: r

I have the following data frame in R:

  Name      Weekday      Block     Count
  ABC_1       1           5B         12
  ABC_1       1           5B         12
  ABC_1       1           5C         10
  ABC_1       1           5B         10
  DER_1       2           5B         10 
  DER_1       2           5C         10 
  DER_1       2           5B         10
  DER_1       2           5C         10

I would like to get the following data frame as output:

  Name      Weekday      Block           5B       5C     Cont            
  ABC_1       1           5B,5B,5C,5B    34       10     12,12,10,10
  DER_1       2           5B,5C,5B,5C    20       20     10,10,10,10

I am using the following code to do this:

 df_new<- df %>% 
 group_by(Weekday,Name) %>% 
 mutate(yard_blocks = paste0(Block, collapse = ",")) %>% 
 as.data.frame()

However, it does not give me the desired output.

1 Answer:

Answer 0 (score: 2)

After grouping by 'Name', 'Weekday' and 'Block', get the frequency as a column ('n'). Then, grouped by 'Name' and 'Weekday', we use mutate to paste the 'Block' elements together into a new column 'Block1', keep only the unique rows (distinct), and spread the result from 'long' to 'wide'.

library(dplyr)
library(tidyr)
df %>%
  group_by(Name, Weekday, Block) %>%
  mutate(n = n()) %>%
  group_by(Name, Weekday) %>% 
  mutate(Block1 = toString(Block)) %>%
  distinct %>% 
  spread(Block, n) %>%
  rename(Block = Block1)
# A tibble: 2 x 5
# Groups: Name, Weekday [2]
#    Name  Weekday Block           `5B`  `5C`
#* <chr>   <int> <chr>          <int> <int>
#1 ABC_1       1 5B, 5B, 5C, 5B     3     1
#2 DER_1       2 5B, 5C, 5B, 5C     2     2
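
To try the steps above in isolation, the original data (before the Count column was added to the question) can be rebuilt by hand. This is only a minimal sketch: the names df_blocks, freq and blocks are my own, and summarise is used here instead of mutate simply to show what each grouping computes, one row per group.

```r
library(dplyr)

# Data reconstructed from the question, without the Count column
df_blocks <- data.frame(
  Name    = rep(c("ABC_1", "DER_1"), each = 4),
  Weekday = rep(c(1L, 2L), each = 4),
  Block   = c("5B", "5B", "5C", "5B", "5B", "5C", "5B", "5C"),
  stringsAsFactors = FALSE
)

# Step 1: how often each Block occurs within a Name/Weekday group
freq <- df_blocks %>%
  group_by(Name, Weekday, Block) %>%
  summarise(n = n()) %>%
  ungroup()

# Step 2: all Block values of a Name/Weekday group as one string;
# toString(x) is the same as paste(x, collapse = ", ")
blocks <- df_blocks %>%
  group_by(Name, Weekday) %>%
  summarise(Block1 = toString(Block)) %>%
  ungroup()
```

Here freq holds one row per Name/Weekday/Block combination and blocks holds one row per Name/Weekday pair, which are the two pieces the pipeline above combines before spreading.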

Update

Based on the updated dataset and question:

df %>%
    group_by(Name, Weekday) %>%
    mutate(Block1 = toString(Block), Cont = toString(Count)) %>% 
    group_by(Block, add = TRUE) %>%   # in dplyr >= 1.0.0 this argument is spelled .add
    mutate(Count = sum(Count)) %>% 
    distinct  %>% 
    spread(Block, Count)
# A tibble: 2 x 6
# Groups: Name, Weekday [2]
#   Name  Weekday Block1         Cont            `5B`  `5C`
#*  <chr>   <int> <chr>          <chr>          <int> <int>
#1  ABC_1       1 5B, 5B, 5C, 5B 12, 12, 10, 10    34    10
#2  DER_1       2 5B, 5C, 5B, 5C 10, 10, 10, 10    20    20
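
For comparison, the same table can also be built with summarise and pivot_wider. This is an alternative sketch, not the method used in the answer above; it assumes tidyr 1.0.0 or later (where pivot_wider is available), and the names strings, sums and result are hypothetical.

```r
library(dplyr)
library(tidyr)

# Data reconstructed from the updated question
df <- data.frame(
  Name    = rep(c("ABC_1", "DER_1"), each = 4),
  Weekday = rep(c(1L, 2L), each = 4),
  Block   = c("5B", "5B", "5C", "5B", "5B", "5C", "5B", "5C"),
  Count   = c(12L, 12L, 10L, 10L, 10L, 10L, 10L, 10L),
  stringsAsFactors = FALSE
)

# One row per Name/Weekday with the concatenated strings
strings <- df %>%
  group_by(Name, Weekday) %>%
  summarise(Block1 = toString(Block), Cont = toString(Count)) %>%
  ungroup()

# Sum of Count per Block, then one column per Block value
sums <- df %>%
  group_by(Name, Weekday, Block) %>%
  summarise(Count = sum(Count)) %>%
  ungroup() %>%
  pivot_wider(names_from = Block, values_from = Count)

# Combine the two summaries on the grouping keys
result <- inner_join(strings, sums, by = c("Name", "Weekday"))
```

summarise returns one row per group, whereas mutate keeps every input row. That is why the code in the question still has eight rows, and why the pipelines above need distinct before reshaping.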