Concatenate the values in one column within each group

Time: 2018-09-05 14:45:23

Tags: r tidyr

I want to concatenate the values within each group. Below is a short version of the data frame I am working with.

library(tidyverse)

df <- tibble::tribble(
  ~county,  ~party,
      "A",   "VVD",
      "A",    "GL",
      "A", "Local",
      "B",   "D66",
      "B", "Local"
  )

Now I want one row per county, with each party listed in its own column:

df2 <- tibble::tribble(
  ~county, ~party1, ~party2, ~party3,
      "A",   "VVD",    "GL", "Local",
      "B",   "D66", "Local",      NA
  )

After that, I want to concatenate the columns with unite(), replace the underscores with commas, and remove the NAs.

df2 %>%
  unite(party, c("party1", "party2", "party3")) %>%
  mutate(party = gsub("_NA", "", party),
         party = gsub("_", ", ", party))

My desired output for df:

  county party         
  <chr>  <chr>         
1 A      VVD, GL, Local
2 B      D66, Local

2 Answers:

Answer 0: (score: 1)

We can do this by creating a sequence column within each group and then calling spread():

library(tidyverse)
df %>%
   group_by(county) %>% 
   mutate(v1 = paste0('party', row_number())) %>% 
   spread(v1, party)
# A tibble: 2 x 4
# Groups:   county [2]
#  county party1 party2 party3
#  <chr>  <chr>  <chr>  <chr> 
#1 A      VVD    GL     Local 
#2 B      D66    Local  <NA>  
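
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshaping, assuming a recent tidyr is installed. The helper column v1 is carried over from the code above, and the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- tibble::tribble(
  ~county,  ~party,
      "A",   "VVD",
      "A",    "GL",
      "A", "Local",
      "B",   "D66",
      "B", "Local"
)

wide <- df %>%
  group_by(county) %>%
  # Number the parties within each county: party1, party2, ...
  mutate(v1 = paste0("party", row_number())) %>%
  ungroup() %>%
  # One column per sequence label; counties with fewer parties get NA
  pivot_wider(names_from = v1, values_from = party)

wide
```

The ungroup() call is optional here; it only drops the grouping so that the result is a plain tibble.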

For the second output, we group by 'county' and paste the elements of 'party' together:

df %>%
  group_by(county) %>%
  summarise(party = toString(party))
# A tibble: 2 x 2
#  county party         
#  <chr>  <chr>         
#1 A      VVD, GL, Local
#2 B      D66, Local   
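
For a character vector, toString() is shorthand for paste() with collapse = ", ", which is why no separate clean-up step is needed. A small sketch to check the equivalence (the vector parties is made up for illustration):

```r
# Illustrative vector, taken from county "A" in the example data
parties <- c("VVD", "GL", "Local")

a <- toString(parties)
b <- paste(parties, collapse = ", ")

# Both calls build the same single comma-separated string
identical(a, b)
```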

Answer 1: (score: 0)

df %>%
    group_by(county) %>%
    # Name the result so the column is called "party" rather than the full expression
    dplyr::summarise(party = paste0(party, collapse = ", "))

If anything is unclear, type ?group_by, ?paste0, etc. into the R console.