My data looks like this:
df1
#>           Artist          Album Year
#> 1        Beatles  Sgt. Pepper's 1967
#> 2 Rolling Stones Sticky Fingers 1971
and
df2
#>    Artist Members
#> 1 Beatles  George
#> 2 Beatles   Ringo
#> 3 Beatles    Paul
#> 4 Beatles    John
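I want to join these two data frames in what I think of as a "dumb" way. It is not very tidy, but it would be very helpful to get a final output like the example below, where each band (artist) occupies exactly one row and all band members sit in a single column, separated by commas: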
Desired Output
#>           Artist          Album                   Members Year
#> 1        Beatles  Sgt. Pepper's George, Ringo, Paul, John 1967
#> 2 Rolling Stones Sticky Fingers                           1971
I have been able to get close with the solution below, but it is not quite there:
library(tidyverse)
df1 <- data.frame(stringsAsFactors = FALSE,
                  Artist = c("Beatles", "Rolling Stones"),
                  Album = c("Sgt. Pepper's", "Sticky Fingers"),
                  Year = c(1967, 1971)
)
df2 <- data.frame(stringsAsFactors = FALSE,
                  Artist = c("Beatles", "Beatles", "Beatles", "Beatles"),
                  Members = c("George", "Ringo", "Paul", "John")
)
df <- left_join(df1, df2, by = "Artist")
df <- df %>% group_by(Artist) %>% mutate(member_number = seq_along(Members))
df <- spread(df, key = "member_number", value = "Members", sep = "_")
df <- df %>% unite(col = "members", member_number_1:member_number_4, sep = ",")
which gives the output
df
#> # A tibble: 2 x 4
#> # Groups:   Artist [2]
#>   Artist         Album           Year members
#>   <chr>          <chr>          <dbl> <chr>
#> 1 Beatles        Sgt. Pepper's   1967 George,Ringo,Paul,John
#> 2 Rolling Stones Sticky Fingers  1971 NA,NA,NA,NA
Answer 0 (score: 3)
A slightly different approach:
library(dplyr)
# Join on Artist, then collapse each band's members into one string
left_join(df1, df2, by = "Artist") %>%
  group_by(Artist, Album, Year) %>%
  summarise(members = paste(Members, collapse = ","))
# A tibble: 2 x 4
# Groups:   Artist, Album [?]
  Artist         Album           Year members
  <chr>          <chr>          <dbl> <chr>
1 Beatles        Sgt. Pepper's   1967 George,Ringo,Paul,John
2 Rolling Stones Sticky Fingers  1971 NA
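Note that paste() turns the missing Members value for Rolling Stones into the literal string "NA", whereas the desired output in the question has an empty cell there. Below is a minimal sketch of one way to get the empty cell. It assumes dplyr 1.0 or later (for the .groups argument), and the name res is only illustrative:

```r
library(dplyr)

df1 <- data.frame(
  Artist = c("Beatles", "Rolling Stones"),
  Album = c("Sgt. Pepper's", "Sticky Fingers"),
  Year = c(1967, 1971),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Artist = rep("Beatles", 4),
  Members = c("George", "Ringo", "Paul", "John"),
  stringsAsFactors = FALSE
)

res <- left_join(df1, df2, by = "Artist") %>%
  group_by(Artist, Album, Year) %>%
  # Drop missing members before collapsing; a band with no members
  # then collapses to the empty string "" instead of "NA".
  summarise(Members = paste(Members[!is.na(Members)], collapse = ", "),
            .groups = "drop")

print(res)
```

Filtering inside summarise() keeps the Rolling Stones row; filtering the rows out before grouping would drop that band entirely.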
Answer 1 (score: 2)
We can do a left_join first, then summarise multiple columns, collapsing each one into a comma-separated string of its unique values.
library(dplyr)
left_join(df1, df2, by = "Artist") %>%
  group_by(Artist) %>%
  summarise_at(vars(Album:Members), ~toString(unique(.)))
# A tibble: 2 x 4
#  Artist         Album          Year  Members
#  <chr>          <chr>          <chr> <chr>
#1 Beatles        Sgt. Pepper's  1967  George, Ringo, Paul, John
#2 Rolling Stones Sticky Fingers 1971  NA
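In dplyr 1.0 and later, summarise_at() is superseded by across(). The sketch below shows what the same idea might look like in that style. It assumes a recent dplyr, and res is only an illustrative name. As in the output above, toString() turns Year into a character column:

```r
library(dplyr)

df1 <- data.frame(
  Artist = c("Beatles", "Rolling Stones"),
  Album = c("Sgt. Pepper's", "Sticky Fingers"),
  Year = c(1967, 1971),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Artist = rep("Beatles", 4),
  Members = c("George", "Ringo", "Paul", "John"),
  stringsAsFactors = FALSE
)

res <- left_join(df1, df2, by = "Artist") %>%
  group_by(Artist) %>%
  # Collapse every column from Album through Members to its unique values
  summarise(across(Album:Members, ~ toString(unique(.x))))

print(res)
```

If Year needs to stay numeric, it can be added to group_by() instead of being summarised, at the cost of listing the grouping columns explicitly.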
Answer 2 (score: 2)
Using data.table:
library(data.table)
# Join df1 onto df2 by Artist, then collapse Members within each band
setDT(df2)[df1, on = .(Artist)][, .(members = toString(Members)),
                                by = .(Artist, Album, Year)]
#           Artist          Album Year                   members
#1:        Beatles  Sgt. Pepper's 1967 George, Ringo, Paul, John
#2: Rolling Stones Sticky Fingers 1971                        NA
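One thing to keep in mind is that setDT() converts df2 to a data.table by reference, so the original object changes class. A minimal sketch of a variant that works on copies and also leaves the member cell empty for bands without members (dt1, dt2 and res are illustrative names, not part of the answer above):

```r
library(data.table)

df1 <- data.frame(
  Artist = c("Beatles", "Rolling Stones"),
  Album = c("Sgt. Pepper's", "Sticky Fingers"),
  Year = c(1967, 1971),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Artist = rep("Beatles", 4),
  Members = c("George", "Ringo", "Paul", "John"),
  stringsAsFactors = FALSE
)

# as.data.table() returns copies, so df1 and df2 stay plain data frames
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# dt2[dt1, on = ...] keeps every row of dt1; unmatched artists get NA Members
res <- dt2[dt1, on = "Artist"][
  , .(members = toString(Members[!is.na(Members)])),
  by = .(Artist, Album, Year)
]

print(res)
```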
Answer 3 (score: 0)
My package safejoin can aggregate the joined table by the join variables as part of the join:
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
df1 %>% eat(df2, .agg = toString)
# Joining, by = "Artist"
#           Artist          Album Year                   Members
# 1        Beatles  Sgt. Pepper's 1967 George, Ringo, Paul, John
# 2 Rolling Stones Sticky Fingers 1971                      <NA>