Left join and bind rows using purrr functions

Time: 2019-07-01 18:11:19

Tags: r dplyr purrr

I have built a web scraping function that takes various parameters. Let's use sample parameters to demonstrate.

Parameters: year, type, gender, col_types

My function takes the quoted parameters and scrapes the data to return a df.

I would like to join the alternate col_types to the standard ones, based on matches of year, type, gender and name.

Then I want to bind all the rows into one df.

Sample data:

library(dplyr)

# Sample DF
a <- tibble(year = 2019, type = "full_year", col_types = "standard", gender = "M",
            name = c("a","b","c"), variable_1 = 1:3)
b <- tibble(year = 2019, type = "full_year", col_types = "alternate", gender = "M",
            name = c("a","b","c"), variable_2 = 1:3, variable_3 = 8:10)
c <- tibble(year = 2019, type = "full_year", col_types = "standard", gender = "F",
            name = c("ab","ba","ca"), variable_1 = 4:6)
d <- tibble(year = 2019, type = "full_year", col_types = "alternate", gender = "F",
            name = c("ab","ba","ca"), variable_2 = 1:3, variable_3 = 16:18)
e <- tibble(year = 2019, type = "last_month", col_types = "standard", gender = "M",
            name = c("a","b","c"), variable_1 = 1:3)
f <- tibble(year = 2019, type = "last_month", col_types = "alternate", gender = "M",
            name = c("a","b","c"), variable_2 = 1:3, variable_3 = 8:10)
g <- tibble(year = 2019, type = "last_month", col_types = "standard", gender = "F",
            name = c("ab","ba","ca"), variable_1 = 4:6)
h <- tibble(year = 2019, type = "last_month", col_types = "alternate", gender = "F",
            name = c("ab","ba","ca"), variable_2 = 1:3, variable_3 = 16:18)

# I know this is not going to work as it presents me with NA where I want there to be joins
df <- bind_rows(a, b, c, d, e, f, g, h)

# Adding desired output
m_fy_join <- a %>% left_join(b %>% select(-matches("col_types")))
f_fy_join <- c %>% left_join(d %>% select(-matches("col_types")))
m_lm_join <- e %>% left_join(f %>% select(-matches("col_types")))
f_lm_join <- g %>% left_join(h %>% select(-matches("col_types")))

# Desired Output
desired_output <- bind_rows(m_fy_join, f_fy_join, m_lm_join, f_lm_join)

Which purrr function can I use to do the left_join and then bind the rows?

2 answers:

Answer 0: (score: 0)

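I don't think you necessarily need a join. You can bind all the tibbles together and use coalesce to get rid of the NAs (which arise because "standard" has no variable 2/3 and "alternate" has no variable 1).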

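Given the way your data is currently laid out, I think this is probably the simplest option. However, you might consider reworking the process (if possible) so that, as they are created, all the "alternate" tibbles are added to one list and all the "standard" tibbles to another. You could then row-bind each list separately (for example with bind_rows) and join the two results, rather than devising a way to manage a bunch of loose tibbles.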

library(tidyverse)

bind_rows(a, b, c, d, e, f, g, h) %>% 
  group_by(year, type, gender, name) %>% 
  summarise_at(vars(contains('variable')), reduce, coalesce)

# # A tibble: 12 x 7
# # Groups:   year, type, gender [4]
#     year type       gender name  variable_1 variable_2 variable_3
#    <dbl> <chr>      <chr>  <chr>      <int>      <int>      <int>
#  1  2019 full_year  F      ab             4          1         16
#  2  2019 full_year  F      ba             5          2         17
#  3  2019 full_year  F      ca             6          3         18
#  4  2019 full_year  M      a              1          1          8
#  5  2019 full_year  M      b              2          2          9
#  6  2019 full_year  M      c              3          3         10
#  7  2019 last_month F      ab             4          1         16
#  8  2019 last_month F      ba             5          2         17
#  9  2019 last_month F      ca             6          3         18
# 10  2019 last_month M      a              1          1          8
# 11  2019 last_month M      b              2          2          9
# 12  2019 last_month M      c              3          3         10

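Edit: Thanks for showing the desired output. I have checked, and this output is equivalent, except for the row order and the fact that it has no col_types column.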
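The redesign suggested earlier in this answer (keeping the "standard" and "alternate" tibbles in two separate lists) could be sketched roughly as follows. This is only an illustration: the names standard_list, alternate_list, standard_df, alternate_df and joined are hypothetical, and the two lists hold hand-written stand-ins for scraper results rather than real scraped data.

```r
library(dplyr)

# Hypothetical stand-ins for what the scraper would return,
# collected into one list per col_types value as they are created.
standard_list <- list(
  tibble(year = 2019, type = "full_year", col_types = "standard",
         gender = "M", name = c("a", "b", "c"), variable_1 = 1:3),
  tibble(year = 2019, type = "full_year", col_types = "standard",
         gender = "F", name = c("ab", "ba", "ca"), variable_1 = 4:6)
)
alternate_list <- list(
  tibble(year = 2019, type = "full_year", col_types = "alternate",
         gender = "M", name = c("a", "b", "c"),
         variable_2 = 1:3, variable_3 = 8:10),
  tibble(year = 2019, type = "full_year", col_types = "alternate",
         gender = "F", name = c("ab", "ba", "ca"),
         variable_2 = 1:3, variable_3 = 16:18)
)

# Row-bind each list on its own.
standard_df  <- bind_rows(standard_list)
# Drop col_types from the alternate side so it cannot interfere with the join.
alternate_df <- bind_rows(alternate_list) %>% select(-col_types)

# A single join on the identifying columns replaces the pairwise joins.
joined <- left_join(standard_df, alternate_df,
                    by = c("year", "type", "gender", "name"))
```

Spelling out the by columns avoids relying on the natural-join message and makes the matching keys explicit.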

Answer 1: (score: 0)

library(dplyr)
library(purrr)

my_join_function <- function(df1, df2) {
  x <- get(df1)
  y <- get(df2)
  left_join(x, select(y, -matches("col_types")))
}

desired_output2 <- map2_df(
  .x = c("a", "c", "e", "g"), 
  .y = c("b", "d", "f", "h"), 
  .f = my_join_function
)
testthat::expect_error(testthat::expect_identical(desired_output, desired_output2))
  

Error: testthat::expect_identical(desired_output, desired_output2) did not throw an error.
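A possible variant of the same idea passes the tibbles themselves to purrr instead of looking them up by name with get(). The sketch below is not part of the answer above: join_pair, standard_list, alternate_list and result are hypothetical names, and the data is a shortened stand-in for the question's sample tibbles.

```r
library(dplyr)
library(purrr)

# Shortened, hypothetical stand-ins for the question's sample tibbles,
# kept in two parallel lists (same order in both).
standard_list <- list(
  tibble(year = 2019, type = "last_month", col_types = "standard",
         gender = "M", name = c("a", "b", "c"), variable_1 = 1:3),
  tibble(year = 2019, type = "last_month", col_types = "standard",
         gender = "F", name = c("ab", "ba", "ca"), variable_1 = 4:6)
)
alternate_list <- list(
  tibble(year = 2019, type = "last_month", col_types = "alternate",
         gender = "M", name = c("a", "b", "c"),
         variable_2 = 1:3, variable_3 = 8:10),
  tibble(year = 2019, type = "last_month", col_types = "alternate",
         gender = "F", name = c("ab", "ba", "ca"),
         variable_2 = 1:3, variable_3 = 16:18)
)

# Join one standard tibble to its alternate counterpart on explicit keys.
join_pair <- function(standard, alternate) {
  left_join(standard, select(alternate, -col_types),
            by = c("year", "type", "gender", "name"))
}

# map2() walks both lists in parallel; bind_rows() stacks the joined pairs.
result <- bind_rows(map2(standard_list, alternate_list, join_pair))
```

Working on lists rather than object names keeps the function independent of what happens to exist in the calling environment.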