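I built a web-scraping function that takes several arguments. Let's use sample arguments for demonstration.
Arguments: year, type, gender, and col_types.
My function takes these arguments and scrapes data to return a df.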
I want to join the alternate col_types to the standard col_types, matching on year, type, gender, and name.
I then want to bind all the rows into a single df.
Sample data:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Sample DF
a <- tibble(year = 2019, type = "full_year", col_types = "standard", gender = "M", name = c("a","b","c"), variable_1 = 1:3)
b <- tibble(year = 2019, type = "full_year", col_types = "alternate", gender = "M", name = c("a","b","c"), variable_2 = 1:3, variable_3 = 8:10)
c <- tibble(year = 2019, type = "full_year", col_types = "standard", gender = "F", name = c("ab","ba","ca"), variable_1 = 4:6)
d <- tibble(year = 2019, type = "full_year", col_types = "alternate", gender = "F", name = c("ab","ba","ca"), variable_2 = 1:3, variable_3 = 16:18)
e <- tibble(year = 2019, type = "last_month", col_types = "standard", gender = "M", name = c("a","b","c"), variable_1 = 1:3)
f <- tibble(year = 2019, type = "last_month", col_types = "alternate", gender = "M", name = c("a","b","c"), variable_2 = 1:3, variable_3 = 8:10)
g <- tibble(year = 2019, type = "last_month", col_types = "standard", gender = "F", name = c("ab","ba","ca"), variable_1 = 4:6)
h <- tibble(year = 2019, type = "last_month", col_types = "alternate", gender = "F", name = c("ab","ba","ca"), variable_2 = 1:3, variable_3 = 16:18)
# I know this is not going to work as it presents me with NA where I want there to be joins
df <- bind_rows(a, b, c, d, e, f, g, h)
# Adding desired output
m_fy_join <-
  a %>%
  left_join(b %>% select(-matches("col_types")))
f_fy_join <-
  c %>%
  left_join(d %>% select(-matches("col_types")))
m_lm_join <-
  e %>%
  left_join(f %>% select(-matches("col_types")))
f_lm_join <-
  g %>%
  left_join(h %>% select(-matches("col_types")))
# Desired Output
desired_output <- bind_rows(m_fy_join, f_fy_join, m_lm_join, f_lm_join)
What function can I use to do the left_join and then bind the rows?
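Answer 0: (score: 0)
I don't think you necessarily need a join. You can bind all the tibbles together and use coalesce to get rid of the NAs (which arise because the "standard" tibbles have no variable_2/variable_3 and the "alternate" tibbles have no variable_1).
Given how your data is currently laid out, I think this is probably the easiest approach. However, you might consider reworking your process (if possible) so that, as they are created, all the "alternate" tibbles are added to one list and all the "standard" tibbles to another. You could then bind the rows of each list separately (e.g. with bind_rows) and join the two results, rather than devising a way to manage a jumbled pile of tibbles.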
library(tidyverse)
bind_rows(a, b, c, d, e, f, g, h) %>%
  group_by(year, type, gender, name) %>%
  summarise_at(vars(contains('variable')), reduce, coalesce)
# # A tibble: 12 x 7
# # Groups:   year, type, gender [4]
#     year type       gender name  variable_1 variable_2 variable_3
#    <dbl> <chr>      <chr>  <chr>      <int>      <int>      <int>
#  1  2019 full_year  F      ab             4          1         16
#  2  2019 full_year  F      ba             5          2         17
#  3  2019 full_year  F      ca             6          3         18
#  4  2019 full_year  M      a              1          1          8
#  5  2019 full_year  M      b              2          2          9
#  6  2019 full_year  M      c              3          3         10
#  7  2019 last_month F      ab             4          1         16
#  8  2019 last_month F      ba             5          2         17
#  9  2019 last_month F      ca             6          3         18
# 10  2019 last_month M      a              1          1          8
# 11  2019 last_month M      b              2          2          9
# 12  2019 last_month M      c              3          3         10
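Edit: Thanks for showing the desired output. I've checked, and the output above is equivalent to it, apart from the row order and the fact that it has no col_types column.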
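The redesign suggested in this answer (collect the "standard" and "alternate" results separately, bind each set, then join once) could look roughly like the sketch below. It is only an illustration under stated assumptions: scrape_stub is a hypothetical stand-in for the asker's scraping function, which is not shown in the question, and its return values are made up to mimic the shape of the sample tibbles. Unlike the sample data, the stub does not add a col_types column.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the real scraping function (assumption:
# it returns one tibble per combination of arguments).
scrape_stub <- function(year, type, gender, col_types) {
  nm <- if (gender == "M") c("a", "b", "c") else c("ab", "ba", "ca")
  base <- tibble(year = year, type = type, gender = gender, name = nm)
  if (col_types == "standard") {
    mutate(base, variable_1 = 1:3)
  } else {
    mutate(base, variable_2 = 1:3, variable_3 = 8:10)
  }
}

# Every combination of the arguments other than col_types
params <- expand.grid(
  year = 2019,
  type = c("full_year", "last_month"),
  gender = c("M", "F"),
  stringsAsFactors = FALSE
)

# One row-bound table per col_types value
standard  <- bind_rows(pmap(params, scrape_stub, col_types = "standard"))
alternate <- bind_rows(pmap(params, scrape_stub, col_types = "alternate"))

# A single join on the shared keys replaces the pairwise joins
combined <- left_join(standard, alternate,
                      by = c("year", "type", "gender", "name"))
```

Because each side is bound before joining, only one left_join is needed no matter how many argument combinations are scraped.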
Answer 1: (score: 0)
library(dplyr)
library(purrr)
my_join_function <- function(df1, df2) {
  x <- get(df1)
  y <- get(df2)
  left_join(x, select(y, -matches("col_types")))
}
desired_output2 <- map2_df(
  .x = c("a", "c", "e", "g"),
  .y = c("b", "d", "f", "h"),
  .f = my_join_function
)
testthat::expect_error(testthat::expect_identical(desired_output, desired_output2))
Error:
testthat::expect_identical(desired_output, desired_output2)
did not throw an error.
In other words, desired_output2 is identical to desired_output.
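A variant of the same idea that avoids looking objects up by name with get() is to keep the tibbles in two parallel lists and pass them to map2 directly. This is a sketch, not part of the original answer: the names std_m, alt_m, std_f, alt_f, and join_pair are made up for illustration, and the data is a cut-down copy of the question's sample.

```r
library(dplyr)
library(purrr)

# Cut-down sample data in the same shape as the question's tibbles
std_m <- tibble(year = 2019, type = "full_year", col_types = "standard",
                gender = "M", name = c("a", "b", "c"), variable_1 = 1:3)
alt_m <- tibble(year = 2019, type = "full_year", col_types = "alternate",
                gender = "M", name = c("a", "b", "c"),
                variable_2 = 1:3, variable_3 = 8:10)
std_f <- tibble(year = 2019, type = "full_year", col_types = "standard",
                gender = "F", name = c("ab", "ba", "ca"), variable_1 = 4:6)
alt_f <- tibble(year = 2019, type = "full_year", col_types = "alternate",
                gender = "F", name = c("ab", "ba", "ca"),
                variable_2 = 1:3, variable_3 = 16:18)

# Join one standard tibble to its alternate counterpart on explicit keys
join_pair <- function(std, alt) {
  left_join(std, select(alt, -col_types),
            by = c("year", "type", "gender", "name"))
}

# Walk the two lists in parallel, then bind the joined pieces
joined <- bind_rows(map2(list(std_m, std_f), list(alt_m, alt_f), join_pair))
```

Passing the tibbles themselves means the function does not depend on what happens to be defined in the calling environment, and the explicit by argument silences the "Joining, by = ..." message.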