I have three tables: customer (PK: CustomerNum), reservation (PK: CustomerNum, with TripID not a PK), and trip (PK: TripID). I am trying to join them with purrr::reduce.
I tried the following code.
tables <- list(customer, reservation, trip)
reduce(tables, inner_join, by = c("CustomerNum", "TripID"))
Error: `by` can't contain join column `TripID` which is missing from LHS
Answer 0 (score: 1)
When the by column differs at each step, we can use a for loop:
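# join key for each successive step: customer-reservation, then reservation-trip
grp <- c("CustomerNum", "TripID")
out <- customer
for (i in seq_along(grp)) {
  # tables[[1]] is customer, so step i joins tables[[i + 1]] on grp[i]
  out <- inner_join(out, tables[[i + 1]], by = grp[i])
}
out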
#   CustomerNum        val TripID newInfo
# 1           1 -0.5458808      4       *
# 2           2  0.5365853      2    ****
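The same per-step key idea can also be sketched with purrr::reduce2, which passes one extra element to the joining function at every step. This is only a minimal sketch: it rebuilds the example data from this answer so that it runs on its own, and the names keys and joined are made up for illustration.

```r
library(dplyr)
library(purrr)

# Example data, copied from the reproducible example in this answer
set.seed(24)
customer <- data.frame(CustomerNum = 1:5, val = rnorm(5))
reservation <- data.frame(CustomerNum = 1:3, TripID = c(4, 2, 8))
trip <- data.frame(TripID = c(4, 9, 2), newInfo = c("*", "**", "****"))
tables <- list(customer, reservation, trip)

# One join key per step, so keys is one element shorter than tables
# (keys and joined are illustrative names, not from the original post)
keys <- c("CustomerNum", "TripID")

# reduce2 calls the function with (accumulated result, next table, next key)
joined <- reduce2(
  tables, keys,
  function(acc, nxt, key) inner_join(acc, nxt, by = key)
)
joined
```

Without an .init value, reduce2 expects the second vector to be one element shorter than the first, which matches two joins across three tables.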
Then select the columns of interest.
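As a rough illustration of that last step, assuming the columns of interest are CustomerNum, TripID and newInfo (this choice is hypothetical, since the OP did not say which columns are needed), the joined result could be trimmed like this. The data is again rebuilt from the example in this answer so the snippet runs on its own.

```r
library(dplyr)

# Example data, copied from the reproducible example in this answer
set.seed(24)
customer <- data.frame(CustomerNum = 1:5, val = rnorm(5))
reservation <- data.frame(CustomerNum = 1:3, TripID = c(4, 2, 8))
trip <- data.frame(TripID = c(4, 9, 2), newInfo = c("*", "**", "****"))

# Join step by step, each time on the key the two tables share
joined <- customer %>%
  inner_join(reservation, by = "CustomerNum") %>%
  inner_join(trip, by = "TripID")

# Keep only the (hypothetical) columns of interest
result <- joined %>%
  select(CustomerNum, TripID, newInfo)
result
```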
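Note that if we don't supply the by argument, inner_join picks the join columns automatically from the column names the two tables have in common, as the reproducible example below shows. Since the OP did not provide a reproducible example, it is not clear whether this applies to the actual data.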
reduce(tables, inner_join)
#Joining, by = "CustomerNum" #### <-----
#Joining, by = "TripID" #### <-----
#   CustomerNum        val TripID newInfo
# 1           1 -0.5458808      4       *
# 2           2  0.5365853      2    ****
Data used in the examples above:
library(dplyr)  # inner_join, select
library(purrr)  # reduce
set.seed(24)
customer <- data.frame(CustomerNum = 1:5, val = rnorm(5))
reservation <- data.frame(CustomerNum = 1:3, TripID = c(4, 2, 8))
trip <- data.frame(TripID = c(4, 9, 2), newInfo = c("*", "**", "****"))
tables <- list(customer, reservation, trip)