Question

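This is an offshoot of an earlier post, where the discussion centered on simplifying my function and eliminating the need to merge the data frames produced by lapply. Although tools such as dplyr and data.table remove the need for merging, I would still like to know how to merge in this situation. I have simplified the function that generates the list, based on this answer to my earlier question.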

#Reproducible data
Data <- data.frame("custID" = c(1:10, 1:20),
    "v1" = rep(c("A", "B"), c(10,20)), 
    "v2" = c(30:21, 20:19, 1:3, 20:6), stringsAsFactors = TRUE)

#Split-Apply function
res <- lapply(split(Data, Data$v1), function(df) {
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    na.omit(data.frame(custID = df$custID, top_pct))
    })

This gives me the following result:

$A
  custID top_pct
1      1      10
2      2      20

$B
  custID top_pct
1      1      10
2      2      20
6      6      10
7      7      20

I would like the result to look like this:

  custID A_top_pct B_top_pct
1      1        10        10
2      2        20        20
3      6        NA        10
4      7        NA        20

What is the best way to get there? Should I do some kind of reshape? If so, do I have to merge the data frames first?

Here is my solution, which may not be the best one. (In the real application, the list will contain more than two data frames.)

#Change the new variable name
names1 <- names(res)

for(i in seq_along(res)) {
    names(res[[i]])[2] <- paste0(names1[i], "_top_pct")
}

#Merge the results
res_m <- res[[1]]
for(i in seq_along(res)[-1]) {
    res_m <- merge(res_m, res[[i]], by = "custID", all = TRUE)
}

Answer 1

You can try Reduce with merge

 Reduce(function(...) merge(..., by='custID', all=TRUE), res)
 #     custID top_pct.x top_pct.y
 #1      1        10        10
 #2      2        20        20
 #3      6        NA        10
 #4      7        NA        20
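
The .x and .y suffixes in that output are added by merge() because both inputs carry a column named top_pct, so the result does not say which group each column came from. A minimal sketch of one way to get the A_top_pct / B_top_pct layout asked for in the question is to rename each value column from the list names before reducing. The res list below is a hand-built stand-in that mirrors the printed output in the question, and the names renamed and res_m are only illustrative.

```r
# Hand-built stand-in for the `res` list from the question
# (same values as the printed $A and $B output).
res <- list(
  A = data.frame(custID = c(1, 2), top_pct = c(10, 20)),
  B = data.frame(custID = c(1, 2, 6, 7), top_pct = c(10, 20, 10, 20))
)

# Prefix each value column with the name of its list element,
# e.g. "top_pct" in res$A becomes "A_top_pct".
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "top_pct"] <- paste0(nm, "_top_pct")
  df
}, res, names(res))

# Full outer join of all data frames on custID, two at a time.
res_m <- Reduce(function(x, y) merge(x, y, by = "custID", all = TRUE), renamed)
print(res_m)
```

Because the columns are already uniquely named, merge() has nothing to disambiguate, and the same two steps should carry over to a list with more than two data frames.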

Or, as @Colonel Beauvel suggested, a more readable way is to wrap it with Curry from library(functional)

 library(functional)
 Reduce(Curry(merge, by='custID', all=TRUE), res)
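
If installing an extra package is not an option, the same idea (fixing some arguments of merge up front and letting Reduce supply the two data frames) can be approximated in base R. This is only a rough sketch of partial application, not the functional package's own code; partial_sketch and merge_by_id are hypothetical names, and res is again a hand-built stand-in for the list in the question.

```r
# Hand-built stand-in for the `res` list from the question.
res <- list(
  A = data.frame(custID = c(1, 2), top_pct = c(10, 20)),
  B = data.frame(custID = c(1, 2, 6, 7), top_pct = c(10, 20, 10, 20))
)

# Hypothetical helper: remember some arguments of `f` now and
# return a function that accepts the remaining ones later.
partial_sketch <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}

# merge() with by = "custID" and all = TRUE already filled in.
merge_by_id <- partial_sketch(merge, by = "custID", all = TRUE)

# Reduce passes the accumulated result and the next data frame
# as the two remaining (positional) arguments.
out <- Reduce(merge_by_id, res)
print(out)
```

Since the value columns are not renamed here, the result keeps the top_pct.x / top_pct.y column names shown in the answer.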

Merging data frames in a list
