Example setup:
> library(dplyr)
> first <- function(value) {
+   if (length(value) == 0) {
+     # Empty input: return a two-row placeholder with NA in `out`
+     return(data.frame(out=NA, othercolumns=c(0,1)))
+   } else {
+     return(data.frame(out=mean(value), othercolumns=c(1,1)))
+   }
+ }
> set.seed(1)
> df <- data.frame(column1=runif(10), column2=runif(10),
+                  category=sample(c("a","b"), 10, replace=TRUE))
The dplyr chain returns an error:
> df %>% group_by(category) %>% filter(column2 > 1) %>% do(first(.$column1))
Error: incompatible number of rows (2, expecting 0
Is there a way to force dplyr to pass the empty data frame to do()
instead of throwing an error?
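As a quick check, first() itself copes with an empty input, which suggests the error comes from the dplyr chain rather than from the function. A minimal sketch in base R only; `empty_result` is just an illustrative name:

```r
# Same helper as in the setup above
first <- function(value) {
  if (length(value) == 0) {
    return(data.frame(out = NA, othercolumns = c(0, 1)))
  } else {
    return(data.frame(out = mean(value), othercolumns = c(1, 1)))
  }
}

# Call it directly with a zero-length numeric vector,
# which is what .$column1 would be for an empty data frame
empty_result <- first(numeric(0))
print(empty_result)
```

The call returns a two-row data frame even though the input has zero rows.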
Update
Following @Henrik's link, it seems the data frame needs to be converted to a tbl_df()
object. The conversion must happen after the group_by() call:
> df %>% group_by(category) %>% tbl_df() %>%
+   filter(column2 > 1) %>% do(first(.$column1))
  out othercolumns
1  NA            0
2  NA            1
The syntax is odd and unintuitive, but it works...
However, I would like the output to look like this:
category out othercolumns
1 "a" NA 0
2 "a" NA 1
3 "b" NA 0
4 "b" NA 1
Update 2
Combining it with plyr::ddply seems to work well:
> ddply(df,.(category),function(.) filter(.,column2 > 1) %>%
+   do(first(.$column1)))
  category out othercolumns
1        a  NA            0
2        a  NA            1
3        b  NA            0
4        b  NA            1
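For comparison, the same per-category layout can be sketched in base R alone, without depending on how a particular dplyr or plyr version treats empty groups. This is only an illustration, not the approach from the linked answer; `pieces`, `kept`, and `result` are hypothetical names:

```r
first <- function(value) {
  if (length(value) == 0) {
    return(data.frame(out = NA, othercolumns = c(0, 1)))
  } else {
    return(data.frame(out = mean(value), othercolumns = c(1, 1)))
  }
}

set.seed(1)
df <- data.frame(column1 = runif(10), column2 = runif(10),
                 category = sample(c("a", "b"), 10, replace = TRUE))

# Split by category, filter inside each piece, then call first()
# even when the filtered piece has zero rows
pieces <- lapply(split(df, df$category), function(g) {
  kept <- g[g$column2 > 1, , drop = FALSE]
  # The single category value is recycled across the rows first() returns
  data.frame(category = g$category[1], first(kept$column1))
})

# Stack the per-category results and drop the generated row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

Because runif() only produces values below 1, every group is empty after the filter, so each category present in the data contributes the two NA rows.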