Combining dplyr join and set operations into a custom function

Time: 2016-11-28 04:12:43

Tags: r dplyr

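I have two simple data frames. I want to use dplyr and the tidyverse to find the categories in "Task2" of the second data frame (Df2) that do not appear in "Task" of the first data frame (Df). I would like to use dplyr's "setdiff" function for this. I also want to keep the "Time" column of the second data frame (Df2).

So the final result should contain two rows: one for "Iron shirt" for client "Chris", with a total time of 30, and one for client "Eric", "Buy groceries", with a corresponding time of 8.

I also want to drop the Date column.

I think one way to do this is to use dplyr's "setdiff" function (I realize the Task and Task2 column names would have to be changed so that they match) to isolate the two rows, and then use a join function to bring the total time back in.

Finally, I would like this to be a custom function, because I will have to perform this task repeatedly. I would like a function like "Diff(Df1, Df2)"..., so I can pass in the two data frames and get the result.

I hope this is not asking too much! I am new to custom functions, especially ones that contain dplyr and pipes.

Hope someone can help me!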

CaseWorker<-c("John","John","Kim")

Client<-c("Chris","Chris","Eric")

Task<-c("Feed cat","Make dinner","Do homework")

Date<-c("10/27/2016","09/22/2016","10/11/2016")

Df<-data.frame(CaseWorker,Client,Date,Task)

The second data frame...

CaseWorker<-c("John","John","John","John","John","John","John","John","John",
          "John","Kim","Kim","Kim")

Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric","Eric")

Date<-c("11/10/2016","10/10/2016","11/13/2016","09/18/2016","11/11/2016","09/19/2016","08/08/2016","10/10/2016","08/05/2016","11/12/2016","09/09/2016","11/11/2016","09/10/2016")

Task2<-c("Feed cat","Feed cat","Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Iron shirt","Iron shirt","Do homework",
"Do homework","Buy groceries")

Time<-c(20,34,11,10,5,6,55,30,20,10,12,10,8)

Df2<-data.frame(CaseWorker,Client,Date,Task2,Time)

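1 Answer:

Answer 0: (score: 1)

We can use anti_join, which keeps the rows of Df2 whose Task2 value has no match in Df$Task. Summarising by CaseWorker, Client and Task2 then totals Time and drops the Date column.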

library(dplyr)
anti_join(Df2, Df, by = c("Task2"="Task")) %>%
         group_by(CaseWorker,Client, Task2) %>% 
         summarise(Time = sum(Time))
#    CaseWorker Client         Task2  Time
#        <fctr> <fctr>        <fctr> <dbl>
#1       John  Chris    Iron shirt    30
#2        Kim   Eric Buy groceries     8
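For comparison, the setdiff-then-join route described in the question can be sketched roughly as follows. This is only an illustration, not part of the original answer: it rebuilds the question's data with character columns (stringsAsFactors = FALSE is an assumption made here so that the set operation and the join compare plain strings), and the intermediate name new_tasks is made up for the example.

```r
library(dplyr)

# The question's data, rebuilt with character columns (assumption: no factors)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Date       = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)

Df2 <- data.frame(
  CaseWorker = rep(c("John", "Kim"), times = c(10, 3)),
  Client     = rep(c("Chris", "Eric"), times = c(10, 3)),
  Date       = c("11/10/2016", "10/10/2016", "11/13/2016", "09/18/2016",
                 "11/11/2016", "09/19/2016", "08/08/2016", "10/10/2016",
                 "08/05/2016", "11/12/2016", "09/09/2016", "11/11/2016",
                 "09/10/2016"),
  Task2      = rep(c("Feed cat", "Make dinner", "Iron shirt",
                     "Do homework", "Buy groceries"),
                   times = c(5, 3, 2, 2, 1)),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Step 1: rename Task2 to Task so both sides have the same single column,
# then take the set difference (distinct tasks in Df2 that are not in Df)
new_tasks <- setdiff(
  Df2 %>% select(Task = Task2),
  Df %>% select(Task)
)

# Step 2: join back to Df2 to recover Time, drop Date, and total per group
result <- Df2 %>%
  inner_join(new_tasks, by = c("Task2" = "Task")) %>%
  select(-Date) %>%
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time)) %>%
  ungroup()

print(result)
```

This takes two steps where anti_join needs one, which is why the anti_join version above is the shorter choice here.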

If we need to turn this into a function

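# Rows of dat1 whose Task2 has no match in dat2$Task,
# with Time totalled per CaseWorker, Client and Task2
DiffGoals <- function(dat1, dat2) {
  anti_join(dat1, dat2, by = c("Task2" = "Task")) %>%
    group_by(CaseWorker, Client, Task2) %>%
    summarise(Time = sum(Time))
}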

DiffGoals(Df2, Df)
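Because the task will be repeated, possibly on data frames with other column names, a variant that takes the column names as arguments may be handy. The following is a minimal sketch under some assumptions: the function name diff_tasks and all of its argument names are hypothetical, the data is rebuilt with character columns, and across() with all_of() requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical generalisation of DiffGoals: column names are passed as strings.
# Assumes dplyr >= 1.0.0 (across(), all_of(), .groups).
diff_tasks <- function(dat1, dat2,
                       task1 = "Task2", task2 = "Task",
                       groups = c("CaseWorker", "Client"),
                       time = "Time") {
  # by = c(<column in dat1> = <column in dat2>)
  by_cols <- setNames(task2, task1)
  dat1 %>%
    anti_join(dat2, by = by_cols) %>%
    group_by(across(all_of(c(groups, task1)))) %>%
    summarise(across(all_of(time), sum), .groups = "drop")
}

# The question's data, rebuilt with character columns (assumption: no factors)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Date       = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)

Df2 <- data.frame(
  CaseWorker = rep(c("John", "Kim"), times = c(10, 3)),
  Client     = rep(c("Chris", "Eric"), times = c(10, 3)),
  Date       = c("11/10/2016", "10/10/2016", "11/13/2016", "09/18/2016",
                 "11/11/2016", "09/19/2016", "08/08/2016", "10/10/2016",
                 "08/05/2016", "11/12/2016", "09/09/2016", "11/11/2016",
                 "09/10/2016"),
  Task2      = rep(c("Feed cat", "Make dinner", "Iron shirt",
                     "Do homework", "Buy groceries"),
                   times = c(5, 3, 2, 2, 1)),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

out <- diff_tasks(Df2, Df)
print(out)
```

With the defaults this should give the same two rows as DiffGoals(Df2, Df), and other column names can be supplied through the arguments without editing the function body.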