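I have two simple data frames. Using dplyr and the tidyverse, I want to find the categories in "Task2" of the second data frame (Df2) that are not in "Task" of the first data frame (Df). I'd like to use dplyr's "setdiff" function for this. I also want to keep the "Time" column of the second data frame (Df2).
So the final result should contain two rows: one for "Iron shirt" for client "Chris", with a total time of 30, and one for client "Eric", "Buy groceries", with a corresponding time of 8.
I also want to drop the Date column.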
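I think one approach is to use dplyr's "setdiff" function to isolate those two rows (I realize I would have to rename the Task and Task2 columns so that they match), and then use a join function to bring back the total time.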
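A minimal sketch of the setdiff-then-join idea described above. It assumes dplyr 1.0.0 or later (for the .groups argument of summarise) and builds the example data with character columns (stringsAsFactors = FALSE) so the set operation compares plain strings rather than factors with different levels; new_tasks and result are illustrative names. Note that setdiff here compares whole (CaseWorker, Client, Task) combinations, not just the task name; for this data that gives the same two rows.

```r
library(dplyr)

# Example data from the question, built with character (not factor) columns
# so that dplyr's set operations compare plain strings
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Date       = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client     = c(rep("Chris", 10), rep("Eric", 3)),
  Date       = c("11/10/2016", "10/10/2016", "11/13/2016", "09/18/2016",
                 "11/11/2016", "09/19/2016", "08/08/2016", "10/10/2016",
                 "08/05/2016", "11/12/2016", "09/09/2016", "11/11/2016",
                 "09/10/2016"),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
                 rep("Do homework", 2), "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Step 1: rename Task2 to Task so both tables have the same columns,
# then keep the (CaseWorker, Client, Task) combinations found only in Df2
new_tasks <- setdiff(
  Df2 %>% select(CaseWorker, Client, Task = Task2) %>% distinct(),
  Df %>% select(CaseWorker, Client, Task)
)

# Step 2: join those combinations back to Df2 to recover Time,
# then total Time per combination (Date is dropped by summarise)
result <- new_tasks %>%
  inner_join(rename(Df2, Task = Task2), by = c("CaseWorker", "Client", "Task")) %>%
  group_by(CaseWorker, Client, Task) %>%
  summarise(Time = sum(Time), .groups = "drop")

result
```

The anti_join in the answer below reaches the same result in one step, without the rename and the second join.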
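Finally, I'd like this to be a custom function, because I will have to do this task repeatedly. I want a function like "Differences(Df1, Df2)" so that I can pass in the two data frames and get the result.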
I hope this isn't asking too much! I'm new to custom functions, especially ones that use dplyr and pipes.
Hopefully someone can help me!
# First data frame (Df)
CaseWorker<-c("John","John","Kim")
Client<-c("Chris","Chris","Eric")
Task<-c("Feed cat","Make dinner","Do homework")
Date<-c("10/27/2016","09/22/2016","10/11/2016")
Df<-data.frame(CaseWorker,Client,Date,Task)
The second data frame:
CaseWorker<-c("John","John","John","John","John","John","John","John","John",
"John","Kim","Kim","Kim")
Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric","Eric")
Date<-c("11/10/2016","10/10/2016","11/13/2016","09/18/2016","11/11/2016","09/19/2016","08/08/2016","10/10/2016","08/05/2016","11/12/2016","09/09/2016","11/11/2016","09/10/2016")
Task2<-c("Feed cat","Feed cat","Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Iron shirt","Iron shirt","Do homework",
"Do homework","Buy groceries")
Time<-c(20,34,11,10,5,6,55,30,20,10,12,10,8)
Df2<-data.frame(CaseWorker,Client,Date,Task2,Time)
Answer:
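We can use anti_join, which keeps only the rows of Df2 whose Task2 value has no match in the Task column of Df. We then group by case worker, client and task and sum Time; summarise keeps only the grouping columns and the new Time column, so Date is dropped.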
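library(dplyr)
# Keep rows of Df2 whose Task2 does not appear in Df$Task,
# then total Time per case worker, client and task
anti_join(Df2, Df, by = c("Task2" = "Task")) %>%
  group_by(CaseWorker, Client, Task2) %>%
  summarise(Time = sum(Time))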
# CaseWorker Client Task2 Time
# <fctr> <fctr> <fctr> <dbl>
#1 John Chris Iron shirt 30
#2 Kim Eric Buy groceries 8
If we need to turn this into a function:
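# dat1: data frame with Task2 and Time columns (e.g. Df2)
# dat2: data frame with a Task column (e.g. Df)
DiffGoals <- function(dat1, dat2) {
  anti_join(dat1, dat2, by = c("Task2" = "Task")) %>%
    group_by(CaseWorker, Client, Task2) %>%
    summarise(Time = sum(Time))
}
DiffGoals(Df2, Df)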
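If the task-column names change from one use to the next, one option is to pass them in as arguments instead of hard-coding "Task2" and "Task". The sketch below is a hypothetical variant (DiffTasks, task1 and task2 are illustrative names, not part of the answer). It assumes dplyr 1.0.0 or later for across() inside group_by() and for the .groups argument, and it still assumes both data frames have CaseWorker and Client columns and that dat1 has a Time column.

```r
library(dplyr)

# Example data from the question (character columns, not factors)
Df <- data.frame(
  CaseWorker = c("John", "John", "Kim"),
  Client     = c("Chris", "Chris", "Eric"),
  Date       = c("10/27/2016", "09/22/2016", "10/11/2016"),
  Task       = c("Feed cat", "Make dinner", "Do homework"),
  stringsAsFactors = FALSE
)
Df2 <- data.frame(
  CaseWorker = c(rep("John", 10), rep("Kim", 3)),
  Client     = c(rep("Chris", 10), rep("Eric", 3)),
  Date       = c("11/10/2016", "10/10/2016", "11/13/2016", "09/18/2016",
                 "11/11/2016", "09/19/2016", "08/08/2016", "10/10/2016",
                 "08/05/2016", "11/12/2016", "09/09/2016", "11/11/2016",
                 "09/10/2016"),
  Task2      = c(rep("Feed cat", 5), rep("Make dinner", 3), rep("Iron shirt", 2),
                 rep("Do homework", 2), "Buy groceries"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12, 10, 8),
  stringsAsFactors = FALSE
)

# Hypothetical generalisation of DiffGoals: the names of the task columns
# in dat1 (task1) and dat2 (task2) are arguments instead of being hard-coded
DiffTasks <- function(dat1, dat2, task1 = "Task2", task2 = "Task") {
  # Named vector c(<task1> = <task2>) tells anti_join which columns to match
  by_cols <- setNames(task2, task1)
  anti_join(dat1, dat2, by = by_cols) %>%
    group_by(across(all_of(c("CaseWorker", "Client", task1)))) %>%
    # .groups = "drop" returns an ungrouped result
    summarise(Time = sum(Time), .groups = "drop")
}

res <- DiffTasks(Df2, Df)
res
```

With the default arguments, DiffTasks(Df2, Df) should return the same two rows as DiffGoals(Df2, Df), but without the leftover grouping.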