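If possible, I would like to use dplyr and stringr, or at least stay within the Tidyverse, to do the following:
I need to group the data by CaseWorker and Client, then compare "Task" and "Task2" to find every category in "Task2" that does not appear in "Task", along with the total time associated with each of those "Task2" categories.
"Task" can contain categories that are not in "Task2", so I am only interested in categories found in "Task2" that are not in "Task". It would be great to create new columns showing the specific entries that are in "Task2" but not in "Task", together with the associated "Time" values.
The end result should show four new columns for client Chris: one for "Iron shirt" with its associated "Time" of 45, and one for "Do homework" with its "Time" of 21. Client Eric would have two new columns: one for "Iron shirt" and one for its associated time of 12.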
CaseWorker<-c("John","John","John","John","John","John","John","John",
"John","Kim","Kim")
Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric")
Task<-c("Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Buy groceries","Buy groceries","Buy groceries","Do homework","Do homework")
Task2<-c("Feed cat","Iron shirt","Iron shirt","Do homework","Do homework","Do homework","Make dinner","Feed cat","Feed cat","Do homework","Iron shirt")
Time<-c(20,34,11,10,5,6,55,30,20,10,12)
Df<-data.frame(CaseWorker,Client,Task,Task2,Time)
Answer 0 (score: 0)
We can try:
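library(dplyr)
library(tidyr)
Df %>%
  # Compare Task and Task2 within each CaseWorker/Client pair
  group_by(CaseWorker, Client) %>%
  # Keep only rows whose Task2 value never appears in Task for that group
  filter(Task2 %in% setdiff(Task2, Task)) %>%
  # Add Task2 as a grouping level (.add replaces the deprecated add argument in dplyr >= 1.0.0)
  group_by(Task2, .add = TRUE) %>%
  # Total time per remaining Task2 category
  summarise(Time = sum(Time)) %>%
  # One column per Task2 category, filled with its total time
  spread(Task2, Time)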
# CaseWorker Client `Do homework` `Iron shirt`
#* <fctr> <fctr> <dbl> <dbl>
#1 John Chris 21 45
#2 Kim Eric NA 12
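The same idea can also be written with the newer tidyr reshaping function. The following is only a sketch, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later are installed; the name `result` is introduced here for illustration and is not part of the original answer. It swaps `spread()` for `pivot_wider()` and replaces the `setdiff()` test with the equivalent per-group check `!Task2 %in% Task`.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question.
Df <- data.frame(
  CaseWorker = c(rep("John", 9), "Kim", "Kim"),
  Client     = c(rep("Chris", 9), "Eric", "Eric"),
  Task       = c(rep("Feed cat", 3), rep("Make dinner", 3),
                 rep("Buy groceries", 3), rep("Do homework", 2)),
  Task2      = c("Feed cat", "Iron shirt", "Iron shirt",
                 "Do homework", "Do homework", "Do homework",
                 "Make dinner", "Feed cat", "Feed cat",
                 "Do homework", "Iron shirt"),
  Time       = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
)

result <- Df %>%
  group_by(CaseWorker, Client) %>%
  # Within each group, keep rows whose Task2 value is absent from Task
  filter(!Task2 %in% Task) %>%
  group_by(Task2, .add = TRUE) %>%
  # Sum the time per category and drop the grouping afterwards
  summarise(Time = sum(Time), .groups = "drop") %>%
  # Spread the categories into columns; missing combinations become NA
  pivot_wider(names_from = Task2, values_from = Time)

print(result)
```

Because the filter runs on grouped data, `Task` refers only to the values for the current CaseWorker/Client pair, which is what makes the short `%in%` test behave like the `setdiff()` version.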