Part 2 - Creating Groups with Dplyr's "group_by" then Using Stringr's "str_detect" to Find Differences Between Groups

Posted: 2016-11-15 04:56:58

Tags: r dplyr stringr

This is a more complex version of my previous question, Creating Groups with Dplyr's "group_by" then Using Stringr to Find Differences Between Groups.

If possible, I would like to keep using dplyr and stringr, or at least stay within the Tidyverse.

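In this more complex example I again need to group the data by CaseWorker and Client, then compare "Task" and "Task2" to find every category in "Task2" that is not in "Task". There is also a "Time" column.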

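"Task" can contain categories that are not in "Task2", but I am only interested in the categories in "Task2" that are not in "Task". It would be great to create a new column or data frame that shows the specific "Task2" entries that are not in "Task", together with their associated "Time" values.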

For client "Chris", the end result should show "Iron shirt" and "Do homework", because neither category is in "Task", along with the total "Time" for each.

For client "Eric", it should show "Iron shirt" with a "Time" of 12.

```r
CaseWorker<-c("John","John","John","John","John","John","John","John",
"John","Kim","Kim")

Client<-c("Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Chris","Eric","Eric")

Task<-c("Feed cat","Feed cat","Feed cat","Make dinner","Make dinner","Make dinner","Buy groceries","Buy groceries","Buy groceries","Do homework","Do homework")

Task2<-c("Feed cat","Iron shirt","Iron shirt","Do Homework","Do homework","Do homework","Make dinner","Feed cat","Feed cat","Do homework","Iron shirt")

Time<-c(20,34,11,10,5,6,55,30,20,10,12)

Df<-data.frame(CaseWorker,Client,Task,Task2,Time)
```
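As a quick cross-check of the expected totals (Chris: 34 + 11 + 10 + 5 + 6 = 66; Eric: 12), here is a minimal base-R sketch. It deliberately does not use dplyr, so it does not meet the Tidyverse preference above; it only confirms the target numbers. It splits the rows by CaseWorker/Client and sums Time over the rows whose Task2 value never occurs in that group's Task. The `stringsAsFactors = FALSE` argument and the names `groups` and `expected` are my own choices.

```r
# Example data from the question, written more compactly
CaseWorker <- c(rep("John", 9), "Kim", "Kim")
Client <- c(rep("Chris", 9), "Eric", "Eric")
Task <- rep(c("Feed cat", "Make dinner", "Buy groceries", "Do homework"),
            times = c(3, 3, 3, 2))
Task2 <- c("Feed cat", "Iron shirt", "Iron shirt", "Do Homework", "Do homework",
           "Do homework", "Make dinner", "Feed cat", "Feed cat", "Do homework",
           "Iron shirt")
Time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
Df <- data.frame(CaseWorker, Client, Task, Task2, Time, stringsAsFactors = FALSE)

# One data frame per CaseWorker/Client pair; drop = TRUE skips empty combinations
groups <- split(Df, list(Df$CaseWorker, Df$Client), drop = TRUE)

# Per group: total Time of rows whose Task2 value is absent from that group's Task
expected <- sapply(groups, function(g) sum(g$Time[!g$Task2 %in% g$Task]))
expected
```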

1 answer

Answer (score: 0)

We get the elements in 'Task2' that are not in 'Task' using `setdiff`, and `paste` them together with `toString` (a wrapper for `paste(..., collapse=', ')`) after grouping by 'CaseWorker' and 'Client':

```r
library(dplyr)
Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(New = toString(setdiff(Task2, Task)))
```

If we also need the `sum` of 'Time' for those 'Task2' values, `filter` the rows whose 'Task2' element is in that subset before the `summarise` step:

```r
Df %>%
  group_by(CaseWorker, Client) %>%
  filter(Task2 %in% setdiff(Task2, Task)) %>%
  summarise(New = toString(unique(Task2)), Time = sum(Time))
#   CaseWorker Client                                  New  Time
#       <fctr> <fctr>                                <chr> <dbl>
#1        John  Chris Iron shirt, Do Homework, Do homework    66
#2         Kim   Eric                           Iron shirt    12
```
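The same filter can probably be written more directly. Within a group, `Task2 %in% setdiff(Task2, Task)` is TRUE exactly when that row's Task2 value does not appear in the group's Task, so `!Task2 %in% Task` should select the same rows without building the set difference first. The sketch below assumes dplyr 1.0.0 or later for the `.groups = "drop"` argument, which returns an ungrouped result. The name `result` is illustrative.

```r
library(dplyr)

# Example data from the question
CaseWorker <- c(rep("John", 9), "Kim", "Kim")
Client <- c(rep("Chris", 9), "Eric", "Eric")
Task <- rep(c("Feed cat", "Make dinner", "Buy groceries", "Do homework"),
            times = c(3, 3, 3, 2))
Task2 <- c("Feed cat", "Iron shirt", "Iron shirt", "Do Homework", "Do homework",
           "Do homework", "Make dinner", "Feed cat", "Feed cat", "Do homework",
           "Iron shirt")
Time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
Df <- data.frame(CaseWorker, Client, Task, Task2, Time, stringsAsFactors = FALSE)

result <- Df %>%
  group_by(CaseWorker, Client) %>%
  # Keeps the same rows as filter(Task2 %in% setdiff(Task2, Task));
  # %in% binds tighter than !, so this reads !(Task2 %in% Task)
  filter(!Task2 %in% Task) %>%
  summarise(New = toString(unique(Task2)), Time = sum(Time), .groups = "drop")
result
```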

Some elements of 'Task2' differ only in case ("Do Homework" vs. "Do homework"). If they need to be merged, convert to lower (or upper) case and wrap the result with `unique`, i.e. use `New = toString(unique(tolower(Task2)))` in the `summarise` step.
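The question also asked for a data frame that lists each new Task2 category with its own Time. Here is a sketch under one assumption: case differences should be ignored in the comparison as well, not only in the displayed names. It therefore folds both Task and Task2 to lower case before filtering. `task2_key` is a hypothetical helper column, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data from the question
CaseWorker <- c(rep("John", 9), "Kim", "Kim")
Client <- c(rep("Chris", 9), "Eric", "Eric")
Task <- rep(c("Feed cat", "Make dinner", "Buy groceries", "Do homework"),
            times = c(3, 3, 3, 2))
Task2 <- c("Feed cat", "Iron shirt", "Iron shirt", "Do Homework", "Do homework",
           "Do homework", "Make dinner", "Feed cat", "Feed cat", "Do homework",
           "Iron shirt")
Time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
Df <- data.frame(CaseWorker, Client, Task, Task2, Time, stringsAsFactors = FALSE)

per_task <- Df %>%
  # Hypothetical helper column: case-folded Task2, so "Do Homework" == "Do homework"
  mutate(task2_key = tolower(Task2)) %>%
  group_by(CaseWorker, Client) %>%
  # Keep rows whose Task2 (ignoring case) never appears in this group's Task
  filter(!task2_key %in% tolower(Task)) %>%
  # One row per new category per CaseWorker/Client, with its summed Time
  group_by(CaseWorker, Client, task2_key) %>%
  summarise(Time = sum(Time), .groups = "drop")
per_task
```

With the example data this should give one row per new category: Chris has "do homework" and "iron shirt", and Eric has "iron shirt". Summing `per_task$Time` by client again recovers the 66 and 12 totals from the answer. The trade-off is that the category names come out in lower case.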