This is a follow-up to the following question:
Creating binary identifiers based on condition of word combinations for filter
I currently have this dataset:
Case Date Item combiflag Duration
1 2016-03-25 Alpha TRUE 70
2 2016-03-25 Bravo TRUE 210
3 2016-03-25 Charlie FALSE 210
4 2016-03-25 Delta FALSE 210
5 2016-03-31 Alpha FALSE 210
6 2016-03-31 Echo FALSE 210
7 2016-03-31 Falcon FALSE 210
My goal is this output:
Date Item Duration
2016-03-25 Alpha + Bravo 70
2016-03-25 Charlie 210
2016-03-25 Delta 210
2016-03-31 Alpha 210
2016-03-31 Echo 210
2016-03-31 Falcon 210
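Two changes are worth noting here. First, only the rows with combiflag == TRUE have been combined; second, only the shortest duration is kept for the combined row.
I tried the following code: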
focus <- focus %>% group_by(Date) %>%
summarise(Item = ifelse(any(combiflag=="TRUE"), paste(Item, collapse = " + "), Item),
duration = ifelse(any(combiflag=="TRUE"), min(Duration), Duration))
which gave this:
Date Item Duration
2016-03-25 Alpha + Bravo 70
2016-03-31 Alpha 210
and the following code:
focus <- focus %>% group_by(Date, combiflag) %>%
summarise(Item = paste(Item, collapse = " + "),
duration = min(Duration))
which gave the following:
Date combiflag Item Duration
2016-03-25 FALSE Charlie + Delta 210
2016-03-25 TRUE Alpha + Bravo 70
2016-03-31 FALSE Alpha + Echo + Falcon 210
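Neither attempt worked. Any ideas?
Answer 0 (score: 3)
One option is to nest the table, so that you can change the number of rows in some of the nested tables without affecting all groups: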
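library(tidyverse)
df %>% group_by(Date, combiflag) %>%
  nest() %>%
  # summarise only the nested tables whose combiflag is TRUE
  mutate(data = ifelse(combiflag,
                       map(data, summarise,
                           Item = paste(Item, collapse = ' + '),
                           Duration = min(Duration)),
                       data)) %>%
  unnest(data)  # name the list-column; a bare unnest() is deprecated in tidyr >= 1.0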
## # A tibble: 6 × 5
## Date combiflag Item Duration Case
## <fctr> <lgl> <chr> <int> <int>
## 1 2016-03-25 TRUE Alpha + Bravo 70 NA
## 2 2016-03-25 FALSE Charlie 210 3
## 3 2016-03-25 FALSE Delta 210 4
## 4 2016-03-31 FALSE Alpha 210 5
## 5 2016-03-31 FALSE Echo 210 6
## 6 2016-03-31 FALSE Falcon 210 7
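To make the nesting step more concrete, here is a minimal sketch of the intermediate structure. It assumes `df` is rebuilt from the question's table with character columns (this reconstruction is an assumption, not the asker's original object). Each Date/combiflag pair ends up with its own small table in the `data` list-column, which is why one of them can be collapsed to a single row while the others keep all of theirs.

```r
library(dplyr)
library(tidyr)

# Assumed reconstruction of the question's data
df <- data.frame(
  Case = 1:7,
  Date = rep(c("2016-03-25", "2016-03-31"), times = c(4, 3)),
  Item = c("Alpha", "Bravo", "Charlie", "Delta", "Alpha", "Echo", "Falcon"),
  combiflag = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Duration = c(70L, rep(210L, 6)),
  stringsAsFactors = FALSE
)

# One row per Date/combiflag pair; the remaining columns move into `data`
nested <- df %>% group_by(Date, combiflag) %>% nest()

# Number of rows held inside each nested table
rows_per_group <- sapply(nested$data, nrow)
rows_per_group
```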
Or a self-join of sorts, summarising the flagged rows and binding the unflagged rows back on:
df %>% filter(combiflag) %>%
group_by(Date) %>%
summarise(combiflag = unique(combiflag),
Item = paste(Item, collapse = ' + '),
Duration = min(Duration)) %>%
bind_rows(df %>% filter(!combiflag))
## # A tibble: 6 × 5
## Date combiflag Item Duration Case
## <fctr> <lgl> <chr> <int> <int>
## 1 2016-03-25 TRUE Alpha + Bravo 70 NA
## 2 2016-03-25 FALSE Charlie 210 3
## 3 2016-03-25 FALSE Delta 210 4
## 4 2016-03-31 FALSE Alpha 210 5
## 5 2016-03-31 FALSE Echo 210 6
## 6 2016-03-31 FALSE Falcon 210 7
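The result above still carries the `combiflag` and `Case` columns, which the desired output in the question does not have. A minimal sketch of one way to finish the job follows, assuming `df` is rebuilt from the question's table with character columns (an assumption, since the original object is not shown). The names `combined` and `result` are introduced here for illustration only.

```r
library(dplyr)

# Assumed reconstruction of the question's data
df <- data.frame(
  Case = 1:7,
  Date = rep(c("2016-03-25", "2016-03-31"), times = c(4, 3)),
  Item = c("Alpha", "Bravo", "Charlie", "Delta", "Alpha", "Echo", "Falcon"),
  combiflag = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Duration = c(70L, rep(210L, 6)),
  stringsAsFactors = FALSE
)

# Collapse the flagged rows to one row per Date
combined <- df %>%
  filter(combiflag) %>%
  group_by(Date) %>%
  summarise(Item = paste(Item, collapse = " + "),
            Duration = min(Duration))

# Add the unflagged rows back, keeping only the three target columns
result <- combined %>%
  bind_rows(df %>% filter(!combiflag) %>% select(Date, Item, Duration)) %>%
  arrange(Date, Item)

result
```

Sorting by `Item` within each date is a choice made for this sketch; it happens to reproduce the row order of the desired output for this data.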
Answer 1 (score: 2)
Split the data into two subsets by combiflag, summarise the subset where combiflag is TRUE, then rbind() it with the subset where combiflag is FALSE:
library(data.table)
library(dplyr)  # for %>%, group_by() and summarise()
setDT(dt)  # convert dt to a data.table by reference
rbind(dt[combiflag == TRUE] %>% group_by(Date) %>%
        summarise(Item = paste(Item, collapse = ","),
                  Duration = min(Duration)) %>%
        as.data.table(),  # back to a data.table so the [order(Date)] step works
      dt[combiflag == FALSE][, `:=`(combiflag = NULL, Case = NULL)])[order(Date)]
# Date Item Duration
#1: 2016-03-25 Alpha,Bravo 70
#2: 2016-03-25 Charlie 210
#3: 2016-03-25 Delta 210
#4: 2016-03-31 Alpha 210
#5: 2016-03-31 Echo 210
#6: 2016-03-31 Falcon 210
Using a pure data.table approach:
rbind(dt[combiflag == TRUE , .(Item = paste(Item, collapse = "+"), Duration = min(Duration)), by = "Date"],
dt[combiflag == FALSE, ][,`:=`(combiflag = NULL,Case = NULL)])[order(Date)]
# Date Item Duration
#1: 2016-03-25 Alpha+Bravo 70
#2: 2016-03-25 Charlie 210
#3: 2016-03-25 Delta 210
#4: 2016-03-31 Alpha 210
#5: 2016-03-31 Echo 210
#6: 2016-03-31 Falcon 210
Answer 2 (score: 1)
We can use data.table:
library(data.table)
unique(setDT(df1)[(combiflag), c("Item", "Duration") :=
.(paste(Item , collapse= " + "), min(Duration)), .( Date)],
by= names(df1)[-1])[, c("Case", "combiflag") := NULL][]
# Date Item Duration
#1: 2016-03-25 Alpha + Bravo 70
#2: 2016-03-25 Charlie 210
#3: 2016-03-25 Delta 210
#4: 2016-03-31 Alpha 210
#5: 2016-03-31 Echo 210
#6: 2016-03-31 Falcon 210
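The one-liner above packs three steps into a single expression. A minimal sketch that unrolls them follows, assuming `df1` is rebuilt from the question's table (the construction and the name `res` are assumptions for illustration). Note that `:=` updates `df1` by reference, so the original object is changed as well.

```r
library(data.table)

# Assumed reconstruction of the question's data
df1 <- data.table(
  Case = 1:7,
  Date = rep(c("2016-03-25", "2016-03-31"), times = c(4, 3)),
  Item = c("Alpha", "Bravo", "Charlie", "Delta", "Alpha", "Echo", "Falcon"),
  combiflag = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Duration = c(70L, rep(210L, 6))
)

# Step 1: within the flagged rows only, overwrite Item and Duration per Date.
# Every flagged row of a date now carries the same combined values.
df1[(combiflag), c("Item", "Duration") :=
      .(paste(Item, collapse = " + "), min(Duration)), by = Date]

# Step 2: drop the duplicated flagged rows, comparing every column
# except the first one (Case), which still differs between them.
res <- unique(df1, by = names(df1)[-1])

# Step 3: remove the helper columns from the result.
res[, c("Case", "combiflag") := NULL]
res[]
```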