Collapsing values based on a condition (not just using Group By)

Time: 2016-12-07 04:32:02

Tags: r dplyr

This is a continuation of the following question:

Creating binary identifiers based on condition of word combinations for filter

I now have this dataset:

Case   Date         Item       combiflag   Duration   
1      2016-03-25   Alpha      TRUE        70         
2      2016-03-25   Bravo      TRUE        210
3      2016-03-25   Charlie    FALSE       210
4      2016-03-25   Delta      FALSE       210
5      2016-03-31   Alpha      FALSE       210
6      2016-03-31   Echo       FALSE       210
7      2016-03-31   Falcon     FALSE       210

My goal is this output:

Date         Item             Duration   
2016-03-25   Alpha + Bravo    70         
2016-03-25   Charlie          210
2016-03-25   Delta            210
2016-03-31   Alpha            210
2016-03-31   Echo             210
2016-03-31   Falcon           210

Note the two changes here. First, only the rows with combiflag == TRUE have been merged; second, only the shortest duration among the merged rows is kept.

I have tried the following code:

focus <- focus %>% group_by(Date) %>%
    summarise(Item = ifelse(any(combiflag=="TRUE"), paste(Item, collapse = " + "), Item), 
              duration = ifelse(any(combiflag=="TRUE"), min(Duration), Duration))

which gave this:

Date         Item             Duration   
2016-03-25   Alpha + Bravo    70         
2016-03-31   Alpha            210

and the following code:

focus <- focus %>% group_by(Date, combiflag) %>%
    summarise(Item = paste(Item, collapse = " + "), 
              duration = min(Duration))

which gave the following:

Date          combiflag    Item                    Duration   
2016-03-25    FALSE        Charlie + Delta         210
2016-03-25    TRUE         Alpha + Bravo           70         
2016-03-31    FALSE        Alpha + Echo + Falcon   210

Neither attempt worked. Any ideas?
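A note on why the first attempt returns a single row per date: `any(...)` yields one value, and `ifelse()` returns a result of the same length as its test, so only the first element of `Item` survives in groups where nothing is flagged. The second attempt fails for a different reason: `paste(..., collapse = " + ")` is applied to every (Date, combiflag) group, including the FALSE ones. The snippet below is a minimal sketch of the first effect; `items` and `flags` are hypothetical vectors standing in for the 2016-03-31 group.

```r
# One group from the question: the 2016-03-31 rows, none of them flagged
items <- c("Alpha", "Echo", "Falcon")
flags <- c(FALSE, FALSE, FALSE)

# The test has length 1, so ifelse() also returns length 1:
# it takes only the first element of the "no" branch.
collapsed <- ifelse(any(flags), paste(items, collapse = " + "), items)

print(collapsed)
```

This matches the single "Alpha" row shown for 2016-03-31 in the first output above.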

3 Answers:

Answer 0: (score: 3)

One option is to nest the table, so that you can change the number of rows in some of the nested tables without affecting all groups:

library(tidyverse)

df %>% group_by(Date, combiflag) %>% 
    nest() %>% 
    mutate(data = ifelse(combiflag, 
                         map(data, summarise, 
                             Item = paste(Item, collapse = ' + '), 
                             Duration = min(Duration)), 
                         data)) %>% 
    unnest(data)  # name the list-column explicitly; newer tidyr versions warn when it is omitted

## # A tibble: 6 × 5
##         Date combiflag          Item Duration  Case
##       <fctr>     <lgl>         <chr>    <int> <int>
## 1 2016-03-25      TRUE Alpha + Bravo       70    NA
## 2 2016-03-25     FALSE       Charlie      210     3
## 3 2016-03-25     FALSE         Delta      210     4
## 4 2016-03-31     FALSE         Alpha      210     5
## 5 2016-03-31     FALSE          Echo      210     6
## 6 2016-03-31     FALSE        Falcon      210     7

Or a self-join, summarising the flagged rows and binding the unflagged rows back on:

df %>% filter(combiflag) %>% 
    group_by(Date) %>% 
    summarise(combiflag = unique(combiflag),
              Item = paste(Item, collapse = ' + '), 
              Duration = min(Duration)) %>% 
    bind_rows(df %>% filter(!combiflag))

## # A tibble: 6 × 5
##         Date combiflag          Item Duration  Case
##       <fctr>     <lgl>         <chr>    <int> <int>
## 1 2016-03-25      TRUE Alpha + Bravo       70    NA
## 2 2016-03-25     FALSE       Charlie      210     3
## 3 2016-03-25     FALSE         Delta      210     4
## 4 2016-03-31     FALSE         Alpha      210     5
## 5 2016-03-31     FALSE          Echo      210     6
## 6 2016-03-31     FALSE        Falcon      210     7
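Both results above still carry the `combiflag` and `Case` columns, which the requested output does not have. As a minimal sketch, the second approach could be finished with a `select()` and an `arrange()`. The data frame below is rebuilt from the question, and storing `Date` as a `Date` column is an assumption (the printed output above shows a factor).

```r
library(dplyr)

# Sample data from the question; Date stored as Date here (an assumption)
df <- data.frame(
  Case      = 1:7,
  Date      = as.Date(c(rep("2016-03-25", 4), rep("2016-03-31", 3))),
  Item      = c("Alpha", "Bravo", "Charlie", "Delta", "Alpha", "Echo", "Falcon"),
  combiflag = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Duration  = c(70L, 210L, 210L, 210L, 210L, 210L, 210L),
  stringsAsFactors = FALSE
)

result <- df %>%
  filter(combiflag) %>%                       # flagged rows only
  group_by(Date) %>%
  summarise(Item = paste(Item, collapse = " + "),
            Duration = min(Duration)) %>%
  bind_rows(filter(df, !combiflag)) %>%       # add the untouched rows back
  select(Date, Item, Duration) %>%            # drop Case and combiflag
  arrange(Date)                               # keep each date's rows together

print(result)
```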

Answer 1: (score: 2)

Split the data into two subsets by combiflag, collapse the subset where combiflag is TRUE, and rbind() it with the subset where combiflag is FALSE:

library(data.table)
library(dplyr)  # provides %>%, group_by() and summarise()
setDT(dt) # working on data as a data.table
rbind(dt[combiflag == TRUE, ] %>% group_by(Date) %>% 
                              summarise(Item = paste(Item, collapse = ","), 
                                        Duration = min(Duration)), 
      # drop the helper columns from the unflagged rows before binding
      dt[combiflag == FALSE][,`:=`(combiflag = NULL,Case = NULL)])[order(Date)]
#         Date        Item Duration
#1: 2016-03-25 Alpha,Bravo       70
#2: 2016-03-25     Charlie      210
#3: 2016-03-25       Delta      210
#4: 2016-03-31       Alpha      210
#5: 2016-03-31        Echo      210
#6: 2016-03-31      Falcon      210

Using a pure data.table approach:

rbind(dt[combiflag == TRUE , .(Item = paste(Item, collapse = "+"), Duration = min(Duration)), by = "Date"],
      dt[combiflag == FALSE, ][,`:=`(combiflag = NULL,Case = NULL)])[order(Date)]
#         Date        Item Duration
#1: 2016-03-25 Alpha+Bravo       70
#2: 2016-03-25     Charlie      210
#3: 2016-03-25       Delta      210
#4: 2016-03-31       Alpha      210
#5: 2016-03-31        Echo      210
#6: 2016-03-31      Falcon      210
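A possible variation on the same idea, offered as a sketch rather than as part of the answer: selecting the three wanted columns directly in `j` avoids the separate `:=` step that removes `combiflag` and `Case`, and `rbindlist()` with `use.names = TRUE` binds the two pieces by column name. The table `dt` is rebuilt from the question, with `Date` stored as a `Date` column (an assumption).

```r
library(data.table)

# Sample data from the question; Date stored as Date here (an assumption)
dt <- data.table(
  Case      = 1:7,
  Date      = as.Date(c(rep("2016-03-25", 4), rep("2016-03-31", 3))),
  Item      = c("Alpha", "Bravo", "Charlie", "Delta", "Alpha", "Echo", "Falcon"),
  combiflag = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Duration  = c(70L, 210L, 210L, 210L, 210L, 210L, 210L)
)

res <- rbindlist(list(
  # flagged rows: one collapsed row per Date
  dt[combiflag == TRUE,
     .(Item = paste(Item, collapse = " + "), Duration = min(Duration)),
     by = Date],
  # unflagged rows: kept as they are, restricted to the wanted columns
  dt[combiflag == FALSE, .(Date, Item, Duration)]
), use.names = TRUE)[order(Date)]

print(res)
```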

Answer 2: (score: 1)

We can do this in a compact way with data.table:

library(data.table)
unique(setDT(df1)[(combiflag), c("Item", "Duration") :=
     .(paste(Item , collapse= " + "), min(Duration)), .( Date)],
          by= names(df1)[-1])[, c("Case", "combiflag") := NULL][]
#          Date          Item Duration
#1: 2016-03-25 Alpha + Bravo       70
#2: 2016-03-25       Charlie      210
#3: 2016-03-25         Delta      210
#4: 2016-03-31         Alpha      210
#5: 2016-03-31          Echo      210
#6: 2016-03-31        Falcon      210
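The one-liner above packs three steps into a single expression. The sketch below unpacks them on a copy of the question's data; `df1` is rebuilt here, and storing `Date` as a `Date` column is an assumption.

```r
library(data.table)

# Sample data from the question; Date stored as Date here (an assumption)
df1 <- data.table(
  Case      = 1:7,
  Date      = as.Date(c(rep("2016-03-25", 4), rep("2016-03-31", 3))),
  Item      = c("Alpha", "Bravo", "Charlie", "Delta", "Alpha", "Echo", "Falcon"),
  combiflag = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Duration  = c(70L, 210L, 210L, 210L, 210L, 210L, 210L)
)

# Step 1: for the flagged rows of each Date, overwrite Item and Duration
# in place, so every flagged row carries the collapsed values.
df1[(combiflag),
    c("Item", "Duration") := .(paste(Item, collapse = " + "), min(Duration)),
    by = Date]

# Step 2: the flagged rows now differ only in Case, so de-duplicating on
# all other columns keeps one of them per Date.
res <- unique(df1, by = c("Date", "Item", "combiflag", "Duration"))

# Step 3: drop the helper columns.
res[, c("Case", "combiflag") := NULL]

print(res)
```

One thing to watch for with this approach: `unique()` would also merge unflagged rows that happen to share the same Date, Item and Duration, which does not occur in the sample data.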