Sum by group according to a sequence condition in R

Time: 2018-12-06 08:23:30

Tags: r dplyr data.table

Say this is my data:

     mydat=structure(list(ItemRelation = c(11629L, 11629L, 11629L, 11629L, 
11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 
11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 
11629L, 11630L, 11630L, 11630L, 11630L, 11630L, 11630L, 11630L, 
11630L, 11630L, 11630L, 11630L, 11630L), exp_date_days = c(5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L
), CustomerName = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ТС", "ТС1"), class = "factor"), 
    DocumentNum = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
    11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
    11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L
    ), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L), CalendarYear = c(2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L), diff = 1:33), .Names = c("ItemRelation", 
"exp_date_days", "CustomerName", "DocumentNum", "IsPromo", "CalendarYear", 
"diff"), class = "data.frame", row.names = c(NA, -33L))

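Within a group, IsPromo always follows the order 0-1-0: zeros, then ones, then zeros.

For each ItemRelation + CustomerName + DocumentNum + CalendarYear group, I need to sum the data according to these conditions.

  1. If exp_date_days for the group is <= 5, the diff column must be summed over only the first 10 zero rows (IsPromo == 0) that come after the IsPromo == 1 rows. If there are fewer than 10 such zeros, sum all of them.

  2. If exp_date_days for the group is > 5, the diff column must be summed over only the first 15 zero rows that come after the IsPromo == 1 rows. If there are fewer than 15 such zeros, sum all of them.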
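The two rules can be sketched for a single group in base R. This is only an illustration: `sum_after_promo` and its argument names are hypothetical (they do not appear in the post), and it relies on the 0-1-0 ordering stated above, so every row after the last IsPromo == 1 row is assumed to be a zero.

```r
# Hypothetical helper (not from the original post): sums `diff` over the
# zero rows that follow the IsPromo == 1 run within ONE group.
sum_after_promo <- function(is_promo, diff_vals, exp_days) {
  # Rule 1 / rule 2: at most 10 rows when exp_date_days <= 5, else at most 15.
  cap <- if (all(exp_days <= 5)) 10L else 15L
  promo_idx <- which(is_promo == 1)
  # Assumption: a group without any promo row contributes nothing.
  if (length(promo_idx) == 0L) return(0L)
  # Rows after the last promo row; with 0-1-0 ordering these are all zeros.
  after <- seq_along(is_promo) > max(promo_idx)
  # head() takes the first `cap` values, or all of them if there are fewer.
  sum(head(diff_vals[after], cap))
}

# The two groups from the question, written out by hand.
g1 <- sum_after_promo(c(rep(0L, 12), 1L, 1L, rep(0L, 7)), 1:21, rep(5L, 21))
g2 <- sum_after_promo(c(0L, 1L, rep(0L, 10)), 22:33, rep(6L, 12))
print(c(g1, g2))  # 126 285
```

Neither group reaches its cap here (7 and 10 trailing zeros), so the cap only matters for longer groups.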

So the expected output for this example is:

ItemRelation    CustomerName    DocumentNum CalendarYear    diff
11629                  ТС          11               2018    126
11630                  ТС1         11               2018    285

How can I do this with dplyr or data.table?

Edit

ItemRelation    exp_date_days   CustomerName    DocumentNum IsPromo CalendarYear    diff
11629   5   ТС  11  0   2018    1
11629   5   ТС  11  0   2018    2
11629   5   ТС  11  0   2018    3
11629   5   ТС  11  0   2018    4
11629   5   ТС  11  0   2018    5
11629   5   ТС  11  0   2018    6
11629   5   ТС  11  0   2018    7
11629   5   ТС  11  0   2018    8
11629   5   ТС  11  0   2018    9
11629   5   ТС  11  0   2018    10
11629   5   ТС  11  0   2018    11
11629   5   ТС  11  0   2018    12
11629   5   ТС  11  1   2018    13
11629   5   ТС  11  1   2018    14
**11629 5   ТС  11  0   2018    15
11629   5   ТС  11  0   2018    16
11629   5   ТС  11  0   2018    17
11629   5   ТС  11  0   2018    18
11629   5   ТС  11  0   2018    19
11629   5   ТС  11  0   2018    20
11629   5   ТС  11  0   2018    21** (sum 126)

Edit 2

ItemRelation    exp_date_days   CustomerName    DocumentNum IsPromo CalendarYear    diff
11630   6   ТС1 11  0   2018    22
11630   6   ТС1 11  1   2018    23
**11630 6   ТС1 11  0   2018    24
11630   6   ТС1 11  0   2018    25
11630   6   ТС1 11  0   2018    26
11630   6   ТС1 11  0   2018    27
11630   6   ТС1 11  0   2018    28
11630   6   ТС1 11  0   2018    29
11630   6   ТС1 11  0   2018    30
11630   6   ТС1 11  0   2018    31
11630   6   ТС1 11  0   2018    32
11630   6   ТС1 11  0   2018    33** (285)

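1 answer:

Answer 0 (score: 2)

We can group_by the four key columns, filter each group down to the rows that follow the IsPromo == 1 run, cap the number of rows according to exp_date_days, and then take the sum of the 'diff' column.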
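Before the full pipeline, it may help to trace the run-detection expression `cumsum(c(FALSE, diff(IsPromo == 1) < 0)) == 1` on a toy vector. The vector and the intermediate names (`step`, `ended`, `run_id`, `keep`) are made up for illustration only.

```r
# Toy IsPromo vector following the 0-1-0 pattern (illustrative, not from the post).
is_promo <- c(0L, 0L, 1L, 1L, 0L, 0L, 0L)

step   <- diff(is_promo == 1)   # change between neighbours: 0 1 0 -1 0 0
ended  <- c(FALSE, step < 0)    # TRUE only on the first row after the promo run
run_id <- cumsum(ended)         # 0 before and during the run, 1 after it
keep   <- run_id == 1           # rows the filter() step would retain

print(run_id)       # 0 0 0 0 1 1 1
print(which(keep))  # 5 6 7
```

Note that the data also has a column called `diff`. As far as I know, R resolves a name in call position to a function, so `diff(...)` inside the pipeline should still reach the base function while `sum(diff)` reads the column.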

library(dplyr)
mydat %>% 
  group_by(ItemRelation, CustomerName, DocumentNum, CalendarYear) %>% 
  # keep only the rows after the first 1 -> 0 switch in IsPromo
  filter(cumsum(c(FALSE, diff(IsPromo == 1) < 0)) == 1) %>% 
  # cap at 10 rows when exp_date_days <= 5, otherwise at 15
  filter(if(all(exp_date_days <= 5)) row_number() <= 10 else row_number() <= 15) %>%
  summarise(diff = sum(diff))
# A tibble: 2 x 5
# Groups:   ItemRelation, CustomerName, DocumentNum [?]
#  ItemRelation CustomerName DocumentNum CalendarYear  diff
#         <int> <fct>              <int>        <int> <int>
#1        11629 ТС                    11         2018   126
#2        11630 ТС1                   11         2018   285
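The question also asks about data.table. A possible equivalent is sketched below; it is not part of the original answer. The `mydat` built here is a compact reconstruction of the question's data (assumed to hold the same values), the names `dt`, `res`, `after` and `cap` are illustrative, and `base::diff` is spelled out only to keep the function visibly apart from the `diff` column.

```r
library(data.table)

# Compact reconstruction of the question's `mydat` (assumed equivalent values).
mydat <- data.frame(
  ItemRelation  = rep(c(11629L, 11630L), c(21, 12)),
  exp_date_days = rep(c(5L, 6L), c(21, 12)),
  CustomerName  = factor(rep(c("ТС", "ТС1"), c(21, 12))),
  DocumentNum   = 11L,
  IsPromo       = c(rep(0L, 12), 1L, 1L, rep(0L, 7), 0L, 1L, rep(0L, 10)),
  CalendarYear  = 2018L,
  diff          = 1:33
)

dt <- as.data.table(mydat)
res <- dt[, {
  # rows after the first 1 -> 0 switch in IsPromo
  after <- cumsum(c(FALSE, base::diff(IsPromo == 1) < 0)) == 1
  # at most 10 rows when exp_date_days <= 5, otherwise at most 15
  cap <- if (all(exp_date_days <= 5)) 10L else 15L
  list(diff = sum(head(diff[after], cap)))
}, by = c("ItemRelation", "CustomerName", "DocumentNum", "CalendarYear")]

print(res)  # diff is 126 for 11629 and 285 for 11630
```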