Counting values by condition within groups

Time: 2017-06-23 09:43:49

Tags: r data-manipulation

I have a data frame, sorted by groupID and date:

d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3), 
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

> d1
 groupID date value
       1    1     1
       1    2     1
       1    3    25
       1    4     1
       1    5     1
       3    6    25
       3    7     1
       3    8    25
       3    9     1  

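I want to create two new columns:

  1. Prev1s: for each occurrence of 25, the number of rows with value = 1 that precede it within its group, counting back to the previous 25 (or to the start of the group).
  2. After1s: for each occurrence of 25, the number of rows with value = 1 that follow it within its group, up to the next 25 (or to the end of the group).

Desired output: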

     groupID date value Prev1s After1s
           1    1     1
           1    2     1
           1    3    25      2       2
           1    4     1
           1    5     1
           3    6    25      0       1
           3    7     1
           3    8    25      1       1
           3    9     1
    

I can do this in Excel by creating counters and picking up the previous values. I tried to achieve the same in R using sum and shift(), but to no avail.

2 answers:

Answer 0: (score: 1)

You can do this with dplyr:

library(dplyr)
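# First set up run identifiers based on the positions of the 25s:
#   PrevGp increments on the row after each 25, so a 25 closes its run;
#   AfterGp increments on each 25 itself, so a 25 opens its run.
d1 <- d1 %>% mutate(PrevGp = cumsum(lag(value == 25, default = TRUE)),
                    AfterGp = cumsum(value == 25)) %>% 
# Within each groupID/run, sum(value) - 25 is the number of 1s.
# Note: this relies on every value other than 25 being exactly 1.
  group_by(groupID, PrevGp) %>% mutate(Prev1s = sum(value) - 25) %>% 
  group_by(groupID, AfterGp) %>% mutate(After1s = sum(value) - 25) %>% 
  ungroup() %>% 
# Blank out (set to "") the rows where value != 25;
# this turns both new columns into character columns
  mutate(Prev1s = replace(Prev1s, value != 25, ""),
         After1s = replace(After1s, value != 25, "")) %>% 
# Finally drop the helper run identifiers
  select(-c(PrevGp, AfterGp))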

d1
# A tibble: 9 x 5
  groupID  date value Prev1s After1s
    <dbl> <dbl> <dbl>  <chr>   <chr>
1       1     1     1               
2       1     2     1               
3       1     3    25      2       2
4       1     4     1               
5       1     5     1               
6       3     6    25      0       1
7       3     7     1               
8       3     8    25      1       1
9       3     9     1               
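
To see why the two cumsum() keys in this answer work, it can help to print them next to the data. The following is only a sketch in base R, not the answer's own code: `c(TRUE, head(is25, -1))` is my stand-in for `lag(value == 25, default = ...)`, and the names `is25`, `prev_gp`, `after_gp` and `keys` are made up for illustration.

```r
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

is25 <- d1$value == 25

# Key that steps up on the row AFTER each 25:
# a 25 is the last row of its run, so the run holds the 1s before it.
prev_gp <- cumsum(c(TRUE, head(is25, -1)))

# Key that steps up ON each 25:
# a 25 is the first row of its run, so the run holds the 1s after it.
after_gp <- cumsum(is25)

keys <- cbind(d1, PrevGp = prev_gp, AfterGp = after_gp)
print(keys)
```

The keys run across the whole column and ignore groupID, which is why the answer groups by groupID together with each key: a run can then never span two groups.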

Answer 1: (score: 0)

An alternative using the data.table package in combination with the rle function:

library(data.table)
setDT(d1)[, c('prev1s','after1s') := {p <- a <- rle(value);
                                      i <- p$values == 25;
                                      p$values[i] <- shift(p$lengths, fill = 0)[i];
                                      a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i];
                                      p$values[!i] <- a$values[!i] <- NA;
                                      list(inverse.rle(p),inverse.rle(a))},
          by = groupID][]

which gives:

   groupID date value prev1s after1s
1:       1    1     1     NA      NA
2:       1    2     1     NA      NA
3:       1    3    25      2       2
4:       1    4     1     NA      NA
5:       1    5     1     NA      NA
6:       3    6    25      0       1
7:       3    7     1     NA      NA
8:       3    8    25      1       1
9:       3    9     1     NA      NA
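
For comparison, the same run-length idea can be sketched in base R without any packages. This is not part of either answer: the helper `count_ones_around()` and its arguments `target` and `counted` are hypothetical names. Unlike the data.table code above, which takes the length of the neighbouring run whatever it contains, this sketch only counts a neighbouring run if it consists of `counted` values, which I assume is what the question intends.

```r
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

# Hypothetical helper: for one group's values, return per-row counts of
# `counted` values directly before and after each run of `target`.
count_ones_around <- function(value, target = 25, counted = 1) {
  r <- rle(value)
  hit <- r$values == target
  # Length of each run if it is a run of `counted` values, otherwise 0
  len <- r$lengths * (r$values == counted)
  prev_len <- c(0, head(len, -1))   # run before each run (0 at the start)
  next_len <- c(tail(len, -1), 0)   # run after each run (0 at the end)
  # Keep counts only for target runs, then expand back to one entry per row
  list(prev  = rep(ifelse(hit, prev_len, NA), r$lengths),
       after = rep(ifelse(hit, next_len, NA), r$lengths))
}

# ave() applies the helper separately within each groupID
d1$Prev1s  <- ave(d1$value, d1$groupID,
                  FUN = function(v) count_ones_around(v)$prev)
d1$After1s <- ave(d1$value, d1$groupID,
                  FUN = function(v) count_ones_around(v)$after)
print(d1)
```

As with the rle-based answer, two or more consecutive 25s form a single run, so each of those rows would receive the same pair of counts.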