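Adding a counter for rows that meet specific conditions

Time: 2017-05-26 05:11:44

Tags: r row counter

Problem statement

Given the following dataset with two columns, Column1 and Column2, add two more columns named Counter and Counting time. Counter and Counting time are initialized under these conditions: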

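  1. The counter is incremented only when Column1 > 1 and Column2 = 0.
  2. Within each sequence of rows where the condition is met, the counter must start incrementing only after the first 2 rows.
  3. Counting time must record which occurrence of the sequence (a run of consecutive data points that meet the condition) a row belongs to.

Data frame with expected output: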

    Column1 Column2 Counter Counter_Time  
    1.1254  2.784    0        0
    4.678   7.985    0        0  
    8.89      0      0        1
    7.65      0      0        1  
    3.54      0      1        1  
    4.32      0      2        1  
    9.83      0      3        1
    3.86     4.3     0        1
    5.63     9.8     0        1
    4.53      0      0        2
    6.83      0      0        2   
    3.431     0      4        2
    8.976     0      5        2
    9.864     0      6        2
    7.3      9.2     0        2
    2.3      3.2     0        2
    4.3       0      0        3
    2.1       0      0        3
    4.32      0      7        3  
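The Counting time rule amounts to numbering the runs of matching rows: the value goes up by one on each row where the condition becomes true after being false, and stays the same on every other row. A minimal base R sketch, assuming the data are in a data frame `df` with the column names above (`df`, `ok` and `run_start` are illustrative names, not from the question):

```r
# Example data from the question
df <- data.frame(
  Column1 = c(1.1254, 4.678, 8.89, 7.65, 3.54, 4.32, 9.83, 3.86, 5.63,
              4.53, 6.83, 3.431, 8.976, 9.864, 7.3, 2.3, 4.3, 2.1, 4.32),
  Column2 = c(2.784, 7.985, 0, 0, 0, 0, 0, 4.3, 9.8,
              0, 0, 0, 0, 0, 9.2, 3.2, 0, 0, 0)
)

# TRUE where the condition from the question holds
ok <- df$Column1 > 1 & df$Column2 == 0

# A run starts where ok is TRUE and the previous row was not TRUE
# (row 1 counts as a start if it already meets the condition)
run_start <- ok & !c(FALSE, head(ok, -1))

# Counting time: number of runs started so far
df$Counter_Time <- cumsum(run_start)
print(df$Counter_Time)
```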
    

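I came across a similar question whose answer shows how to increment a counter, but I could not get it to satisfy the conditions above. Note that the counter should start only after two rows that meet the condition.

Observations from the dataset:

1. Row 3 meets the condition; Counter is not initialized, but Counter_Time is incremented.
2. Counter starts at row 5 (per condition 2, the first two rows that meet the condition must not trigger the counter).
3. In row 8 the counter returns to 0 and Counter_Time stays the same.
4. Likewise, Counter resumes increasing at row 12, skipping rows 10 and 11, while Counter_Time is incremented at row 10.
5. I have described the problem in detail so that experts can provide an exact solution.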
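Observations 2 to 4 describe the Counter rule: inside each run, the first two matching rows get 0, every later matching row gets the next value of a single running count over the whole table, and non-matching rows get 0. A base R sketch under the same assumptions as the example data, using `ave()` to number rows within each run (`run_id`, `pos` and `eligible` are illustrative names):

```r
# Example data from the question
df <- data.frame(
  Column1 = c(1.1254, 4.678, 8.89, 7.65, 3.54, 4.32, 9.83, 3.86, 5.63,
              4.53, 6.83, 3.431, 8.976, 9.864, 7.3, 2.3, 4.3, 2.1, 4.32),
  Column2 = c(2.784, 7.985, 0, 0, 0, 0, 0, 4.3, 9.8,
              0, 0, 0, 0, 0, 9.2, 3.2, 0, 0, 0)
)
ok <- df$Column1 > 1 & df$Column2 == 0

# Run id: increases at every row where the condition switches on,
# so each id group begins with its run of matching rows
run_id <- cumsum(ok & !c(FALSE, head(ok, -1)))

# Position of each row inside its run id group (1, 2, 3, ...)
pos <- ave(seq_along(ok), run_id, FUN = seq_along)

# Only matching rows after the first two of their run are counted,
# and the count continues across runs
eligible <- ok & pos > 2
df$Counter <- ifelse(eligible, cumsum(eligible), 0)
print(df$Counter)
```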

2 answers:

Answer 0 (score: 0)

# Load packages
library(tidyverse)
library(data.table)

# Create example data frame
dt <- fread("Column1 Column2
1.1254  2.784
4.678   7.985 
8.89      0
7.65      0  
3.54      0
4.32      0  
9.83      0
3.86     4.3
5.63     9.8
4.53      0
6.83      0  
3.431     0
8.976     0
9.864     0
7.3      9.2
2.3      3.2
4.3       0
2.1       0
4.32      0  ")

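### Create Counter_Time
# rleid() numbers the runs of Condition. Because the data start with a
# non-matching run, non-matching runs get odd IDs and matching runs even IDs,
# so (ID - 1)/2 and ID/2 give a matching run and the non-matching rows after
# it the same value. This assumes the first row does not meet the condition.
dt2 <- dt %>%
  mutate(Merge_ID = 1:n()) %>%
  mutate(Condition = ifelse(Column1 > 1 & Column2 == 0, 1, 0)) %>%
  mutate(ID = rleid(Condition)) %>%
  mutate(Counter_Time = ifelse(Condition == 0, (ID - 1)/2, ID/2))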

### Create Counter
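# Skip the first two rows of each Counter_Time group and keep the matching
# rows after them. filter(row_number() >= 3) is used instead of slice(3:n()),
# because 3:n() counts backwards for groups with fewer than 3 rows
# (3:1 is c(3, 2, 1)) and would then keep rows that should be skipped.
dt3 <- dt2 %>%
  group_by(Counter_Time) %>%
  filter(row_number() >= 3, Condition == 1) %>%
  ungroup() %>%
  mutate(Counter = 1:n()) %>%
  select(Merge_ID, Counter)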

### Merge dt2 and dt3 together, dt4 is the final output
dt4 <- dt2 %>%
  left_join(dt3, by = "Merge_ID") %>%
  mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
  select(Column1, Column2, Counter, Counter_Time)
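The `Counter_Time` step in this answer relies on `rleid()` numbering runs alternately, so `(ID - 1)/2` and `ID/2` land on the same whole number for a matching run and the non-matching run after it. That only holds when the first row fails the condition. A base R check of the formula, with a hypothetical helper `run_ids()` built on `rle()` standing in for `data.table::rleid()`:

```r
# Hypothetical helper: run-length IDs like data.table::rleid(), built with base rle()
run_ids <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Counter_Time formula from the answer, applied to a 0/1 Condition vector
counter_time <- function(cond) {
  id <- run_ids(cond)
  ifelse(cond == 0, (id - 1) / 2, id / 2)
}

# First row fails the condition: whole numbers, one per run
good <- counter_time(c(0, 0, 1, 1, 0, 1))
print(good)

# First row meets the condition: the IDs shift and half-values appear
bad <- counter_time(c(1, 1, 0, 1))
print(bad)
```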

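Update

The following code is an update that replaces everything after the creation of dt2. The idea is to make sure that when no rows meet the condition, the code still produces a Counter column, with all values equal to 0.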

### Set the index
begin_index <- 3

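### Filter the right condition
# filter(row_number() >= begin_index) instead of slice(begin_index:n()),
# which counts backwards for groups with fewer than begin_index rows
dt3 <- dt2 %>%
  group_by(Counter_Time) %>%
  filter(row_number() >= begin_index, Condition == 1) %>%
  ungroup()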


### Check if dt3 has any rows
if (nrow(dt3) > 0){

  dt3 <- dt3 %>%
    mutate(Counter = 1:n()) %>%
    select(Merge_ID, Counter)

  ### Merge dt2 and dt3 together, dt4 is the final output
  dt4 <- dt2 %>%
    left_join(dt3, by = "Merge_ID") %>%
    mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
    select(Column1, Column2, Counter, Counter_Time)

### If nrow(dt3) is 0, no rows meet the condition
} else {

  ### Create Counter column from dt2
  dt4 <- dt2 %>%
    mutate(Counter = 0) %>%
    select(Column1, Column2, Counter, Counter_Time)

}
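The zero-row branch is needed because `1:n()` misbehaves when `n()` is 0: in R, `1:0` is the two-element vector `c(1, 0)`, not an empty one, so it cannot fill a column of a zero-row table. `seq_len()` returns an empty vector in that case, so `mutate(Counter = seq_len(n()))` should handle the empty case without a separate branch. A base R check of that behavior (the data frame `empty` is illustrative):

```r
# 1:0 counts down, so it has two elements rather than none
print(1:0)
print(length(1:0))

# seq_len(0) is empty, which is what a zero-row table needs
print(length(seq_len(0)))

# Numbering the rows of an empty data frame with seq_len() works
empty <- data.frame(Merge_ID = integer(0))
empty$Counter <- seq_len(nrow(empty))

# The same assignment with 1:nrow(empty) fails
res <- tryCatch({
  empty$Counter <- 1:nrow(empty)
  "no error"
}, error = function(e) "error")
print(res)
```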

Answer 1 (score: 0)

A compact solution using data.table (with the same data as @ycw):

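library(data.table)
# counter_time goes up by 1 each time the condition switches from FALSE to TRUE;
# within each such run, rows after the first two are flagged with 1
# (seq_len(.N) > 2 also copes with runs shorter than two rows, where
# rep(1, .N - 2) would get a negative count), and the flags are then
# turned into one running count
dt[, counter := 0
   ][, counter_time := cumsum(c(0, diff(Column1 > 1 & Column2 == 0)) == 1)
     ][Column1 > 1 & Column2 == 0, counter := as.numeric(seq_len(.N) > 2), by = counter_time
       ][counter == 1, counter := cumsum(counter)]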

Gives:

> dt
    Column1 Column2 counter counter_time
 1:  1.1254   2.784       0            0
 2:  4.6780   7.985       0            0
 3:  8.8900   0.000       0            1
 4:  7.6500   0.000       0            1
 5:  3.5400   0.000       1            1
 6:  4.3200   0.000       2            1
 7:  9.8300   0.000       3            1
 8:  3.8600   4.300       0            1
 9:  5.6300   9.800       0            1
10:  4.5300   0.000       0            2
11:  6.8300   0.000       0            2
12:  3.4310   0.000       4            2
13:  8.9760   0.000       5            2
14:  9.8640   0.000       6            2
15:  7.3000   9.200       0            2
16:  2.3000   3.200       0            2
17:  4.3000   0.000       0            3
18:  2.1000   0.000       0            3
19:  4.3200   0.000       7            3
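The `counter_time` expression marks every switch of the condition from FALSE to TRUE: `diff()` of a logical vector is 1 at such a switch, -1 at a switch back and 0 elsewhere, and `cumsum()` of the `== 1` test numbers the runs. A base R check on the condition vector of the example data (`cond`, `steps`, `edge` and `short_run` are illustrative names), which also shows the two edge cases of the answer's expressions:

```r
# Condition values for the 19 example rows (Column1 > 1 & Column2 == 0)
cond <- c(FALSE, FALSE, rep(TRUE, 5), FALSE, FALSE,
          rep(TRUE, 5), FALSE, FALSE, rep(TRUE, 3))

# diff() of a logical vector: 1 where it switches on, -1 where it switches off
steps <- c(0, diff(cond))
counter_time <- cumsum(steps == 1)
print(counter_time)

# A run that starts on row 1 has no preceding switch, so it stays at 0
edge <- cumsum(c(0, diff(c(TRUE, TRUE, FALSE, TRUE))) == 1)
print(edge)

# rep() with a negative count is an error, which is why
# c(0, 0, rep(1, .N - 2)) breaks for a run of a single row
short_run <- tryCatch(rep(1, 1 - 2), error = function(e) "error")
print(short_run)

# seq_len(.N) > 2 gives the same flags and also covers short runs
print(as.numeric(seq_len(5) > 2))
print(as.numeric(seq_len(1) > 2))
```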

Data used:

library(data.table)
dt <- fread("Column1 Column2
            1.1254  2.784
            4.678   7.985
            8.89      0
            7.65      0
            3.54      0
            4.32      0
            9.83      0
            3.86     4.3
            5.63     9.8
            4.53      0
            6.83      0
            3.431     0
            8.976     0
            9.864     0
            7.3      9.2
            2.3      3.2
            4.3       0
            2.1       0
            4.32      0")