Problem statement
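Given the following dataset with two columns, Column1 and Column2, add two more columns named Counter and Counter_Time. The condition for initializing Counter and Counter_Time is:
Column1 > 1 and Column2 = 0
Counter_Time must contain the number of times a sequence has occurred (a sequence being a run of consecutive data points that satisfy the condition).
Data frame with the expected output: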
Column1 Column2 Counter Counter_Time
1.1254 2.784 0 0
4.678 7.985 0 0
8.89 0 0 1
7.65 0 0 1
3.54 0 1 1
4.32 0 2 1
9.83 0 3 1
3.86 4.3 0 1
5.63 9.8 0 1
4.53 0 0 2
6.83 0 0 2
3.431 0 4 2
8.976 0 5 2
9.864 0 6 2
7.3 9.2 0 2
2.3 3.2 0 2
4.3 0 0 3
2.1 0 0 3
4.32 0 7 3
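For reference, the Counter_Time rule on its own can be expressed in base R: flag the rows that meet the condition, mark the rows where a new run of flagged rows starts, and take a cumulative sum. This is a minimal sketch, assuming the data is held in two plain vectors; the names cond and starts are only for illustration and no packages are used.

```r
# The question's data as plain vectors
Column1 <- c(1.1254, 4.678, 8.89, 7.65, 3.54, 4.32, 9.83, 3.86, 5.63, 4.53,
             6.83, 3.431, 8.976, 9.864, 7.3, 2.3, 4.3, 2.1, 4.32)
Column2 <- c(2.784, 7.985, 0, 0, 0, 0, 0, 4.3, 9.8, 0,
             0, 0, 0, 0, 9.2, 3.2, 0, 0, 0)

# TRUE for rows that satisfy Column1 > 1 and Column2 == 0
cond <- Column1 > 1 & Column2 == 0

# A sequence starts where cond is TRUE and the previous row's cond is FALSE
# (FALSE is assumed before row 1, so a match on row 1 also starts a sequence)
starts <- cond & !c(FALSE, head(cond, -1))

# Counter_Time: number of sequences started so far
Counter_Time <- cumsum(starts)
print(Counter_Time)
```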
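I found an answer to a similar question that shows how to increment a counter, but I could not make it satisfy the conditions above. Note that the counter should only start after the first two rows of a sequence that satisfy the condition.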
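The "skip the first two matching rows of each sequence" rule can be sketched in base R with rle() and sequence(): sequence(rle(cond)$lengths) gives each row's position inside its run, and only matching rows at position 3 or later are counted, with one running count across all sequences as in the expected output. This is a minimal sketch, assuming the condition flags are already computed (cond below is typed in from the question's data).

```r
# Condition flags for the question's 19 rows
# (TRUE where Column1 > 1 and Column2 == 0)
cond <- c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE,
          TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Position of every row inside its run of identical cond values
pos <- sequence(rle(cond)$lengths)

# Count only matching rows after the first two rows of their sequence
counted <- cond & pos > 2

# Counter: running total of counted rows, 0 everywhere else
Counter <- ifelse(counted, cumsum(counted), 0)
print(Counter)
```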
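Observations from the dataset:
At row 3, Counter is not initialized, but Counter_Time is incremented.
Counter starts from row 5 (per the condition, the first 2 rows that satisfy it should not trigger the counter), while Counter_Time stays the same.
Counter resumes incrementing from row 12, skipping rows 10 and 11, but Counter_Time is incremented at row 10.
I have elaborated on the problem statement so that experts can provide an accurate solution.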
Answer 0 (score: 0)
# Load packages
library(tidyverse)
library(data.table)
# Create example data frame
dt <- fread("Column1 Column2
1.1254 2.784
4.678 7.985
8.89 0
7.65 0
3.54 0
4.32 0
9.83 0
3.86 4.3
5.63 9.8
4.53 0
6.83 0
3.431 0
8.976 0
9.864 0
7.3 9.2
2.3 3.2
4.3 0
2.1 0
4.32 0 ")
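### Create Counter_Time
# Condition flags the rows with Column1 > 1 and Column2 == 0; rleid() numbers
# the runs of equal Condition values, and halving the run id turns it into the
# sequence number (this assumes the first row does not meet the condition)
dt2 <- dt %>%
mutate(Merge_ID = 1:n()) %>%
mutate(Condition = ifelse(Column1 > 1 & Column2 == 0, 1, 0)) %>%
mutate(ID = rleid(Condition)) %>%
mutate(Counter_Time = ifelse(Condition == 0, (ID - 1)/2, ID/2))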
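### Create Counter
# Drop the first two rows of each Counter_Time group (the first two matching
# rows of the sequence), keep the remaining matching rows and number them.
# filter(row_number() >= 3) keeps nothing in groups with fewer than 3 rows.
dt3 <- dt2 %>%
group_by(Counter_Time) %>%
filter(row_number() >= 3) %>%
filter(Condition == 1) %>%
ungroup() %>%
mutate(Counter = 1:n()) %>%
select(Merge_ID, Counter)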
### Merge dt2 and dt3 together, dt4 is the final output
dt4 <- dt2 %>%
left_join(dt3, by = "Merge_ID") %>%
mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
select(Column1, Column2, Counter, Counter_Time)
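The Counter_Time line above relies on a parity trick. The sketch below replays it in base R, with a small hypothetical helper run_id() built from rle() and rep() standing in for data.table::rleid() (an assumption for illustration; rleid itself is not called), and shows why the formula needs the first row to fail the condition.

```r
# Condition column of dt2 for the example data (1 = row meets the condition)
condition <- c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1)

# Hypothetical rleid()-style helper: the id increases whenever the value changes
run_id <- function(x) {
  lens <- rle(x)$lengths
  rep(seq_along(lens), lens)
}
id <- run_id(condition)

# Non-matching runs get odd ids and matching runs even ids,
# so (id - 1) / 2 and id / 2 both yield the sequence number
counter_time <- ifelse(condition == 0, (id - 1) / 2, id / 2)
print(counter_time)

# If row 1 already meets the condition the parity flips and
# the result contains half values such as 0.5
flipped <- c(1, 1, 0, 1)
flipped_id <- run_id(flipped)
flipped_time <- ifelse(flipped == 0, (flipped_id - 1) / 2, flipped_id / 2)
print(flipped_time)
```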
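The following code is an update to the steps after creating dt2. The idea is to make sure that when no rows meet the condition, the code still produces a Counter column, with all values equal to 0.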
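One reason the first version needs this guard is the numbering step mutate(Counter = 1:n()): when dt3 has 0 rows, n() is 0, and in R 1:0 counts down to c(1, 0), a vector of length 2 that cannot fill a 0-row data frame. seq_len() returns an empty vector instead. A small base-R check of that behaviour (the variable n is only a stand-in for n()):

```r
# Number of rows left in dt3 when nothing meets the condition
n <- 0

# ':' counts down when the end is smaller than the start
print(1:n)         # [1] 1 0

# seq_len() gives an empty integer vector for n = 0
print(seq_len(n))  # integer(0)

# For n > 0 both give the same numbering
stopifnot(identical(1:5, seq_len(5)))
```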
### Set the index
begin_index <- 3
### Filter the right condition
dt3 <- dt2 %>%
group_by(Counter_Time) %>%
filter(row_number() >= begin_index) %>%
filter(Condition == 1) %>%
ungroup()
### Check if dt3 has any rows
if (nrow(dt3) > 0){
dt3 <- dt3 %>%
mutate(Counter = 1:n()) %>%
select(Merge_ID, Counter)
### Merge dt2 and dt3 together, dt4 is the final output
dt4 <- dt2 %>%
left_join(dt3, by = "Merge_ID") %>%
mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
select(Column1, Column2, Counter, Counter_Time)
### If nrow(dt3) is 0, no rows meet the condition
} else {
### Create Counter column from dt2
dt4 <- dt2 %>%
mutate(Counter = 0) %>%
select(Column1, Column2, Counter, Counter_Time)
}
Answer 1 (score: 0)
A compact solution using data.table (with the same data as @ycw):
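library(data.table)
dt[, counter := 0
# counter_time: +1 at every FALSE -> TRUE switch, i.e. the start of a new sequence
][, counter_time := cumsum(diff(c(FALSE, Column1 > 1 & Column2 == 0)) == 1)
# within each sequence, flag every matching row after the first two (works for any run length)
][Column1 > 1 & Column2 == 0, counter := as.numeric(seq_len(.N) > 2), by = counter_time
# turn the flags into one running count over the whole table
][counter == 1, counter := cumsum(counter)]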
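The counter_time step uses a run-start idiom: diff() on a logical vector is 1 exactly where it switches from FALSE to TRUE, so cumsum() over those positions numbers the sequences. A minimal base-R illustration on a made-up flag vector (flag, steps and seq_no are hypothetical names, not the question's data):

```r
# Hypothetical flags: TRUE where a row would meet the condition
flag <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)

# Prepending FALSE keeps the result as long as flag; each step is
# 1 at a FALSE -> TRUE switch, -1 at TRUE -> FALSE, 0 otherwise
steps <- diff(c(FALSE, flag))
print(steps)   # values: 0 1 0 -1 1 -1 1 0

# Sequence number of every row, as in counter_time
seq_no <- cumsum(steps == 1)
print(seq_no)  # values: 0 1 1 1 2 2 3 3
```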
Gives:
> dt
Column1 Column2 counter counter_time
1: 1.1254 2.784 0 0
2: 4.6780 7.985 0 0
3: 8.8900 0.000 0 1
4: 7.6500 0.000 0 1
5: 3.5400 0.000 1 1
6: 4.3200 0.000 2 1
7: 9.8300 0.000 3 1
8: 3.8600 4.300 0 1
9: 5.6300 9.800 0 1
10: 4.5300 0.000 0 2
11: 6.8300 0.000 0 2
12: 3.4310 0.000 4 2
13: 8.9760 0.000 5 2
14: 9.8640 0.000 6 2
15: 7.3000 9.200 0 2
16: 2.3000 3.200 0 2
17: 4.3000 0.000 0 3
18: 2.1000 0.000 0 3
19: 4.3200 0.000 7 3
Data used:
library(data.table)
dt <- fread("Column1 Column2
1.1254 2.784
4.678 7.985
8.89 0
7.65 0
3.54 0
4.32 0
9.83 0
3.86 4.3
5.63 9.8
4.53 0
6.83 0
3.431 0
8.976 0
9.864 0
7.3 9.2
2.3 3.2
4.3 0
2.1 0
4.32 0")