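I want to use R's dplyr and data.table to count consecutive occurrences of the same value in a column (Temperature) and mark every occurrence beyond the third in a run as "Discard".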
Answer 0 (score: 4)
Using data.table:
library(data.table)
setDT(df)
df[,
Comment := ifelse(seq_len(.N) <= 3, 'OK', 'Discard'),
.(Store, RTU, rleid(Temperature))
][]
# Time Store RTU Temperature Comment
# 1: 1 1000 1 54 OK
# 2: 2 1000 1 54 OK
# 3: 3 1000 1 54 OK
# 4: 4 1000 1 54 Discard
# 5: 5 1000 1 54 Discard
# 6: 6 1000 1 56 OK
# 7: 7 1000 1 57 OK
# 8: 8 1000 1 50 OK
# 9: 9 1000 1 50 OK
#10: 10 1000 1 50 OK
#11: 11 1000 1 50 Discard
#12: 12 1000 1 50 Discard
#13: 13 1000 1 61 OK
#14: 14 1000 1 61 OK
#15: 15 1000 1 61 OK
#16: 16 1000 1 61 Discard
#17: 17 1000 1 61 Discard
#18: 18 1000 1 58 OK
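To make the approach reproducible, here is a minimal sketch that rebuilds the sample input from the printed output above (the construction of `df` is an assumption, since the original data frame is not shown) and splits the logic into two steps. The column name `run_id` is introduced here for illustration only. `rleid()` assigns an id to each run of identical consecutive values, and `rowid()` numbers the rows within each combination of its arguments.

```r
library(data.table)

# Sample input reconstructed from the printed output (a single Store/RTU pair).
df <- data.table(
  Time        = 1:18,
  Store       = 1000L,
  RTU         = 1L,
  Temperature = c(rep(54L, 5), 56L, 57L, rep(50L, 5), rep(61L, 5), 58L)
)

# Step 1: give every run of identical consecutive temperatures its own id,
# computed separately for each Store/RTU pair.
df[, run_id := rleid(Temperature), by = .(Store, RTU)]

# Step 2: number the rows inside each (Store, RTU, run_id) combination;
# the first three rows of a run are kept, later ones are flagged.
df[, Comment := ifelse(rowid(Store, RTU, run_id) <= 3, "OK", "Discard")]

print(df)
```

This should yield the same Comment column as the one-step version, with the intermediate run id kept as a column so the runs can be inspected.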
Answer 1 (score: 1)
Extending the OP's solution, one option that uses dplyr together with data.table is:
library(dplyr)
library(data.table)
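df %>%
  group_by(Store, RTU) %>%
  mutate(Flag = rleid(Temperature)) %>%   # run id, restarts within each Store/RTU pair
  group_by(Store, RTU, Flag) %>%          # keep Store/RTU so equal run ids in different pairs stay separate
  mutate(Flag_Temperature_check = ifelse(row_number() <= 3, "Ok", "Discard"))
# # A tibble: 18 x 6
# # Groups: Store, RTU, Flag [6]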
# Time Store RTU Temperature Flag Flag_Temperature_check
# <int> <int> <int> <int> <int> <chr>
# 1 1 1000 1 54 1 Ok
# 2 2 1000 1 54 1 Ok
# 3 3 1000 1 54 1 Ok
# 4 4 1000 1 54 1 Discard
# 5 5 1000 1 54 1 Discard
# 6 6 1000 1 56 2 Ok
# 7 7 1000 1 57 3 Ok
# 8 8 1000 1 50 4 Ok
# 9 9 1000 1 50 4 Ok
# 10 10 1000 1 50 4 Ok
# 11 11 1000 1 50 4 Discard
# 12 12 1000 1 50 4 Discard
# 13 13 1000 1 61 5 Ok
# 14 14 1000 1 61 5 Ok
# 15 15 1000 1 61 5 Ok
# 16 16 1000 1 61 5 Discard
# 17 17 1000 1 61 5 Discard
# 18 18 1000 1 58 6 Ok
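If loading data.table only for `rleid()` is undesirable, the run id can also be built with dplyr alone. The sketch below is an illustration rather than part of the original answer: it uses a small hypothetical data set (`df2`, two RTUs) and assumes Temperature contains no NA values. Because the run id restarts within each Store/RTU pair, the position inside a run is counted per (Store, RTU, Flag), not per Flag alone. Recent dplyr versions (1.1.0 and later) also provide `consecutive_id()` for the same purpose.

```r
library(dplyr)

# Hypothetical input: two RTUs whose first runs would both get run id 1.
df2 <- tibble(
  Time        = 1:8,
  Store       = 1000L,
  RTU         = rep(1:2, each = 4),
  Temperature = c(54L, 54L, 54L, 54L, 54L, 54L, 50L, 50L)
)

res <- df2 %>%
  group_by(Store, RTU) %>%
  # A new run starts wherever the value differs from the previous row;
  # the cumulative sum of those change points is the run id.
  mutate(Flag = cumsum(Temperature != lag(Temperature, default = first(Temperature))) + 1L) %>%
  # Count positions inside each run, keeping Store/RTU in the grouping.
  group_by(Store, RTU, Flag) %>%
  mutate(Flag_Temperature_check = if_else(row_number() <= 3, "Ok", "Discard")) %>%
  ungroup()

print(res)
```

With this input only the fourth row (the fourth consecutive 54 on RTU 1) is marked "Discard"; grouping by Flag alone would also flag rows on RTU 2, since its first run shares run id 1 with RTU 1.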