How do I create a loop that checks whether a value is greater than a number AND stays greater than it for a certain number of days?

Asked: 2018-08-15 16:11:53

Tags: r if-statement dplyr

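First of all, thank you for taking the time to look at my question.

My question concerns a flag I want to create. Essentially, the flag works like this: if the value in column Vals stays above 3 for 3 consecutive days, I want to flag the original value. I am trying to use an ifelse statement, but I cannot work out how to express "stays above 3 for 3 consecutive days" in R. I have tried for several hours without finding a solution :/

Sample data might look like this: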

Date      Vals
8/1/11     2.5
8/2/11     2.6
8/3/11     1.6
8/4/11     3.6
8/5/11     3.5
8/6/11     3.1
8/7/11     3.8
8/8/11     2.1
8/9/11     1.6
8/10/11    3.1

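So essentially, I would end up with a third column, FLAG, that reads ERROR only on the 8/4/11 and 8/5/11 rows and shows no error on the remaining rows.

Please let me know, and thank you for your time!

3 Answers:

Answer 0 (score: 0)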

library(zoo)
df$FLAG <- rollapply(na.fill(df$Vals, 0), 3, 
                     function(x) if(all(x > 3)) 'ERROR' 
                                 else 'NO ERROR'
                     , fill = 'NO ERROR'
                     , align = 'left')

df
#        Date Vals     FLAG
#  1:  8/1/11  2.5 NO ERROR
#  2:  8/2/11  2.6 NO ERROR
#  3:  8/3/11  1.6 NO ERROR
#  4:  8/4/11  3.6    ERROR
#  5:  8/5/11  3.5    ERROR
#  6:  8/6/11  3.1 NO ERROR
#  7:  8/7/11  3.8 NO ERROR
#  8:  8/8/11  2.1 NO ERROR
#  9:  8/9/11  1.6 NO ERROR
# 10: 8/10/11  3.1 NO ERROR

Data used:

library(data.table)
df <- fread("
Date      Vals
8/1/11     2.5
8/2/11     2.6
8/3/11     1.6
8/4/11     3.6
8/5/11     3.5
8/6/11     3.1
8/7/11     3.8
8/8/11     2.1
8/9/11     1.6
8/10/11    3.1
")
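The rolling-window idea can also be written in base R without extra packages. The following is a minimal sketch, not part of the original answer: `flag_run_start`, `n`, and `threshold` are hypothetical names introduced here for illustration, and missing values are assumed to count as "not above the threshold", mirroring the `na.fill(..., 0)` call above.

```r
# Hypothetical helper (an illustration, not the answerer's code): mark each row
# that starts a run of `n` consecutive values strictly greater than `threshold`.
flag_run_start <- function(x, n = 3, threshold = 3) {
  above <- !is.na(x) & x > threshold           # NA is treated as "not above"
  len <- length(x)
  starts <- rep(FALSE, len)
  if (len >= n) {
    for (i in seq_len(len - n + 1)) {
      starts[i] <- all(above[i:(i + n - 1)])   # window looks forward from row i
    }
  }
  # Rows too close to the end to start a full window stay "NO ERROR"
  ifelse(starts, "ERROR", "NO ERROR")
}

# Values copied from the question's sample data
vals <- c(2.5, 2.6, 1.6, 3.6, 3.5, 3.1, 3.8, 2.1, 1.6, 3.1)
flags <- flag_run_start(vals)
print(which(flags == "ERROR"))   # row positions that start a 3-day run
```

On the sample data this marks rows 4 and 5 (8/4/11 and 8/5/11), the same rows as the left-aligned `rollapply` call. An explicit loop is slower than a vectorized solution on long series, but it makes the forward-looking window easy to see.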

Answer 1 (score: 0)

Using dplyr:

library(dplyr)
df %>% mutate(Flag=ifelse(Vals > 3, 1, 0)) %>%  # 1 when the value is strictly above 3
       # Flag a row only when it and the next two rows are all above 3 (dplyr::lead looks ahead)
       mutate(Flag_final=ifelse(Flag==1 & lead(Flag, n=1)==1 & lead(Flag, n=2)==1, 'ERROR', 'No ERROR')) %>%
       select(-Flag) %>%
       # lead() returns NA past the end of the column; treat those rows as 'No ERROR'
       mutate(Flag_final=ifelse(is.na(Flag_final),'No ERROR',Flag_final))


# A tibble: 10 x 3
  Date     Vals Flag_final
  <fct>   <dbl> <chr>     
  1 8/1/11    2.5 No ERROR  
  2 8/2/11    2.6 No ERROR  
  3 8/3/11    1.6 No ERROR  
  4 8/4/11    3.6 ERROR     
  5 8/5/11    3.5 ERROR     
  6 8/6/11    3.1 No ERROR  
  7 8/7/11    3.8 No ERROR  
  8 8/8/11    2.1 No ERROR  
  9 8/9/11    1.6 No ERROR  
 10 8/10/11   3.1 No ERROR 

Answer 2 (score: 0)

You can do this in one shot with mutate and lead:

library(dplyr)
# f1 is the data frame holding the Date and Vals columns
f1 %>% mutate(FLAG = ifelse(Vals > 3 &
                            lead(Vals) > 3 &
                            lead(Vals, 2) > 3
                            , 'ERROR', 'NO ERROR'))

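With a check that relies only on lead(), a trailing row whose value exceeds 3 gets NA, because lead() returns NA past the end of the column. If you want the last two entries to always be 'NO ERROR' (even though they could be the start of a 3-day streak), you can force it: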

library(dplyr)
# The is.na() guards turn the trailing NA results into FALSE, so those rows read 'NO ERROR'
f1 %>% mutate(FLAG = ifelse(Vals > 3 &
                            lead(Vals) > 3 &
                            lead(Vals, 2) > 3 &
                            !is.na(lead(Vals)) &
                            !is.na(lead(Vals, 2))
                            , 'ERROR', 'NO ERROR'))
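The reason the extra `is.na()` guards matter comes from R's three-valued logic. The sketch below is a stand-alone illustration using made-up values rather than the answer's data frame; `next_val` is a hypothetical stand-in for what `lead()` returns on a two-row series.

```r
# Stand-alone illustration of NA handling in logical expressions; base R only.
vals <- c(2.5, 3.1)             # last two rows of a hypothetical series
next_val <- c(vals[-1], NA)     # stand-in for lead(vals): 3.1, then NA

# FALSE & TRUE is FALSE, but TRUE & NA is NA (the result is unknown)
cond <- vals > 3 & next_val > 3
print(cond)

# NA & FALSE is FALSE, so the guard resolves the unknown case
guarded <- cond & !is.na(next_val)
print(guarded)

# ifelse() propagates NA, so only the guarded version labels every row
print(ifelse(cond, "ERROR", "NO ERROR"))
print(ifelse(guarded, "ERROR", "NO ERROR"))
```

Because `FALSE & NA` is `FALSE` while `TRUE & NA` is `NA`, an unguarded check leaves `NA` only in trailing rows whose own value is above the threshold.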