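First of all, thank you for taking the time to look at my question.
My question is about a flag I want to create. Essentially, the flag works like this: if the value in the Vals column is greater than 3 for 3 consecutive days, I want to flag the original value. I have been trying to use an ifelse statement, but I just can't figure out how to code the "greater than 3 for 3 consecutive days" condition in R. I have been at it for hours with no solution :/
Sample data might look like this: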
Date Vals
8/1/11 2.5
8/2/11 2.6
8/3/11 1.6
8/4/11 3.6
8/5/11 3.5
8/6/11 3.1
8/7/11 3.8
8/8/11 2.1
8/9/11 1.6
8/10/11 3.1
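So essentially: I would get a third column, FLAG, that shows ERROR only on the rows for 8/4/11 and 8/5/11, and NO ERROR on the remaining rows.
Please let me know, and thank you for your time!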
Answer 0: (score: 0)
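library(zoo)
# Slide a 3-row window that starts at each row (align = 'left'); NAs are treated as 0.
df$FLAG <- rollapply(na.fill(df$Vals, 0), 3,
                     function(x) if (all(x > 3)) 'ERROR' else 'NO ERROR',
                     fill = 'NO ERROR',  # the last two rows have no complete window
                     align = 'left')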
df
# Date Vals FLAG
# 1: 8/1/11 2.5 NO ERROR
# 2: 8/2/11 2.6 NO ERROR
# 3: 8/3/11 1.6 NO ERROR
# 4: 8/4/11 3.6 ERROR
# 5: 8/5/11 3.5 ERROR
# 6: 8/6/11 3.1 NO ERROR
# 7: 8/7/11 3.8 NO ERROR
# 8: 8/8/11 2.1 NO ERROR
# 9: 8/9/11 1.6 NO ERROR
# 10: 8/10/11 3.1 NO ERROR
Data used:
library(data.table)
df <- fread("
Date Vals
8/1/11 2.5
8/2/11 2.6
8/3/11 1.6
8/4/11 3.6
8/5/11 3.5
8/6/11 3.1
8/7/11 3.8
8/8/11 2.1
8/9/11 1.6
8/10/11 3.1
")
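For readers without zoo installed, here is a minimal base-R sketch of the same left-aligned window check. The helper name `starts_run` and its arguments are made up for illustration; they are not part of zoo or of the answer above. Like `na.fill(df$Vals, 0)`, it treats missing values as not exceeding the threshold.

```r
# Sample values from the question
vals <- c(2.5, 2.6, 1.6, 3.6, 3.5, 3.1, 3.8, 2.1, 1.6, 3.1)

# Hypothetical helper: TRUE where a value and the next (k - 1) values
# all exceed `threshold`. NAs count as "not above".
starts_run <- function(x, k = 3, threshold = 3) {
  n <- length(x)
  above <- !is.na(x) & x > threshold
  out <- logical(n)  # FALSE everywhere, including the last k - 1 positions
  if (n >= k) {
    for (i in seq_len(n - k + 1)) {
      out[i] <- all(above[i:(i + k - 1)])
    }
  }
  out
}

flag <- ifelse(starts_run(vals), "ERROR", "NO ERROR")
print(data.frame(Vals = vals, FLAG = flag))
```

On the sample data this should flag only the 4th and 5th rows (8/4/11 and 8/5/11), which is what the question asks for.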
Answer 1: (score: 0)
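library(dplyr)
# Caveat: this reproduces the expected output for the sample data, but it tests Vals >= 3
# (not > 3), and lead() inside group_by(Flag) skips rows from the other group, so it does
# not strictly check that the three days are adjacent.
df %>% mutate(Flag = ifelse(Vals >= 3, 1, 0)) %>% group_by(Flag) %>%
  # Within each Flag group, check whether the value 3 rows ahead is 1 (dplyr::lead)
  mutate(Flag_final = ifelse(lead(Flag, n = 3) == 1, 'ERROR', 'No ERROR')) %>%
  ungroup() %>% select(-Flag) %>%
  # Rows with nothing 3 positions ahead come back as NA; treat them as 'No ERROR'
  mutate(Flag_final = ifelse(is.na(Flag_final), 'No ERROR', Flag_final))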
# A tibble: 10 x 3
Date Vals Flag_final
<fct> <dbl> <chr>
1 8/1/11 2.5 No ERROR
2 8/2/11 2.6 No ERROR
3 8/3/11 1.6 No ERROR
4 8/4/11 3.6 ERROR
5 8/5/11 3.5 ERROR
6 8/6/11 3.1 No ERROR
7 8/7/11 3.8 No ERROR
8 8/8/11 2.1 No ERROR
9 8/9/11 1.6 No ERROR
10 8/10/11 3.1 No ERROR
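If you want to double-check that a flagged day really starts three adjacent days above 3, one option is to look at run lengths with base R's `rle()`. This is only a sketch of an alternative check, not the approach used in this answer; the variable names `above`, `runs`, and `remaining` are chosen here for illustration.

```r
# Sample values from the question
vals <- c(2.5, 2.6, 1.6, 3.6, 3.5, 3.1, 3.8, 2.1, 1.6, 3.1)

above <- vals > 3
runs <- rle(above)  # consecutive runs of TRUE/FALSE and their lengths

# For each position, how many values are left in its run (counting itself)
remaining <- unlist(lapply(runs$lengths, function(len) rev(seq_len(len))))

# A day starts a 3-day streak if it is above 3 and at least 3 values remain in its run
flag <- ifelse(above & remaining >= 3, "ERROR", "No ERROR")
print(data.frame(Vals = vals, Flag_final = flag))
```

Because `remaining` counts positions inside a single unbroken run, a low value in between always resets the count.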
Answer 2: (score: 0)
You can do it in one shot with mutate and lead:
library(dplyr)
# Flag a row when it and the next two rows are all greater than 3
df %>% mutate(FLAG = ifelse(Vals > 3 &
                            lead(Vals) > 3 &
                            lead(Vals, 2) > 3
                            , 'ERROR', 'NO ERROR'))
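Because lead() returns NA past the end of the data, the version above can produce NA instead of a label in the last two rows. If you want the last two entries to always be 'NO ERROR' (even though they might be the start of a 3-day streak), you can force the issue: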
library(dplyr)
# Same check, but rows without two following values are forced to 'NO ERROR'
df %>% mutate(FLAG = ifelse(Vals > 3 &
                            lead(Vals) > 3 &
                            lead(Vals, 2) > 3 &
                            !is.na(lead(Vals)) &
                            !is.na(lead(Vals, 2))
                            , 'ERROR', 'NO ERROR'))
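To see why the extra is.na() checks matter, here is a small base-R sketch of how `&` treats NA at the end of the series. `shift_lead` is a hypothetical stand-in for dplyr::lead(), written here only so the example runs without dplyr.

```r
# Sample values from the question
vals <- c(2.5, 2.6, 1.6, 3.6, 3.5, 3.1, 3.8, 2.1, 1.6, 3.1)

# Hypothetical stand-in for dplyr::lead(): shift left by n and pad the end with NA
shift_lead <- function(x, n = 1) c(x[-seq_len(n)], rep(NA, n))

# Without the guard: TRUE & NA is NA, while FALSE & NA is FALSE
cond <- vals > 3 & shift_lead(vals) > 3 & shift_lead(vals, 2) > 3
print(cond)  # the last element is NA because 3.1 > 3 but nothing follows it

# With the guard: NA & FALSE is FALSE, so no NA survives
guarded <- cond & !is.na(shift_lead(vals)) & !is.na(shift_lead(vals, 2))
print(ifelse(guarded, "ERROR", "NO ERROR"))
```

In the sample data only the last row is affected: row 9 is already FALSE because 1.6 is not greater than 3, while row 10 (3.1) is above 3 and so its unguarded result is NA.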