I'm trying to run H2O's anomaly detection in R (h2o_3.14.0.2).
First, I tried it with my main Deep Learning model and got this error:
water.exceptions.H2OIllegalArgumentException
[1] "water.exceptions.H2OIllegalArgumentException: Only for AutoEncoder Deep Learning model."
...
OK, my bad. I set autoencoder to TRUE:
h2o.deeplearning(y = response, training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)
And got a new error:
Error in .verify_dataxy(training_frame, x, y, autoencoder): `y` should not be specified for autoencoder=TRUE, remove `y` input
Traceback:
1. h2o.deeplearning(y = response, training_frame = training.frame,
. validation_frame = test.frame, autoencoder = TRUE)
2. .verify_dataxy(training_frame, x, y, autoencoder)
3. stop("`y` should not be specified for autoencoder=TRUE, remove `y` input")
OK, so I should remove y:
h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)
But then:
Error in is.numeric(y): argument "y" is missing, with no default
Traceback:
1. h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame,
. autoencoder = TRUE)
2. is.numeric(y)
Hmm, the last two requirements look mutually exclusive. But OK, I'll try another kind of model:
anomaly.detection.model <- h2o.glrm(training_frame = training.frame, k = 10, seed = common.seed)
h2o.anomaly(anomaly.detection.model, training.frame, per_feature = FALSE)
and got a different kind of error:
java.lang.AssertionError
[1] "java.lang.AssertionError"
[2] " water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:439)"
...
The failing assertion is `assert s.reconstruct_train;`. I haven't dug into it yet. Maybe I'll get lucky with GBM or RF?
model = h2o.gbm(y = response,
training_frame = training.frame,
validation_frame = validation.frame,
max_hit_ratio_k = 10,
seed = common.seed,
stopping_rounds = 3,
stopping_tolerance = 1e-2)
h2o.anomaly(model, training.frame, per_feature = FALSE)
water.exceptions.H2OIllegalArgumentException
[1] "water.exceptions.H2OIllegalArgumentException: Requires a Deep Learning, GLRM, DRF or GBM model."
Same with RF.
So I have two questions:
Answer 0 (score: 1):
I was trying to detect anomalies in time-series data. To learn the concepts I used this blog, and its explanation worked well for me.
I'd like to contribute a visual illustration of what happens when we detect an anomaly. In that example, a Deep Learning model is fitted to an ECG dataset. The data look like this:
[image: the data we fit our Deep Learning model on]
After that, we feed it a test dataset (containing an anomaly) that looks like this:
[image: the data we test our Deep Learning model on]
Anomaly detection itself becomes possible when the "AI" sees the difference, measured by the MSE (mean squared error) metric:
[image: what the AI "sees" on the test dataset, as MSE]
The resulting MSE can then be inspected as in the example.
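The steps described above can be sketched in R roughly as follows. This is a minimal sketch, not the blog's exact code: `train_ecg` and `test_ecg` are hypothetical H2OFrames standing in for the ECG data, the `hidden` layout is illustrative, and the 0.02 threshold is an arbitrary placeholder you would tune on your own data.

```r
library(h2o)
h2o.init()

# Fit an autoencoder on "normal" data only.
# Note that x IS set and y is NOT -- this is the combination
# the question above was missing.
ecg.dl <- h2o.deeplearning(x = names(train_ecg),
                           training_frame = train_ecg,
                           autoencoder = TRUE,
                           hidden = c(50, 20, 50),
                           epochs = 100,
                           seed = 1)

# Per-row reconstruction MSE on the test frame (which contains the anomaly).
mse <- h2o.anomaly(ecg.dl, test_ecg, per_feature = FALSE)

# Rows whose reconstruction error exceeds the threshold are flagged as anomalies.
anomalies <- test_ecg[mse$Reconstruction.MSE > 0.02, ]
```

Rows the autoencoder reconstructs poorly (high `Reconstruction.MSE`) are the ones that look least like the "normal" training data, which is exactly the difference shown in the pictures above.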
Answer 1 (score: 0):
Enabling the autoencoder (setting it to TRUE) turns this into an unsupervised problem, so there is no need to set a response (y).
However, when autoencoder is set to TRUE you still need to set x. The problem you saw above with autoencoder = TRUE is that you had no predictors (x) set. Once you set x, your problem will go away.
I ran a quick anomaly detection test with H2O 3.14.0.2 in R (learn more in this blog):
> library(h2o)
> h2o.init()
Reading in config file: ./.h2oconfig
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
/var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.out
/var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.err
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting H2O JVM and connecting: .. Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 1 seconds 948 milliseconds
H2O cluster version: 3.14.0.2
H2O cluster version age: 24 days
H2O cluster name: H2O_started_from_R_avkashchauhan_alj381
H2O cluster total nodes: 1
H2O cluster total memory: 3.56 GB
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.0 (2017-04-21)
> mtcar = h2o.importFile('https://raw.githubusercontent.com/woobe/H2O_London_Workshop/master/data/auto_design.csv')
|==================================================================================================================================| 100%
> mtcar$gear = as.factor(mtcar$gear)
> mtcar$carb = as.factor(mtcar$carb)
> mtcar$cyl = as.factor(mtcar$cyl)
> mtcar$vs = as.factor(mtcar$vs)
> mtcar$am = as.factor(mtcar$am)
> mtcar.dl = h2o.deeplearning(x = 2:12, training_frame = mtcar, autoencoder = TRUE, hidden = c(1,1,1), epochs = 100,seed=1)
|==================================================================================================================================| 100%
> errors <- h2o.anomaly(mtcar.dl, mtcar, per_feature = TRUE)
> print(errors)
reconstr_carb.1.SE reconstr_carb.2.SE reconstr_carb.3.SE reconstr_carb.4.SE reconstr_carb.6.SE reconstr_carb.8.SE
1 0 0 0 1 0 0
2 0 0 0 1 0 0
3 1 0 0 0 0 0
4 1 0 0 0 0 0
5 0 1 0 0 0 0
6 1 0 0 0 0 0
reconstr_carb.missing(NA).SE reconstr_cyl.4.SE reconstr_cyl.6.SE reconstr_cyl.8.SE reconstr_cyl.10.SE reconstr_cyl.missing(NA).SE
1 0 0 1 0 0 0
2 0 0 1 0 0 0
3 0 1 0 0 0 0
4 0 0 1 0 0 0
5 0 0 0 1 0 0
6 0 0 1 0 0 0
reconstr_gear.3.SE reconstr_gear.4.SE reconstr_gear.5.SE reconstr_gear.missing(NA).SE reconstr_vs.0.SE reconstr_vs.1.SE
1 0 1 0 0 1 0
2 0 1 0 0 1 0
3 0 1 0 0 0 1
4 1 0 0 0 0 1
5 1 0 0 0 1 0
6 1 0 0 0 0 1
reconstr_vs.missing(NA).SE reconstr_am.0.SE reconstr_am.1.SE reconstr_am.missing(NA).SE reconstr_mpg.SE reconstr_disp.SE reconstr_hp.SE
1 0 0 1 0 8.705556e-05 0.0196626269 0.0035177471
2 0 0 1 0 8.705556e-05 0.0196626269 0.0035177471
3 0 0 1 0 2.684331e-04 0.0411916382 0.0045768080
4 0 1 0 0 1.307597e-05 0.0004837585 0.0035177471
5 0 1 0 0 1.779785e-03 0.0102131519 0.0007516691
6 0 1 0 0 2.576469e-03 0.0038200199 0.0038147898
reconstr_drat.SE reconstr_wt.SE reconstr_qsec.SE
1 0.002147682 0.002080628 0.003914459
2 0.002147682 0.002054817 0.003843678
3 0.002153499 0.002111200 0.003646228
4 0.002244072 0.002020654 0.003545225
5 0.002235761 0.001998203 0.003843678
6 0.002282261 0.001996213 0.003451600
[32 rows x 28 columns]
You can also run GLRM on the same dataset, as below. You must set k, and you don't need to pass x with GLRM, but the dataset cannot have constant columns. That's why I used the filtered dataset for both Deep Learning and GLRM.
> mtcar_glrm = mtcar[2:12]
> mtcar.glrm = h2o.glrm(training_frame = mtcar_glrm,seed=1, k = 5)
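Continuing the session above, reconstruction error for the GLRM model can be requested the same way as for the autoencoder. This is only a sketch: whether it avoids the `assert s.reconstruct_train` failure from the question likely depends on the constant-column filtering mentioned above, and the `reverse_transform` choice here is an assumption.

```r
# Per-row reconstruction error from the GLRM model on the filtered frame.
glrm.errors <- h2o.anomaly(mtcar.glrm, mtcar_glrm, per_feature = FALSE)

# Alternatively, reconstruct the frame from the rank-5 factorization
# and compare it to the original columns directly.
mtcar.recon <- h2o.reconstruct(mtcar.glrm, mtcar_glrm, reverse_transform = TRUE)
```

Rows where the rank-k reconstruction diverges most from the original data are the GLRM analogue of the autoencoder's high-MSE rows.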