How to create an anomaly detection model in H2O with R

Asked: 2017-09-15 15:56:07

Tags: r h2o

I am trying to run H2O's anomaly detection in R (h2o_3.14.0.2).

First, I tried it with my main deep learning model and got this error:

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Only for AutoEncoder Deep Learning model."
 ...
OK, my bad. I set autoencoder to TRUE:

h2o.deeplearning(y = response, training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)

That produced a new error:

Error in .verify_dataxy(training_frame, x, y, autoencoder): `y` should not be specified for autoencoder=TRUE, remove `y` input
Traceback:

1. h2o.deeplearning(y = response, training_frame = training.frame, 
 .     validation_frame = test.frame, autoencoder = TRUE)
2. .verify_dataxy(training_frame, x, y, autoencoder)
3. stop("`y` should not be specified for autoencoder=TRUE, remove `y` input")

OK, so I should remove y:

h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)

But then:

Error in is.numeric(y): argument "y" is missing, with no default
Traceback:

1. h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, 
 .     autoencoder = TRUE)
2. is.numeric(y)

Hmm, the last two requirements look mutually exclusive. But OK, I'll try a different kind of model:

anomaly.detection.model <- h2o.glrm(training_frame = training.frame, k = 10, seed = common.seed)

h2o.anomaly(anomaly.detection.model, training.frame, per_feature = FALSE)

And I get a different kind of error:

java.lang.AssertionError
 [1] "java.lang.AssertionError"                                                                                    
 [2] "    water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:439)"
 ...

The failing assertion is assert s.reconstruct_train;. I haven't dug into it yet. Maybe I'll have better luck with GBM or RF?

model = h2o.gbm(y = response,
                training_frame = training.frame,
                validation_frame = validation.frame,
                max_hit_ratio_k = 10,
                seed = common.seed,
                stopping_rounds = 3,
                stopping_tolerance = 1e-2)

h2o.anomaly(model, training.frame, per_feature = FALSE)

water.exceptions.H2OIllegalArgumentException
 [1] "water.exceptions.H2OIllegalArgumentException: Requires a Deep Learning, GLRM, DRF or GBM model."

The same happens with RF.

So I have two questions:

  1. How do I detect anomalies?
  2. Are these bugs, or am I doing something wrong?

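2 answers:

Answer 0 (score: 1)

I was trying to detect anomalies in time series data. To learn the concepts I used this blog, and its explanation worked well for me.

I would like to contribute a visual picture of what happens when an anomaly is detected. In that example, a deep learning model is fit to this ECG dataset. The data looks like this: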

Data we fit our Deep Learning Model

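After that we supply a test dataset (containing anomalies) that looks like this:

Data we test our Deep Learning Model on

Anomaly detection itself becomes possible when the "AI" sees a difference, measured with the MSE (mean squared error) metric: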

This is what AI 'see' on Test dataset

The resulting MSE can be obtained as in the example:

MSE output
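
The idea behind the MSE plots above can be sketched without H2O at all. The example below is only an illustration in base R: it swaps the deep learning autoencoder for a one-component PCA reconstruction (a stand-in, not what the blog or H2O does), and the data, the 99% threshold, and the helper names `reconstruct` and `row_mse` are all assumptions made up for this sketch. A row whose reconstruction error is far above what was seen on the training data is treated as an anomaly.

```r
set.seed(1)

# Toy "normal" training data: two strongly correlated features.
t_train <- rnorm(200)
train <- cbind(a = t_train, b = 2 * t_train + rnorm(200, sd = 0.1))

# Test data follows the same pattern, except the last row, which breaks it.
t_test <- rnorm(20)
test <- cbind(a = t_test, b = 2 * t_test + rnorm(20, sd = 0.1))
test[20, ] <- c(2, -4)

# Stand-in for the autoencoder: compress to the first principal component
# learned on the training data, then map back to the original space.
pca <- prcomp(train, center = TRUE, scale. = FALSE)
reconstruct <- function(m) {
  centered <- sweep(m, 2, pca$center)
  scores <- centered %*% pca$rotation[, 1, drop = FALSE]
  sweep(scores %*% t(pca$rotation[, 1, drop = FALSE]), 2, pca$center, "+")
}

# Per-row mean squared reconstruction error.
row_mse <- function(m) rowMeans((m - reconstruct(m))^2)

# Illustrative rule: flag rows whose error exceeds the 99th percentile
# of the errors observed on the training data.
threshold <- quantile(row_mse(train), 0.99)
anomalies <- which(row_mse(test) > threshold)
print(anomalies)
```

The threshold is a judgment call; a percentile of the training error is one common starting point, not a rule taken from the original answer.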

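Answer 1 (score: 0)

Enabling the autoencoder (autoencoder = TRUE) turns this into an unsupervised, clustering-like problem, so there is no need to set the response (y).

With autoencoder set to TRUE you still need to set x. The problem you hit above with autoencoder = TRUE is that no predictors (x) were set. Once you set x, the error goes away.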
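
Applied to the variables from the question, that could look roughly like the sketch below. The helper names `autoencoder_predictors` and `fit_autoencoder` are hypothetical, and using every column except the response as a predictor is an assumption about what the asker wants, not something stated in the question. Only `h2o.deeplearning` (with `x`, `training_frame`, `autoencoder`) and `h2o.anomaly` are actual H2O calls.

```r
# Hypothetical helper: every column except the response becomes a predictor.
autoencoder_predictors <- function(column_names, response) {
  setdiff(column_names, response)
}

# Hypothetical wrapper: fit an autoencoder with x set and y left out.
# Assumes the h2o package is installed and h2o.init() has been called.
fit_autoencoder <- function(training.frame, response, ...) {
  x <- autoencoder_predictors(names(training.frame), response)
  h2o::h2o.deeplearning(x = x,
                        training_frame = training.frame,
                        autoencoder = TRUE,
                        ...)
}

# Usage with the question's objects (not run here):
# model  <- fit_autoencoder(training.frame, response)
# errors <- h2o.anomaly(model, training.frame, per_feature = FALSE)
```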

I ran a quick anomaly detection test with H2O 3.14.0.2 in R (see this blog for more details):

  > library(h2o)
  > h2o.init()
  Reading in config file: ./.h2oconfig

  H2O is not running yet, starting it now...

  Note:  In case of errors look at the following log files:
      /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.out
      /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.err

  java version "1.8.0_101"
  Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
  Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

  Starting H2O JVM and connecting: .. Connection successful!

  R is connected to the H2O cluster: 
      H2O cluster uptime:         1 seconds 948 milliseconds 
      H2O cluster version:        3.14.0.2 
      H2O cluster version age:    24 days  
      H2O cluster name:           H2O_started_from_R_avkashchauhan_alj381 
      H2O cluster total nodes:    1 
      H2O cluster total memory:   3.56 GB 
      H2O cluster total cores:    8 
      H2O cluster allowed cores:  8 
      H2O cluster healthy:        TRUE 
      H2O Connection ip:          localhost 
      H2O Connection port:        54321 
      H2O Connection proxy:       NA 
      H2O Internal Security:      FALSE 
      H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
      R Version:                  R version 3.4.0 (2017-04-21) 

  > mtcar = h2o.importFile('https://raw.githubusercontent.com/woobe/H2O_London_Workshop/master/data/auto_design.csv')
    |==================================================================================================================================| 100%
  > mtcar$gear = as.factor(mtcar$gear)
  > mtcar$carb = as.factor(mtcar$carb)
  > mtcar$cyl = as.factor(mtcar$cyl)
  > mtcar$vs = as.factor(mtcar$vs)
  > mtcar$am = as.factor(mtcar$am)
  > mtcar.dl = h2o.deeplearning(x = 2:12, training_frame = mtcar, autoencoder = TRUE, hidden = c(1,1,1), epochs = 100,seed=1)
    |==================================================================================================================================| 100%
  > errors <- h2o.anomaly(mtcar.dl, mtcar, per_feature = TRUE)
  > print(errors)
    reconstr_carb.1.SE reconstr_carb.2.SE reconstr_carb.3.SE reconstr_carb.4.SE reconstr_carb.6.SE reconstr_carb.8.SE
  1                  0                  0                  0                  1                  0                  0
  2                  0                  0                  0                  1                  0                  0
  3                  1                  0                  0                  0                  0                  0
  4                  1                  0                  0                  0                  0                  0
  5                  0                  1                  0                  0                  0                  0
  6                  1                  0                  0                  0                  0                  0
    reconstr_carb.missing(NA).SE reconstr_cyl.4.SE reconstr_cyl.6.SE reconstr_cyl.8.SE reconstr_cyl.10.SE reconstr_cyl.missing(NA).SE
  1                            0                 0                 1                 0                  0                           0
  2                            0                 0                 1                 0                  0                           0
  3                            0                 1                 0                 0                  0                           0
  4                            0                 0                 1                 0                  0                           0
  5                            0                 0                 0                 1                  0                           0
  6                            0                 0                 1                 0                  0                           0
    reconstr_gear.3.SE reconstr_gear.4.SE reconstr_gear.5.SE reconstr_gear.missing(NA).SE reconstr_vs.0.SE reconstr_vs.1.SE
  1                  0                  1                  0                            0                1                0
  2                  0                  1                  0                            0                1                0
  3                  0                  1                  0                            0                0                1
  4                  1                  0                  0                            0                0                1
  5                  1                  0                  0                            0                1                0
  6                  1                  0                  0                            0                0                1
    reconstr_vs.missing(NA).SE reconstr_am.0.SE reconstr_am.1.SE reconstr_am.missing(NA).SE reconstr_mpg.SE reconstr_disp.SE reconstr_hp.SE
  1                          0                0                1                          0    8.705556e-05     0.0196626269   0.0035177471
  2                          0                0                1                          0    8.705556e-05     0.0196626269   0.0035177471
  3                          0                0                1                          0    2.684331e-04     0.0411916382   0.0045768080
  4                          0                1                0                          0    1.307597e-05     0.0004837585   0.0035177471
  5                          0                1                0                          0    1.779785e-03     0.0102131519   0.0007516691
  6                          0                1                0                          0    2.576469e-03     0.0038200199   0.0038147898
    reconstr_drat.SE reconstr_wt.SE reconstr_qsec.SE
  1      0.002147682    0.002080628      0.003914459
  2      0.002147682    0.002054817      0.003843678
  3      0.002153499    0.002111200      0.003646228
  4      0.002244072    0.002020654      0.003545225
  5      0.002235761    0.001998203      0.003843678
  6      0.002282261    0.001996213      0.003451600

  [32 rows x 28 columns]
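
A per-feature table like the one above is easier to act on once it is collapsed into one score per row. The sketch below shows one way to do that in base R. The helper name `rank_by_reconstruction_error` and the toy values are made up for illustration, and with a real H2O result you would first convert it with `as.data.frame(errors)`. Calling `h2o.anomaly` with `per_feature = FALSE`, as in the question, returns a per-row reconstruction error directly, so this is only needed if you start from the per-feature output.

```r
# Hypothetical helper: average the per-feature squared errors of each row
# and sort rows from most to least anomalous.
rank_by_reconstruction_error <- function(per_feature_se) {
  row_score <- rowMeans(per_feature_se)
  ranked <- data.frame(row = seq_along(row_score), mse = unname(row_score))
  ranked[order(ranked$mse, decreasing = TRUE), ]
}

# Toy per-feature squared errors (illustrative values, not H2O output).
toy <- data.frame(reconstr_mpg.SE = c(0.001, 0.200, 0.002),
                  reconstr_hp.SE  = c(0.003, 0.100, 0.001))

ranked <- rank_by_reconstruction_error(toy)
print(ranked)
```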

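You can also run GLRM on the same dataset, as shown below. You must set k, and GLRM does not need x to be passed, but the dataset cannot contain constant columns. That is why I used a filtered dataset for both deep learning and GLRM.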

> mtcar_glrm = mtcar[2:12]
> mtcar.glrm = h2o.glrm(training_frame = mtcar_glrm,seed=1, k = 5)