Deep Learning and Deep Water models give very different logloss (0.4 vs 0.6)

Date: 2018-01-16 05:27:20

Tags: h2o

In AWS, following the instructions here, I launched a g2.2xlarge EC2 instance from the community AMI ami-97591381 (H2O version: 3.13.0.356).

Here is my code, which you can run now that I have made the S3 links public:

library(h2o)
library(jsonlite)
library(curl)

localH2O <- h2o.init()

# Load the data from S3 and convert the target column to a factor for classification
df.truth <- h2o.importFile("https://s3.amazonaws.com/nw.data.test.us.east/df.truth.zeroed",
                           header = TRUE, sep = ",")
df.truth$isFemale <- h2o.asfactor(df.truth$isFemale)
hotnames.truth <- fromJSON("https://s3.amazonaws.com/nw.data.test.us.east/hotnames.json",
                           simplifyVector = TRUE)

# Training (90%) and validation (10%) sets
splits <- h2o.splitFrame(df.truth, c(0.9), seed = 1234)
train.truth <- h2o.assign(splits[[1]], "train.truth.hex")
valid.truth <- h2o.assign(splits[[2]], "valid.truth.hex")

# Train a model using the non-GPU built-in deeplearning
dl.2 <- h2o.deeplearning(
  training_frame = train.truth, model_id = "dl.2",
  validation_frame = valid.truth,
  x = setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs")),
  y = "isFemale", stopping_metric = "AUTO", seed = 1,
  sparse = FALSE, mini_batch_size = 20)

# Train a model using GPU-enabled deepwater
dw.2 <- h2o.deepwater(
  training_frame = train.truth, model_id = "dw.2",
  validation_frame = valid.truth,
  x = setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs")),
  y = "isFemale", stopping_metric = "AUTO", seed = 1,
  sparse = FALSE, mini_batch_size = 20)
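
For reference, the logloss values compared below can also be pulled directly from the fitted models rather than read out of the printed summaries (a minimal sketch; h2o.logloss is part of the h2o R API):

# Extract logloss for each model on the training and validation frames
h2o.logloss(dl.2, train = TRUE)   # built-in deeplearning, training metrics
h2o.logloss(dl.2, valid = TRUE)   # built-in deeplearning, validation metrics
h2o.logloss(dw.2, train = TRUE)   # deepwater, training metrics
h2o.logloss(dw.2, valid = TRUE)   # deepwater, validation metrics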

When I inspected the two models, I was surprised by the difference I saw in logloss:

Non-GPU:

print(dl.2)
Model Details:
==============

H2OBinomialModel: deeplearning
Model ID:  dl.2
Status of Neuron Layers: predicting isFemale, 2-class classification, bernoulli distribution, CrossEntropy loss, 160,802 weights/biases, 2.0 MB, 1,041,465 training samples, mini-batch size 1
  layer units      type dropout       l1       l2 mean_rate rate_rms momentum
1     1   600     Input  0.00 %
2     2   200 Rectifier  0.00 % 0.000000 0.000000  0.104435 0.102760 0.000000
3     3   200 Rectifier  0.00 % 0.000000 0.000000  0.031395 0.055490 0.000000
4     4     2   Softmax         0.000000 0.000000  0.001541 0.001438 0.000000
  mean_weight weight_rms mean_bias bias_rms
1
2    0.018904   0.144034  0.150630 0.415525
3   -0.023333   0.081914  0.545394 0.251275
4    0.029091   0.295439 -0.004396 0.357609

H2OBinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on temporary training frame with 9877 samples **

MSE:  0.1213733
RMSE:  0.3483868
LogLoss:  0.388214
Mean Per-Class Error:  0.2563669
AUC:  0.8433182
Gini:  0.6866365

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error        Rate
0      6546 1079 0.141508  =1079/7625
1       836 1416 0.371226   =836/2252
Totals 7382 2495 0.193885  =1915/9877

H2OBinomialMetrics: deeplearning
** Reported on validation data. **
** Metrics reported on full validation frame **

MSE:  0.126671
RMSE:  0.3559087
LogLoss:  0.4005941
Mean Per-Class Error:  0.2585051
AUC:  0.8309913
Gini:  0.6619825

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           0    1    Error         Rate
0      11746 3134 0.210618  =3134/14880
1       1323 2995 0.306392   =1323/4318
Totals 13069 6129 0.232160  =4457/19198

GPU-enabled:

print(dw.2)
Model Details:
==============

H2OBinomialModel: deepwater
Model ID:  dw.2b
Status of Deep Learning Model: MLP: [200, 200], 630.8 KB, predicting isFemale, 2-class classification, 1,708,160 training samples, mini-batch size 20
  input_neurons     rate momentum
1           600 0.000369 0.900000


H2OBinomialMetrics: deepwater
** Reported on training data. **
** Metrics reported on temporary training frame with 9877 samples **

MSE:  0.1615781
RMSE:  0.4019677
LogLoss:  0.629549
Mean Per-Class Error:  0.3467246
AUC:  0.7289561
Gini:  0.4579122

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error        Rate
0      4843 2782 0.364852  =2782/7625
1       740 1512 0.328597   =740/2252
Totals 5583 4294 0.356586  =3522/9877

H2OBinomialMetrics: deepwater
** Reported on validation data. **
** Metrics reported on full validation frame **

MSE:  0.1651776
RMSE:  0.4064205
LogLoss:  0.6901861
Mean Per-Class Error:  0.3476629
AUC:  0.7187362
Gini:  0.4374724

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0    1    Error         Rate
0      8624 6256 0.420430  =6256/14880
1      1187 3131 0.274896   =1187/4318
Totals 9811 9387 0.387697  =7443/19198

As shown above, the logloss difference between the non-GPU and GPU models is substantial:

Logloss
+-----------------+---------+------+
|                 | non-GPU | GPU  |
+-----------------+---------+------+
| training data   | 0.39    | 0.63 |
+-----------------+---------+------+
| validation data | 0.40    | 0.69 |
+-----------------+---------+------+

As I understand it, I should expect somewhat different results given the stochastic nature of training, but I would not expect such a huge difference between non-GPU and GPU.
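
(To rule out run-to-run randomness on the non-GPU side, h2o.deeplearning can be forced into reproducible mode; a minimal sketch, assuming the frames and predictors defined above. Note this makes training single-threaded and considerably slower.)

# Hedged sketch: force deterministic training for the built-in algorithm so the
# seed is honored exactly, removing multi-threaded nondeterminism from the comparison.
dl.repro <- h2o.deeplearning(
  training_frame = train.truth, validation_frame = valid.truth,
  x = setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs")),
  y = "isFemale", seed = 1, reproducible = TRUE)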

1 Answer:

Answer 0 (score: 1):

h2o.deeplearning is H2O's built-in deep learning algorithm. It parallelizes well and works well on big data, but it does not use the GPU.

h2o.deepwater is a wrapper around (probably) TensorFlow, and (probably) uses your GPU (though it can also run on the CPU, and it can use different backends).
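
If you want to be certain which implementation you are measuring, you can pin the backend explicitly; a minimal sketch (backend is a documented h2o.deepwater argument accepting "mxnet", "caffe", or "tensorflow"; treating "tensorflow" as available here is an assumption about what your AMI ships):

# Hedged sketch: select the deepwater backend explicitly rather than relying on
# the build's default, so the comparison is against a known implementation.
dw.tf <- h2o.deepwater(
  training_frame = train.truth, validation_frame = valid.truth,
  x = setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs")),
  y = "isFemale", seed = 1, backend = "tensorflow")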

In other words, this is not a CPU-versus-GPU difference: you are comparing two different deep learning implementations.

By the way, I would suggest increasing the number of epochs (from the default of 10 to, say, 200; remember that means roughly 20x the runtime) and seeing whether the difference persists. Alternatively, compare the scoring history charts to see whether TensorFlow gets there too, but simply needs, say, 50% more epochs to reach the same logloss score; see the sketch below.
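
A minimal sketch of that experiment (epochs, h2o.scoreHistory, and the plot method are standard h2o R API; the frames and predictors are assumed from the question's code, and 200 epochs is just an illustration):

predictors <- setdiff(hotnames.truth[1:(length(hotnames.truth)/2)], c("isFemale", "nwtcs"))

# Retrain both models with 200 epochs instead of the default 10 (about 20x the runtime).
dl.long <- h2o.deeplearning(training_frame = train.truth, validation_frame = valid.truth,
                            x = predictors, y = "isFemale", seed = 1, epochs = 200)
dw.long <- h2o.deepwater(training_frame = train.truth, validation_frame = valid.truth,
                         x = predictors, y = "isFemale", seed = 1, epochs = 200)

# Compare how quickly each implementation drives down logloss.
h2o.scoreHistory(dl.long)
h2o.scoreHistory(dw.long)

# Or plot the scoring history per epoch for a visual comparison.
plot(dl.long, timestep = "epochs", metric = "logloss")
plot(dw.long, timestep = "epochs", metric = "logloss")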