Why doesn't this code for training a regression network with MXNet converge?

Asked: 2019-05-04 10:07:42

Tags: deep-learning mxnet

Here is the code:

import mxnet
from mxnet import io, gluon, autograd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset
ctx =  mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()

iter = io.CSVIter(data_csv="data/housing.csv", batch_size=100, data_shape=(10, ))


loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})

for (i, iter_data) in enumerate(iter):
    data = iter_data.data[0]
    label_data = data[:, 8]
    train_data = data[:, 3]
    with autograd.record():
        l = loss(net(train_data), label_data)
    l.backward()
    trainer.step(100)
    print(l.mean().asnumpy())

The data is US housing prices. It looks like this:

  

-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY
-122.25,37.85,52.0,919.0,213.0,413.0,193.0,4.0368,269700.0,NEAR BAY
-122.25,37.84,52.0,2535.0,489.0,1094.0,514.0,3.6591,299200.0,NEAR BAY
-122.25,37.84,52.0,3104.0,687.0,1157.0,647.0,3.12,241400.0,NEAR BAY
-122.26,37.84,42.0,2555.0,665.0,1206.0,595.0,2.0804,226700.0,NEAR BAY
-122.25,37.84,52.0,3549.0,707.0,1551.0,714.0,3.6912,261100.0,NEAR BAY
-122.26,37.85,52.0,2202.0,434.0,910.0,402.0,3.2031,281500.0,NEAR BAY
-122.26,37.85,52.0,3503.0,752.0,1504.0,734.0,3.2705,241800.0,NEAR BAY

The data comes from https://raw.githubusercontent.com/ageron/handson-ml/master/datasets/housing/housing.tgz

The result confuses me:

  

[1.4657609e+10]
[2.184351e+17]
[7.357278e+24]
[1.0737887e+32]
[nan]
[nan]
...

So what is wrong with my code?

==================== UPDATE ====================

I used a z-score to normalize the feature array, but it did not help (forgive me for lazily using NumPy's functions to compute the z-score):

import mxnet
import numpy as np
from mxnet import io, gluon, autograd, nd
from mxnet.gluon import nn
from mxnet.gluon.data import ArrayDataset
ctx =  mxnet.gpu() if mxnet.test_utils.list_gpus() else mxnet.cpu()

BATCH_SIZE = 100

iter = io.CSVIter(data_csv="data/housing.csv", batch_size=BATCH_SIZE, data_shape=(10, ))


loss = gluon.loss.L2Loss()
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(mxnet.init.Normal(sigma=0.01), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.001})

for (i, iter_data) in enumerate(iter):
    data = iter_data.data[0]
    label_data = data[:, 8]
    train_data = data[:, 3]
    train_data_np = train_data.asnumpy()
    stand = np.std(train_data_np)
    mean = np.mean(train_data_np)
    b = (train_data_np - mean) / stand
    train_data = nd.array(b)
    with autograd.record():
        l = loss(net(train_data), label_data)
    l.backward()
    trainer.step(BATCH_SIZE)
    print(l.mean().asnumpy())

1 Answer:

Answer 0: (score: 0)

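There could be several reasons why your code behaves this way: a model that is too simple, missing features, unnormalized data... I suggest you take a look at the house price prediction example in the MXNet repository: https://github.com/apache/incubator-mxnet/tree/master/example/gluon/house_prices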

That code is explained in detail in the following chapter of the D2L online book: http://d2l.ai/chapter_multilayer-perceptrons/kaggle-house-price.html
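
To illustrate the "unnormalized data" point, here is a minimal sketch in plain Python (not MXNet) that runs gradient descent on a one-feature linear model using the twelve sample rows from the question. The helper names `zscore` and `fit` are made up for this illustration, and the learning rate of 0.1 for the standardized run is an arbitrary choice. It assumes full-batch updates and the same 0.5 * squared-error loss form as `L2Loss`. It is only meant to show one plausible contributor to the growing loss values: with raw inputs in the thousands, a step size of 0.001 is large enough to overshoot on every update.

```python
import math

# Columns 3 and 8 of the sample rows shown in the question
# (the feature and label columns the question's code slices out).
rooms = [880.0, 7099.0, 1467.0, 1274.0, 1627.0, 919.0,
         2535.0, 3104.0, 2555.0, 3549.0, 2202.0, 3503.0]
prices = [452600.0, 358500.0, 352100.0, 341300.0, 342200.0, 269700.0,
          299200.0, 241400.0, 226700.0, 261100.0, 281500.0, 241800.0]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def fit(xs, ys, learning_rate, steps):
    """Full-batch gradient descent on mean(0.5 * (w*x + b - y)^2).

    Returns the mean loss measured before each update.
    """
    w, b, n = 0.0, 0.0, len(xs)
    losses = []
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(0.5 * e * e for e in errors) / n)
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return losses


# Raw feature and label, same learning rate as in the question.
raw_losses = fit(rooms, prices, learning_rate=0.001, steps=5)

# Both feature and label standardized (hypothetical learning rate).
scaled_losses = fit(zscore(rooms), zscore(prices), learning_rate=0.1, steps=200)

print("raw:   ", raw_losses)
print("scaled:", scaled_losses[0], "->", scaled_losses[-1])
```

On the raw values the loss grows at every step. With both the feature and the label standardized it decreases and stays bounded. Even then a single feature explains little of the price on these rows, so the other points above (a richer model, more features) still apply.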