TensorFlow: train_and_evaluate always produces a near-zero loss

Time: 2018-11-14 00:41:18

Tags: python tensorflow deep-learning tensorflow-estimator

I have a project that uses a canned estimator in TensorFlow, and I am trying to use the train_and_evaluate method.

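import tensorflow as tf  # TensorFlow 1.x API (tf.estimator, tf.train)

# my_feature_columns, hidden_units and model_dir are defined elsewhere in the project.
estimator = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=hidden_units,
    model_dir=model_dir,
    optimizer=tf.train.ProximalAdagradOptimizer(
        learning_rate=0.01,
        l1_regularization_strength=0.001))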
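For context on the optimizer settings: with ProximalAdagradOptimizer, the L1 term is applied as a shrinkage step inside each weight update rather than being added to the reported loss. The scalar sketch below follows the update rule as written in the documentation of TensorFlow's ApplyProximalAdagrad op (accumulate squared gradients, take an Adagrad step, then shrink toward zero by lr * l1). It is a hypothetical illustration, not TensorFlow's kernel, which may differ in details such as how the shrinkage is scaled; proximal_adagrad_step and the starting accumulator of 0.1 are assumptions.

```python
import math
from typing import Tuple


def proximal_adagrad_step(var: float, accum: float, grad: float,
                          lr: float = 0.01, l1: float = 0.001,
                          l2: float = 0.0) -> Tuple[float, float]:
    """One scalar update following the rule in the ApplyProximalAdagrad op docs:
        accum += grad * grad
        prox_v = var - lr * grad / sqrt(accum)
        var = sign(prox_v) / (1 + lr * l2) * max(|prox_v| - lr * l1, 0)
    """
    accum += grad * grad
    prox_v = var - lr * grad / math.sqrt(accum)   # plain Adagrad step
    shrunk = max(abs(prox_v) - lr * l1, 0.0)      # L1 shrinkage toward zero
    new_var = math.copysign(shrunk, prox_v) / (1.0 + lr * l2)
    return new_var, accum


# Assumed starting accumulator of 0.1 and no gradient signal for these weights.
print(proximal_adagrad_step(5e-6, 0.1, 0.0))  # tiny weight: shrunk to exactly 0
print(proximal_adagrad_step(0.5, 0.1, 0.0))   # large weight: only nudged by lr * l1
```

With lr = 0.01 and l1 = 0.001, a weight smaller than lr * l1 = 1e-5 that receives no gradient is set exactly to zero, while larger weights move only slightly.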

Every time I look at the console output, it shows that the loss stays essentially at zero:

INFO:tensorflow:loss = 2.896826e-06, step = 875
INFO:tensorflow:global_step/sec: 5.96785
INFO:tensorflow:loss = 1.9453131e-05, step = 975 (16.756 sec)
INFO:tensorflow:global_step/sec: 7.2834
INFO:tensorflow:loss = 8.6957414e-05, step = 1075 (13.730 sec)
INFO:tensorflow:global_step/sec: 7.36042
INFO:tensorflow:loss = 0.0004585028, step = 1175 (13.586 sec)
INFO:tensorflow:global_step/sec: 7.38419
INFO:tensorflow:loss = 0.0012249642, step = 1275 (13.542 sec)
INFO:tensorflow:global_step/sec: 7.3658
INFO:tensorflow:loss = 0.002194246, step = 1375 (13.576 sec)
INFO:tensorflow:global_step/sec: 7.33054
INFO:tensorflow:loss = 0.0031063582, step = 1475 (13.641 sec)
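The logged values are small but not exactly zero, and they rise steadily from step 875 to step 1475. Whether the estimator logs the mean or the sum of the per-example cross-entropies in a batch, the mean per-example loss is at most the logged number, so exp(-loss) is a lower bound on the geometric-mean probability the model assigns to the true labels. The short check below applies that bound to the logged values; min_true_class_probability is a hypothetical helper, not a TensorFlow function.

```python
import math

# Loss values copied from the training log above (step -> logged loss).
logged_losses = {
    875: 2.896826e-06,
    975: 1.9453131e-05,
    1075: 8.6957414e-05,
    1175: 0.0004585028,
    1275: 0.0012249642,
    1375: 0.002194246,
    1475: 0.0031063582,
}


def min_true_class_probability(loss: float) -> float:
    """Lower bound on the geometric-mean probability of the true labels,
    valid when `loss` is the mean or the sum of non-negative per-example
    cross-entropies."""
    return math.exp(-loss)


for step, loss in logged_losses.items():
    print(f"step {step}: p(true label) >= {min_true_class_probability(loss):.4f}")
```

Even at the largest logged value the bound is above 99.6%, so on the training batches the model is very confident and very often right, which sits oddly with the 50% accuracy reported below.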

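This started after I changed my input_fn. I used to load the CSV into a pandas DataFrame and work from there, but my dataset is more than 10 GB in total (dimensions 800x1500000), and every time I saved the model, the model folder grew to an absurd size (over 200 GB). So I decided to switch to an iterator instead (I found this input function in a tutorial somewhere, and it worked well):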

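import tensorflow as tf  # TensorFlow 1.x API (tf.data, make_one_shot_iterator)

# parse_csv and LABEL_COLUMN are defined elsewhere in the project;
# parse_csv is assumed to turn one CSV line into a dict of column tensors.
def input_fn_train(filenames,
                   num_epochs=None,
                   shuffle=True,
                   skip_header_lines=0,
                   batch_size=200,
                   modeTrainEval=True):
    # Dataset of file names; shuffling here only reorders whole files.
    filename_dataset = tf.data.Dataset.from_tensor_slices(filenames)
    if shuffle:
        filename_dataset = filename_dataset.shuffle(len(filenames))
    # Read every file line by line, skipping any header lines.
    dataset = filename_dataset.flat_map(
        lambda filename: tf.data.TextLineDataset(filename).skip(skip_header_lines))
    dataset = dataset.map(parse_csv)
    if shuffle:
        # Row-level shuffle with a buffer of batch_size * 10 rows.
        dataset = dataset.shuffle(buffer_size=batch_size * 10)
    dataset = dataset.repeat(num_epochs)  # num_epochs=None repeats indefinitely
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    features = iterator.get_next()
    labels = features.pop(LABEL_COLUMN)  # split the label out of the feature dict
    if not modeTrainEval:
        return features, None
    return features, labels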
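One hypothesis that would fit both the near-zero training loss and the 50% accuracy, offered as an assumption rather than a diagnosis: if the CSV rows are ordered by label, a row-level shuffle buffer of batch_size * 10 rows (2,000 rows with the default batch_size=200) cannot mix the classes in a dataset this large, and shuffling the file names does nothing when the rows live in a single file. The pure-Python sketch below mimics the documented behavior of tf.data.Dataset.shuffle (fill a buffer, emit a uniformly random element, put the next input in its slot) on a toy, label-sorted list. shuffle_buffer, batches and mixed_batch_fraction are hypothetical helpers, not TensorFlow APIs.

```python
import random
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_buffer(items: Iterable[T], buffer_size: int,
                   rng: random.Random) -> Iterator[T]:
    """Mimic tf.data.Dataset.shuffle: hold up to `buffer_size` elements,
    emit a uniformly random one, and put the next input in its slot."""
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    rng.shuffle(buffer)  # input exhausted: drain what is left in random order
    yield from buffer


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive elements into lists of `batch_size` (last may be short)."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def mixed_batch_fraction(labels: Sequence[int], buffer_size: int,
                         batch_size: int, seed: int = 0) -> float:
    """Fraction of batches that contain more than one distinct label."""
    rng = random.Random(seed)
    all_batches = list(batches(shuffle_buffer(labels, buffer_size, rng), batch_size))
    mixed = sum(1 for batch in all_batches if len(set(batch)) > 1)
    return mixed / len(all_batches)


# Toy stand-in for a label-sorted CSV: 10,000 rows of class 0, then 10,000 of class 1.
sorted_labels = [0] * 10_000 + [1] * 10_000
batch_size = 20
print("buffer = 10 * batch:", mixed_batch_fraction(sorted_labels, 10 * batch_size, batch_size))
print("buffer = whole data:", mixed_batch_fraction(sorted_labels, len(sorted_labels), batch_size))
```

With a buffer ten times the batch size, only the batches near the class boundary contain both labels, so the model would see long single-class stretches; with a buffer as large as the data (a full shuffle), nearly every batch is mixed. If this turns out to be the cause, shuffling the CSV rows once on disk, or using a much larger buffer, are common ways to restore class mixing.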

Unfortunately, since this change the loss is always (close to) zero, and the predictions are very poor (50% accuracy). I cannot figure out why.
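To check whether the data is ordered by label, one option is to stream the CSV once and measure the longest run of consecutive rows with the same label; a run much longer than the shuffle buffer in input_fn_train (batch_size * 10 rows) would mean that shuffle cannot mix the classes within a batch. The sketch below uses only Python's csv module, so it never loads the 10 GB file into memory. label_run_stats, its label_index parameter, and the file name in the usage note are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def label_run_stats(lines: TextIO, label_index: int,
                    skip_header_lines: int = 0) -> Tuple[int, Dict[str, int]]:
    """Stream a CSV and return (longest run of identical consecutive labels,
    number of rows per label)."""
    reader = csv.reader(lines)
    for _ in range(skip_header_lines):
        next(reader, None)  # drop header lines, like skip() in input_fn_train
    longest = current = 0
    previous = None
    counts: Dict[str, int] = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label = row[label_index]
        counts[label] = counts.get(label, 0) + 1
        current = current + 1 if label == previous else 1
        previous = label
        longest = max(longest, current)
    return longest, counts


# Tiny in-memory example: the label is the last column.
sample = io.StringIO("0.1,0.2,0\n0.3,0.4,0\n0.5,0.6,0\n0.7,0.8,1\n0.9,1.0,1\n")
print(label_run_stats(sample, label_index=-1))  # → (3, {'0': 3, '1': 2})
```

On the real data it would be called on an opened file, e.g. with open("train.csv", newline="") as f, passing the label's column position as label_index; both the file name and the index are placeholders. The per-label counts also show whether the classes are balanced.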

(GitHub link with a sample dataset and my code)

0 answers