Keras loss is NaN when inputting data from a csv using numpy

Asked: 2018-08-01 02:10:49

Tags: python-2.7 numpy tensorflow keras

I'm trying to learn regression with TensorFlow/Keras from TensorFlow's Boston housing price example, but I keep running into problems with my own data, even when the change I make is as small as possible. After giving up on writing everything myself, I changed only the two lines of the example's code that supply the input data:

boston_housing = keras.datasets.boston_housing
(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

Based on what I found online, this should also create a numpy array from my csv:

np_array = genfromtxt('trainingdata.csv', delimiter=',')
np_array = np.delete(np_array, (0), axis=0) # Remove header
test_np_array = np_array[:800,:]
tr_np_array = np_array[800:,:] # Separate out test and train data

train_labels = tr_np_array[:, 20] # Get the last column for the labels
test_labels = test_np_array[:, 20]

train_data = np.delete(tr_np_array, (20), axis=1)
test_data = np.delete(test_np_array, (20), axis=1) # Remove the last column so the data is only the features
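A quick way to confirm the loaded arrays are clean is to check for NaNs right after `genfromtxt` runs. A minimal sketch (using a small in-memory CSV as a stand-in for `trainingdata.csv`, which only the asker has):

```python
import io
import numpy as np
from numpy import genfromtxt

# Stand-in for trainingdata.csv: a header line followed by numeric rows.
csv_text = "a,b,label\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

np_array = genfromtxt(io.StringIO(csv_text), delimiter=',')
print(np.isnan(np_array).any())  # True: the header line parses as a row of NaNs

np_array = np.delete(np_array, (0), axis=0)  # drop the header row
print(np.isnan(np_array).any())  # False once the NaN row is removed
```

A single NaN anywhere in the features or labels is enough to turn the loss into NaN on the very first batch, so this check is worth running before `model.fit`.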

Everything I can see looks right – the arrays have the correct shapes, they appear to be proper numpy arrays, the features seem to be normalized, and so on – but when I set verbose to 1 on model.fit(...), the very first line of output shows the loss is already NaN:

Epoch 1/500

  32/2560 [..............................] - ETA: 18s - loss: nan - mean_absolute_error: nan
2016/2560 [======================>.......] - ETA: 0s - loss: nan - mean_absolute_error: nan 
2560/2560 [==============================] - 0s 133us/step - loss: nan - mean_absolute_error: nan - val_loss: nan - val_mean_absolute_error: nan

I'm particularly confused because everywhere else on Stack Overflow where I've seen the "TensorFlow loss is NaN" error, it usually comes a) with a custom loss function, or b) after the model has trained for a while, not within the first 52 passes as it does here. When neither of those applies, it's because the data wasn't normalized, but I normalize later in my code, the same normalization works for the housing price example, and it prints out numbers clustered around 0. At this point my best guess is that the genfromtxt call is the problem, but if anyone can see what I'm doing wrong or where I should look for my problem, I would be very grateful.
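If the header row is indeed the culprit, `genfromtxt` can skip it at parse time with its `skip_header` parameter, so no row of NaNs is ever created and the `np.delete` workaround becomes unnecessary. A sketch, again with an in-memory stand-in for the real CSV:

```python
import io
import numpy as np

csv_text = "a,b,label\n1.0,2.0,3.0\n4.0,5.0,6.0\n"

# skip_header=1 discards the header line during parsing,
# so the resulting array contains only the numeric rows.
np_array = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(np_array.shape)            # (2, 3)
print(np.isnan(np_array).any())  # False
```

A separate source of NaNs worth ruling out: if any feature column is constant, its standard deviation is 0, and the `(train_data - mean) / std` normalization step becomes a division by zero.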

Edit:

Here is the full code of the program. Commenting out lines 13 through 26 and uncommenting lines 10 and 11 makes the program run correctly. I tried using pandas by commenting out lines 13 and 14 and uncommenting 16 and 17, but that led to the same error.

import tensorflow as tf
from tensorflow import keras

import numpy as np
from numpy import genfromtxt
import pandas as pd

print(tf.__version__)

# boston_housing = keras.datasets.boston_housing # Line 10
# (train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()

np_array = genfromtxt('trainingdata.csv', delimiter=',') # Line 13
np_array = np.delete(np_array, (0), axis=0) # Remove header

# df = pd.read_csv('trainingdata.csv') # Line 16
# np_array = df.get_values()

test_np_array = np_array[:800,:]
tr_np_array = np_array[800:,:]

train_labels = tr_np_array[:, 20]
test_labels = test_np_array[:, 20]

train_data = np.delete(tr_np_array, (20), axis=1)
test_data = np.delete(test_np_array, (20), axis=1) # Line 26

order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]

mean = train_data.mean(axis=0)
std = train_data.std(axis=0)
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std

labels_mean = train_labels.mean(axis=0)
labels_std = train_labels.std(axis=0)
train_labels = (train_labels - labels_mean) / labels_std
test_labels = (test_labels - labels_mean) / labels_std

def build_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])

    optimizer = tf.train.RMSPropOptimizer(0.001)

    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae'])

    return model

model = build_model()
model.summary()

EPOCHS = 500

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)

history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1,
                    callbacks=[early_stop])

[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)

print("Testing set Mean Abs Error: ${:7.2f}".format(mae * 1000 * labels_std))

0 Answers:

There are no answers