Question

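I am using the canned DNNRegressor from tf.estimator to predict the number of people visiting a park each day based on various weather features, and I am running into a problem with the TensorFlow Dataset API that I cannot explain.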

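The dataset has 1200 rows and 6 columns: precipitation, temperature, weekday, season (0 = first year, 1 = second year, ...), week number, and the target label (visitor count). The data is stored in a pandas DataFrame.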

Before any data is fed to the model, the numeric values are scaled with the following code:

import numpy as np

# 80/20 train/test split of the pandas DataFrame `data`
train = data.sample(frac=0.8, random_state=19)
test = data.drop(train.index)

# Further split to X and y
train_features, train_labels = train, train.pop('count')
test_features, test_labels   = test, test.pop('count')

# Standardize
from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
y_scaler = StandardScaler()

features_to_scale = ['precipitation', 'temperature']

train_features[features_to_scale] = x_scaler.fit_transform(train_features[features_to_scale])
test_features[features_to_scale]  = x_scaler.transform(test_features[features_to_scale])

train_labels = y_scaler.fit_transform(np.array(train_labels).reshape(-1,1))
test_labels  = y_scaler.transform(np.array(test_labels).reshape(-1,1))
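
Because the labels are standardized as well, the loss values in the logs further down are in standardized units, not visitor counts. The following is a minimal NumPy sketch of the arithmetic that fitting on the training split, transforming both splits, and inverting the transform amounts to. The arrays are hypothetical toy values, not the park data.

```python
import numpy as np

# Hypothetical toy values standing in for one numeric column.
train_col = np.array([0.0, 2.0, 4.0, 6.0])
test_col = np.array([3.0, 9.0])

# "fit": the statistics come from the training split only.
mean = train_col.mean()
std = train_col.std()  # population standard deviation (ddof=0)

# "transform": the same statistics are applied to both splits.
train_scaled = (train_col - mean) / std
test_scaled = (test_col - mean) / std

# "inverse_transform": map scaled values back to the original units.
test_restored = test_scaled * std + mean
```

Predictions made by a model trained on scaled labels would need the same inverse step (y_scaler.inverse_transform in the code above) before they can be read as visitor counts.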

Next, the feature columns are defined:

import tensorflow as tf

weekday = tf.feature_column.categorical_column_with_identity('weekday', 8)
weeknum = tf.feature_column.categorical_column_with_identity('weeknum', 54)
season = tf.feature_column.categorical_column_with_identity('season', 4)

feature_columns = [
    tf.feature_column.numeric_column('precipitation'),
    tf.feature_column.numeric_column('temperature'),
    tf.feature_column.indicator_column(weekday),
    tf.feature_column.embedding_column(weeknum, 3),
    tf.feature_column.indicator_column(season)
]
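
As a rough check of what these columns feed into the first hidden layer, the sketch below adds up the width each one contributes and shows the one-hot encoding that an indicator column over an identity categorical column corresponds to. The one_hot helper is a hypothetical illustration, not a TensorFlow function.

```python
# Width contributed by each feature column defined above.
column_widths = {
    "precipitation": 1,  # numeric_column: one float
    "temperature": 1,    # numeric_column: one float
    "weekday": 8,        # indicator_column over 8 buckets
    "weeknum": 3,        # embedding_column with dimension 3
    "season": 4,         # indicator_column over 4 buckets
}
input_width = sum(column_widths.values())


def one_hot(index, num_buckets):
    """Hypothetical helper: one-hot vector for an integer category id."""
    if not 0 <= index < num_buckets:
        raise ValueError("category id must lie in [0, num_buckets)")
    return [1.0 if i == index else 0.0 for i in range(num_buckets)]


season_vector = one_hot(2, 4)
```

Under these assumptions the dense input has 17 values per example, and every category id has to lie in the range [0, num_buckets).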

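When I train the DNNRegressor with the TensorFlow Dataset API (referred to here as Method 1), the training loss does not decrease steadily. This is the code I use to create the dataset and feed it to my model: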

def input_fn_train(features, labels, batch_size, epochs):
    # Convert the inputs to a Dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Return a batch of (features, labels)
    return (
        dataset
        .shuffle(512)
        .repeat(epochs)
        .batch(batch_size)
        .make_one_shot_iterator().get_next()
    )

STEPS = 20000
BATCH_SIZE = 1
EPOCHS = 1000

# Build Estimator
model = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[20,10]
)

# Train estimator
model.train(
    input_fn=lambda:input_fn_train(
        train_features,
        train_labels,
        BATCH_SIZE,
        EPOCHS
    ),
    steps=STEPS
)
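
To make the interaction between steps, batch size, and epochs in this setup concrete, here is a small arithmetic sketch. It assumes the 80% training split of the 1200 rows described earlier, and the variable names are chosen for illustration only.

```python
# Constants taken from the training code above.
STEPS = 20000
BATCH_SIZE = 1
EPOCHS = 1000
SHUFFLE_BUFFER = 512

n_rows = 1200
train_rows = n_rows * 8 // 10  # 80% training split

# Batches the dataset can yield before repeat(EPOCHS) is exhausted.
batches_available = train_rows * EPOCHS // BATCH_SIZE

# Training stops at whichever limit is reached first.
steps_run = min(STEPS, batches_available)

# Number of full passes over the training data actually seen.
passes = steps_run * BATCH_SIZE / train_rows

# A shuffle buffer smaller than the dataset only mixes rows locally.
buffer_covers_dataset = SHUFFLE_BUFFER >= train_rows
```

With these numbers the steps argument is the binding limit: training ends after about 21 passes over the data, with each step seeing a single row, and the shuffle buffer holds only part of the training set at a time.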

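Below is the log output for the first 1101 training steps. As you can see, the training loss does not decrease steadily; the model does not seem to learn at all.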

INFO:tensorflow:loss = 1.7277247, step = 1
INFO:tensorflow:global_step/sec: 487.874
INFO:tensorflow:loss = 0.1896706, step = 101 (0.206 sec)
INFO:tensorflow:global_step/sec: 419.013
INFO:tensorflow:loss = 0.035381828, step = 201 (0.243 sec)
INFO:tensorflow:global_step/sec: 478.715
INFO:tensorflow:loss = 0.0111698285, step = 301 (0.210 sec)
INFO:tensorflow:global_step/sec: 665.781
INFO:tensorflow:loss = 0.08243248, step = 401 (0.144 sec)
INFO:tensorflow:global_step/sec: 527.54
INFO:tensorflow:loss = 0.057627745, step = 501 (0.194 sec)
INFO:tensorflow:global_step/sec: 497.047
INFO:tensorflow:loss = 0.047706906, step = 601 (0.197 sec)
INFO:tensorflow:global_step/sec: 629.148
INFO:tensorflow:loss = 0.15168391, step = 701 (0.159 sec)
INFO:tensorflow:global_step/sec: 612.062
INFO:tensorflow:loss = 0.3931117, step = 801 (0.163 sec)
INFO:tensorflow:global_step/sec: 455.834
INFO:tensorflow:loss = 0.19988278, step = 901 (0.219 sec)
INFO:tensorflow:global_step/sec: 493.121
INFO:tensorflow:loss = 0.02624654, step = 1001 (0.212 sec)
INFO:tensorflow:global_step/sec: 454.812
INFO:tensorflow:loss = 0.187381, step = 1101 (0.212 sec)
...
INFO:tensorflow:Saving dict for global step 20000: average_loss = 0.121116325, global_step = 20000, loss = 0.121116325

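However, if I rewrite the input_fn to simply return the feature dictionary and the labels as tf.constant tensors, the training loss decreases gradually and the model appears to learn.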

def temp_input_fn(features, labels):
    x = {}
    x['precipitation'] = tf.constant(features.precipitation.values)
    x['temperature'] = tf.constant(features.temperature.values)
    x['weekday'] = tf.constant(features.weekday.values)
    x['weeknum'] = tf.constant(features.weeknum.values)
    x['season'] = tf.constant(features.season.values)

    y = tf.constant(labels)

    return x, y

model.train(
    input_fn=lambda:temp_input_fn(
        train_features,
        train_labels
    ),
    steps=STEPS
)

TensorFlow log:

INFO:tensorflow:loss = 0.97960025, step = 1
INFO:tensorflow:global_step/sec: 470.174
INFO:tensorflow:loss = 0.18198118, step = 101 (0.215 sec)
INFO:tensorflow:global_step/sec: 638.591
INFO:tensorflow:loss = 0.1380633, step = 201 (0.156 sec)
INFO:tensorflow:global_step/sec: 652.834
INFO:tensorflow:loss = 0.11286014, step = 301 (0.155 sec)
INFO:tensorflow:global_step/sec: 638.016
INFO:tensorflow:loss = 0.09432771, step = 401 (0.159 sec)
INFO:tensorflow:global_step/sec: 613.783
INFO:tensorflow:loss = 0.07982709, step = 501 (0.159 sec)
INFO:tensorflow:global_step/sec: 614.844
INFO:tensorflow:loss = 0.0700635, step = 601 (0.162 sec)
INFO:tensorflow:global_step/sec: 516.951
INFO:tensorflow:loss = 0.05970519, step = 701 (0.195 sec)
INFO:tensorflow:global_step/sec: 623.434
INFO:tensorflow:loss = 0.05116929, step = 801 (0.161 sec)
INFO:tensorflow:global_step/sec: 512.689
INFO:tensorflow:loss = 0.044941783, step = 901 (0.193 sec)
INFO:tensorflow:global_step/sec: 608.299
INFO:tensorflow:loss = 0.041477665, step = 1001 (0.166 sec)
INFO:tensorflow:global_step/sec: 390.28
INFO:tensorflow:loss = 0.036976893, step = 1101 (0.283 sec)
...
INFO:tensorflow:Saving dict for global step 20000: average_loss = 0.19528933, global_step = 20000, loss = 0.19528933
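
One difference between the two input functions that may matter when comparing these logs: Method 1 feeds one row per step (BATCH_SIZE = 1), whereas the tf.constant version returns every training row at every step, so each logged loss is computed over the entire training set, not over a single row. The NumPy sketch below uses synthetic data and a fixed, hypothetical weight in place of the DNNRegressor. It only illustrates how much a single-row loss can scatter around the full-batch loss for one and the same model.

```python
import numpy as np

# Synthetic regression data; 960 rows to mirror the training split size.
rng = np.random.default_rng(0)
x = rng.normal(size=960)
y = 2.0 * x + rng.normal(scale=0.3, size=960)

# Hypothetical fixed weight standing in for a partly trained model.
w = 1.5
per_example_loss = (w * x - y) ** 2

# What a full-batch step would report: one value over all rows.
full_batch_loss = per_example_loss.mean()

# What batch_size=1 steps would report: one row's loss at a time.
single_row_losses = per_example_loss[:12]

# Spread of the single-row losses across the whole dataset.
loss_range = (per_example_loss.min(), per_example_loss.max())
```

Here the model does not change at all, yet the single-row losses span a wide range around the full-batch value, so a noisy log under batch size 1 is not by itself evidence that nothing is being learned.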

What am I missing? Why does my model not seem to learn when I use the TensorFlow Dataset API (Method 1)?
