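I am using the canned DNNRegressor from tf.estimator to predict how many people visit a park each day based on various weather features, and I have run into a problem with the TensorFlow Dataset API that I cannot make sense of.
The dataset has 1200 rows and 6 columns (precipitation, temperature, weekday, season (0 = first year, 1 = second year, ...), week number, and the target label, the visitor count). The data is stored in a pandas DataFrame.
Before any data is fed to the model, the numeric values are scaled with the following code: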
train = data.sample(frac=0.8,random_state=19)
test = data.drop(train.index)
# Further split to X and y
train_features, train_labels = train, train.pop('count')
test_features, test_labels = test, test.pop('count')
# Standardize (fit the scalers on the training split only)
import numpy as np
from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
y_scaler = StandardScaler()
features_to_scale = ['precipitation', 'temperature']
train_features[features_to_scale] = x_scaler.fit_transform(train_features[features_to_scale])
test_features[features_to_scale] = x_scaler.transform(test_features[features_to_scale])
train_labels = y_scaler.fit_transform(np.array(train_labels).reshape(-1,1))
test_labels = y_scaler.transform(np.array(test_labels).reshape(-1,1))
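Since the labels are standardized too, every loss value in the logs further down is in units of the label's standard deviation, not in visitors. Here is a minimal pure-Python sketch of what I understand StandardScaler to do to a single column (mean and population standard deviation learned from the training split only) and how a scaled loss could be mapped back. The visitor counts are made-up placeholder numbers, not my data, and `scale`/`unscale` are hypothetical helpers standing in for `transform`/`inverse_transform`:

```python
import math
import statistics

# Hypothetical visitor counts, for illustration only (not the real data).
train_counts = [120.0, 340.0, 560.0, 210.0, 450.0]
test_counts = [300.0, 150.0]

# Equivalent of fit(): learn mean and population std (ddof=0) from the training split.
mean = statistics.mean(train_counts)
std = statistics.pstdev(train_counts)

def scale(values):
    """Equivalent of transform(): apply the training statistics."""
    return [(v - mean) / std for v in values]

def unscale(values):
    """Equivalent of inverse_transform(): map scaled values back to visitors."""
    return [v * std + mean for v in values]

train_scaled = scale(train_counts)
test_scaled = scale(test_counts)  # reuses the training statistics

# A mean squared error measured on scaled labels converts back like this.
scaled_mse = 0.12  # roughly the eval loss reported for method 1 below
rmse_in_visitors = math.sqrt(scaled_mse) * std
```

With the real data, `y_scaler.inverse_transform` would play the role of `unscale`.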
Next, I define the feature columns:
weekday = tf.feature_column.categorical_column_with_identity('weekday', 8)
weeknum = tf.feature_column.categorical_column_with_identity('weeknum', 54)
season = tf.feature_column.categorical_column_with_identity('season', 4)
feature_columns = [
    tf.feature_column.numeric_column('precipitation'),
    tf.feature_column.numeric_column('temperature'),
    tf.feature_column.indicator_column(weekday),
    tf.feature_column.embedding_column(weeknum, 3),
    tf.feature_column.indicator_column(season)
]
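As a sanity check on these columns, this is how wide I expect the dense input to the first hidden layer to be. It is plain arithmetic based on my understanding that a numeric column contributes one value, an indicator column contributes one slot per bucket, and an embedding column contributes its embedding dimension:

```python
# Width each feature column contributes to the dense input of the network.
column_widths = {
    "precipitation": 1,  # numeric_column, scalar
    "temperature": 1,    # numeric_column, scalar
    "weekday": 8,        # indicator_column over 8 identity buckets (one-hot)
    "weeknum": 3,        # embedding_column of dimension 3 over 54 buckets
    "season": 4,         # indicator_column over 4 identity buckets (one-hot)
}

input_width = sum(column_widths.values())  # 1 + 1 + 8 + 3 + 4 = 17

# The embedding itself is a trainable lookup table of 54 rows by 3 columns.
embedding_table_size = 54 * 3  # 162 weights
```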
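When I train the DNNRegressor using the TensorFlow Dataset API (called method 1 here), the training loss does not decrease steadily. This is the code I use to create the dataset and feed it to my model: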
def input_fn_train(features, labels, batch_size, epochs):
    # Convert the inputs to a Dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Return a batch of (features, labels)
    return (
        dataset
        .shuffle(512)
        .repeat(epochs)
        .batch(batch_size)
        .make_one_shot_iterator().get_next()
    )
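To check my own understanding of `.shuffle(512)`: as far as I can tell from the documentation, it keeps a buffer of 512 elements, emits a random one, and refills the slot with the next input element. With 960 training rows (80% of 1200) that is not a uniform shuffle of the whole set. The following is a pure-Python emulation of that behaviour, not TensorFlow code, and `buffered_shuffle` is a name I made up:

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in the order a fixed-size shuffle buffer would emit them."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer; nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and refill its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer

order = list(buffered_shuffle(range(960), 512, seed=0))
# The k-th emitted row can only come from the first 512 + k input rows,
# so rows near the end of the DataFrame never show up early in an epoch.
```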
STEPS = 20000
BATCH_SIZE = 1
EPOCHS = 1000
# Build Estimator
model = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[20, 10]
)
# Train estimator
model.train(
    input_fn=lambda: input_fn_train(
        train_features,
        train_labels,
        BATCH_SIZE,
        EPOCHS
    ),
    steps=STEPS
)
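For reference, this is how the constants above interact as far as I understand it: training stops after `steps` steps or when the input runs out, whichever comes first. The sketch is plain arithmetic and assumes the 80% training split of the 1200 rows, i.e. 960 rows:

```python
import math

total_rows = 1200
train_rows = round(total_rows * 0.8)  # data.sample(frac=0.8) -> 960 rows

STEPS = 20000
BATCH_SIZE = 1
EPOCHS = 1000

# repeat(epochs) followed by batch(batch_size) yields this many batches in total.
batches_available = math.ceil(train_rows * EPOCHS / BATCH_SIZE)  # 960000

# The steps argument caps training long before the dataset is exhausted.
steps_run = min(STEPS, batches_available)  # 20000

# Number of passes over the training data that actually happen.
epochs_seen = steps_run * BATCH_SIZE / train_rows  # about 20.8
```

So with these settings each step sees exactly one row, and the model makes roughly 21 passes over the training set, not 1000.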
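Below is the log output for the first 1101 training steps. As you can see, the training loss does not decrease steadily. The model does not seem to learn at all.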
INFO:tensorflow:loss = 1.7277247, step = 1
INFO:tensorflow:global_step/sec: 487.874
INFO:tensorflow:loss = 0.1896706, step = 101 (0.206 sec)
INFO:tensorflow:global_step/sec: 419.013
INFO:tensorflow:loss = 0.035381828, step = 201 (0.243 sec)
INFO:tensorflow:global_step/sec: 478.715
INFO:tensorflow:loss = 0.0111698285, step = 301 (0.210 sec)
INFO:tensorflow:global_step/sec: 665.781
INFO:tensorflow:loss = 0.08243248, step = 401 (0.144 sec)
INFO:tensorflow:global_step/sec: 527.54
INFO:tensorflow:loss = 0.057627745, step = 501 (0.194 sec)
INFO:tensorflow:global_step/sec: 497.047
INFO:tensorflow:loss = 0.047706906, step = 601 (0.197 sec)
INFO:tensorflow:global_step/sec: 629.148
INFO:tensorflow:loss = 0.15168391, step = 701 (0.159 sec)
INFO:tensorflow:global_step/sec: 612.062
INFO:tensorflow:loss = 0.3931117, step = 801 (0.163 sec)
INFO:tensorflow:global_step/sec: 455.834
INFO:tensorflow:loss = 0.19988278, step = 901 (0.219 sec)
INFO:tensorflow:global_step/sec: 493.121
INFO:tensorflow:loss = 0.02624654, step = 1001 (0.212 sec)
INFO:tensorflow:global_step/sec: 454.812
INFO:tensorflow:loss = 0.187381, step = 1101 (0.212 sec)
...
INFO:tensorflow:Saving dict for global step 20000: average_loss = 0.121116325, global_step = 20000, loss = 0.121116325
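However, if I rewrite the input_fn to simply return a dict of features and the labels as tf.constant tensors, the training loss decreases gradually and the model appears to learn.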
def input_fn_train(features, labels):
    # Return the entire training set as constant tensors
    x = {}
    x['precipitation'] = tf.constant(features.precipitation.values)
    x['temperature'] = tf.constant(features.temperature.values)
    x['weekday'] = tf.constant(features.weekday.values)
    x['weeknum'] = tf.constant(features.weeknum.values)
    x['season'] = tf.constant(features.season.values)
    y = tf.constant(labels)
    return x, y

model.train(
    input_fn=lambda: input_fn_train(
        train_features,
        train_labels
    ),
    steps=STEPS
)
TensorFlow log:
INFO:tensorflow:loss = 0.97960025, step = 1
INFO:tensorflow:global_step/sec: 470.174
INFO:tensorflow:loss = 0.18198118, step = 101 (0.215 sec)
INFO:tensorflow:global_step/sec: 638.591
INFO:tensorflow:loss = 0.1380633, step = 201 (0.156 sec)
INFO:tensorflow:global_step/sec: 652.834
INFO:tensorflow:loss = 0.11286014, step = 301 (0.155 sec)
INFO:tensorflow:global_step/sec: 638.016
INFO:tensorflow:loss = 0.09432771, step = 401 (0.159 sec)
INFO:tensorflow:global_step/sec: 613.783
INFO:tensorflow:loss = 0.07982709, step = 501 (0.159 sec)
INFO:tensorflow:global_step/sec: 614.844
INFO:tensorflow:loss = 0.0700635, step = 601 (0.162 sec)
INFO:tensorflow:global_step/sec: 516.951
INFO:tensorflow:loss = 0.05970519, step = 701 (0.195 sec)
INFO:tensorflow:global_step/sec: 623.434
INFO:tensorflow:loss = 0.05116929, step = 801 (0.161 sec)
INFO:tensorflow:global_step/sec: 512.689
INFO:tensorflow:loss = 0.044941783, step = 901 (0.193 sec)
INFO:tensorflow:global_step/sec: 608.299
INFO:tensorflow:loss = 0.041477665, step = 1001 (0.166 sec)
INFO:tensorflow:global_step/sec: 390.28
INFO:tensorflow:loss = 0.036976893, step = 1101 (0.283 sec)
...
INFO:tensorflow:Saving dict for global step 20000: average_loss = 0.19528933, global_step = 20000, loss = 0.19528933
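One difference between the two input functions that I can see: method 1 feeds one row per step (BATCH_SIZE = 1), while method 2 hands every training row to each step as a single batch. The logged training loss is computed on the current batch, and the final evaluation loss of method 1 (0.121) is actually lower than that of method 2 (0.195), so I am not sure the jumpy log really means "not learning". The toy sketch below is plain-Python SGD on a made-up 1-D linear regression and has nothing to do with DNNRegressor internals; all names and numbers in it are assumptions for illustration. It compares how smooth the per-step loss looks for the two batch sizes:

```python
import random

rng = random.Random(0)

# Made-up data: y = 2x + noise. Purely illustrative.
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

def train(batch_size, steps=300, lr=0.05):
    """Fit y ~ w * x with SGD; return (w, loss of each step's batch)."""
    w = 0.0
    losses = []
    for _ in range(steps):
        idx = rng.sample(range(len(xs)), batch_size)
        errs = [w * xs[i] - ys[i] for i in idx]
        # Mean squared error on the current batch only, like a training log line.
        losses.append(sum(e * e for e in errs) / batch_size)
        grad = sum(2.0 * e * xs[i] for e, i in zip(errs, idx)) / batch_size
        w -= lr * grad
    return w, losses

def roughness(losses):
    """Mean absolute change between consecutive logged losses."""
    return sum(abs(a - b) for a, b in zip(losses, losses[1:])) / (len(losses) - 1)

w_single, loss_single = train(batch_size=1)      # like method 1
w_full, loss_full = train(batch_size=len(xs))    # like method 2

print(roughness(loss_single), roughness(loss_full))
print(w_single, w_full)
```

In this toy setting both runs end up with a weight close to the true value of 2, but the batch-size-1 loss jumps around from step to step while the full-batch loss falls smoothly. Whether the same effect fully explains my DNNRegressor logs is exactly what I would like to confirm.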
What am I missing? Why does my model not seem to learn when I use the TensorFlow Dataset API (method 1)?