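I have data for a regression task.
The independent features (X_train) are scaled with a standard scaler.
I built a Keras Sequential model, added hidden layers, and compiled it.
I then fit the model with model.fit(X_train_scaled, y_train) and saved it to an .hdf5 file.
How can I now include the scaling step in the saved model, so that the same scaling parameters are applied to unseen test data?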
# Imports needed for training and evaluating the model.
# X (features) and y (target) are assumed to be loaded already.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training split only, then reuse it for the test split.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

def build_model():
    model = keras.Sequential([
        # One input per feature column of the scaled training data.
        layers.Dense(64, activation=tf.nn.relu, input_shape=[X_train_scaled.shape[1]]),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])
    optimizer = tf.keras.optimizers.RMSprop(0.001)
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model

model = build_model()
EPOCHS = 1000
history = model.fit(X_train_scaled, y_train, epochs=EPOCHS,
                    validation_split=0.2, verbose=0)
loss, mae, mse = model.evaluate(X_test_scaled, y_test, verbose=0)
Answer 0 (score: 1)
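As far as I understand, the standard and efficient way to do this is to use TensorFlow Transform. Using TF Transform does not mean you have to adopt the whole TFX pipeline; TF Transform can also be used standalone.
TensorFlow Transform creates a Beam transform graph, which injects these transformations as constants into the TensorFlow graph. Because the transformations are represented as constants in the graph, they stay consistent across training and serving. The advantage of that consistency is that unseen data at serving time is scaled with exactly the parameters computed on the training data.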
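To make the "constants" idea concrete, here is a minimal sketch in plain Python. It does not use TF Transform at all; the helper names `fit_z_score` and `apply_z_score` and the toy data are hypothetical, chosen only for illustration. It shows the general pattern: statistics are computed once on the training data, frozen, stored, and then reused unchanged on unseen data.

```python
import json
import math


def fit_z_score(column):
    """Compute the mean and population standard deviation of one feature column."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    return {"mean": mean, "std": math.sqrt(variance)}


def apply_z_score(column, constants):
    """Scale values with previously frozen constants; nothing is re-estimated here."""
    return [(v - constants["mean"]) / constants["std"] for v in column]


# Hypothetical toy data standing in for one feature column.
train_column = [2.0, 4.0, 6.0, 8.0]
unseen_column = [5.0, 10.0]

# "Training": estimate the constants once, then store them next to the model.
constants = fit_z_score(train_column)
saved = json.dumps(constants)

# "Serving": reload the constants and apply them as-is to unseen data.
restored = json.loads(saved)
train_scaled = apply_z_score(train_column, restored)
unseen_scaled = apply_z_score(unseen_column, restored)
```

TF Transform automates the same pattern at scale: the analysis pass computes the statistics, and the resulting graph carries them as constants so the serving path cannot drift from the training path.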
Sample code for TF Transform is shown below.
Code for importing all the dependencies:
# The `!pip` lines are notebook (IPython/Colab) syntax and will not run in a plain Python script.
try:
    import tensorflow_transform as tft
    import apache_beam as beam
except ImportError:
    print('Installing TensorFlow Transform. This will take a minute, ignore the warnings')
    !pip install -q tensorflow_transform
    print('Installing Apache Beam. This will take a minute, ignore the warnings')
    !pip install -q apache_beam
    import tensorflow_transform as tft
    import apache_beam as beam

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
# Note: dataset_schema is only present in older tensorflow_transform releases;
# newer releases provide schema_utils instead.
from tensorflow_transform.tf_metadata import dataset_schema
Below is the preprocessing function, in which all the transformations are declared:
# NUMERIC_FEATURE_KEYS and OPTIONAL_NUMERIC_FEATURE_KEYS are lists of column
# names that you define for your own dataset.
def preprocessing_fn(inputs):
    """Preprocess input columns into transformed columns."""
    # Since we are modifying some features and leaving others unchanged, we
    # start by setting `outputs` to a copy of `inputs`.
    outputs = inputs.copy()

    # Scale numeric columns to have range [0, 1].
    for key in NUMERIC_FEATURE_KEYS:
        outputs[key] = tft.scale_to_0_1(outputs[key])

    for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
        # This is a SparseTensor because it is optional. Here we fill in a default
        # value when it is missing. (tf.sparse_to_dense was removed in TF 2.x, so
        # the SparseTensor is rebuilt with shape [batch, 1] and densified instead.)
        sparse = tf.SparseTensor(outputs[key].indices, outputs[key].values,
                                 [outputs[key].dense_shape[0], 1])
        dense = tf.sparse.to_dense(sparse, default_value=0.)
        # Reshaping from a batch of vectors of size 1 to a batch of scalars.
        dense = tf.squeeze(dense, axis=1)
        outputs[key] = tft.scale_to_0_1(dense)

    return outputs
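Besides tft.scale_to_0_1, you can also use other APIs for normalization, such as tft.scale_by_min_max and tft.scale_to_z_score. Of these, tft.scale_to_z_score is the closest counterpart to the StandardScaler used in the question.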
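For intuition, the arithmetic behind these three scalers can be sketched in plain Python. The functions below are hypothetical reference helpers, not the tft implementations: they assume a non-constant column and a population (not sample) standard deviation, and they ignore edge cases that the real APIs handle.

```python
import math


def min_max_reference(column, output_min=0.0, output_max=1.0):
    """Linearly map the column's [min, max] onto [output_min, output_max]."""
    lo, hi = min(column), max(column)
    span = hi - lo  # assumption: the column is not constant, so span > 0
    return [(v - lo) / span * (output_max - output_min) + output_min
            for v in column]


def zero_one_reference(column):
    """Special case of min-max scaling with the target range [0, 1]."""
    return min_max_reference(column, 0.0, 1.0)


def z_score_reference(column):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


# Hypothetical toy column used only to exercise the helpers.
values = [10.0, 20.0, 40.0]
unit_range = zero_one_reference(values)
symmetric_range = min_max_reference(values, -1.0, 1.0)
standardized = z_score_reference(values)
```

Whichever scaler you pick, the point made above still applies: the minimum, maximum, mean, and standard deviation come from the training data and are reused unchanged at serving time.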
See the links mentioned below for details and for the TF Transform tutorial.