I am working on a regression problem with TensorFlow on ML Engine (Google Cloud Platform). I need to send a string tensor containing a date, something like "2018/06/05 23:00", to ML Engine, and from there my deployed model should extract what are basically (year, month, day, hour) features. For the example above that would be (2018, 06, 05, 23). The point is that I need to do this inside the model deployed on ML Engine, not in some API sitting in between.
The first thing I did was adapt the census model tutorial to my regression problem: https://cloud.google.com/ml-engine/docs/tensorflow/getting-started-training-prediction
In that tutorial the model is deployed on ML Engine from the terminal with the gcloud command gcloud ml-engine models create $MODEL_NAME ...
Below is the way I found to process the string tensor containing the date in order to obtain the features:
import tensorflow as tf
import numpy as np
date_time = tf.placeholder(shape=(1,), dtype=tf.string, name="ph_date_time")
INPUT_COLUMNS=["year", "month", "day", "hour"]
split_date_time = tf.string_split(date_time, ' ')
date = split_date_time.values[0]
time = split_date_time.values[1]
split_date = tf.string_split([date], '-')
split_time = tf.string_split([time], ':')
year = split_date.values[0]
month = split_date.values[1]
day = split_date.values[2]
hours = split_time.values[0]
minutes = split_time.values[1]
year = tf.string_to_number(year, out_type=tf.int32, name="year_temp")
month = tf.string_to_number(month, out_type=tf.int32, name="month_temp")
day = tf.string_to_number(day, out_type=tf.int32, name="day_temp")
hours = tf.string_to_number(hours, out_type=tf.int32, name="hour_temp")
minutes = tf.string_to_number(minutes, out_type=tf.int32, name="minute_temp")
year = tf.expand_dims(year, 0, name="year")
month = tf.expand_dims(month, 0, name="month")
day = tf.expand_dims(day, 0, name="day")
hours = tf.expand_dims(hours, 0, name="hours")
minutes = tf.expand_dims(minutes, 0, name="minutes")
# these would be the actual features fed to the deployed model
features = [year, month, day, hours]
actual_features = dict(zip(INPUT_COLUMNS, features))
with tf.Session() as sess:
    year, month, day, hours, minutes = sess.run([year, month, day, hours, minutes], feed_dict={date_time: ["2018-12-31 22:59"]})
    print("Year =", year)
    print("Month =", month)
    print("Day =", day)
    print("Hours =", hours)
    print("Minutes =", minutes)
The problem is that I don't know how to tell ML Engine to use that parsing. I know it has to do with the input_fn used to define the model, or with the serving_input_fn used to export it, but I'm not sure whether I have to put my code in both of them or in just one of them. Any suggestion would be appreciated, and apologies if the explanation isn't clear.
Answer 0: (score: 0)
The general pattern to follow is (see this doc):

1. Write a training/eval input_fn, typically using tf.data.Dataset. The input_fn should call helper functions to do the data transformations, just like in your code. Its output is a dict mapping feature names to batches of values.
2. Define FeatureColumns from the items in the input_fn's output. Do feature crosses, bucketization, etc. as necessary.
3. Instantiate your estimator (e.g. DNNRegressor), passing the FeatureColumns to the constructor. (A sketch of how steps 2 and 3 fit together with the parsing code is added after the code at the end of this answer.)
4. Write a serving_input_fn that has one or more tf.placeholder with None as the outer dimension (for variable batch sizes). Call the same helper functions from (1) to do the transformations. Return a tf.estimator.export.ServingInputReceiver that takes the placeholders as input and returns a dict that looks the same as the one from (1).

Your particular case needs a few additional details. First, you have hard-coded a batch size of 1 into the placeholder, and the rest of the code carries that assumption forward. Your placeholder must have shape=[None].

Unfortunately, your code was written under that shape-1 assumption, so, for example, split_date_time.values[0] will no longer work. I've added a helper function to the code below to address that.

Hopefully this code works for you:
import tensorflow as tf

# tf.string_split returns a SparseTensor. When using a variable batch size,
# this can be difficult to further manipulate. In our case, we don't need
# a SparseTensor, because we have a fixed number of elements each split.
# So we do the split and convert the SparseTensor to a dense tensor.
def fixed_split(batched_string_tensor, delimiter, num_cols):
    # When splitting a batch of elements, the values array is row-major, e.g.
    # ["2018-01-02", "2019-03-04"] becomes ["2018", "01", "02", "2019", "03", "04"].
    # So we simply split the string then reshape the array to create a dense
    # matrix with the same rows as the input, but split into columns, e.g.,
    # [["2018", "01", "02"], ["2019", "03", "04"]]
    split = tf.string_split(batched_string_tensor, delimiter)
    return tf.reshape(split.values, [-1, num_cols])
def parse_dates(dates):
    split_date_time = fixed_split(dates, ' ', 2)
    date = split_date_time[:, 0]
    time = split_date_time[:, 1]

    # The values of the resulting SparseTensor will alternate between year, month, and day
    split_date = fixed_split(date, '-', 3)
    split_time = fixed_split(time, ':', 2)

    year = split_date[:, 0]
    month = split_date[:, 1]
    day = split_date[:, 2]
    hours = split_time[:, 0]
    minutes = split_time[:, 1]

    year = tf.string_to_number(year, out_type=tf.int32, name="year_temp")
    month = tf.string_to_number(month, out_type=tf.int32, name="month_temp")
    day = tf.string_to_number(day, out_type=tf.int32, name="day_temp")
    hours = tf.string_to_number(hours, out_type=tf.int32, name="hour_temp")
    minutes = tf.string_to_number(minutes, out_type=tf.int32, name="minute_temp")

    return {"year": year, "month": month, "day": day, "hours": hours, "minutes": minutes}
def training_input_fn():
    # BATCH_SIZE and the file names are placeholders for your actual data.
    filenames = ["/var/data/file1.txt", "/var/data/file2.txt"]
    dataset = tf.data.TextLineDataset(filenames)
    dataset = dataset.batch(BATCH_SIZE)
    iterator = dataset.make_one_shot_iterator()
    return parse_dates(iterator.get_next())
def serving_input_fn():
    date_strings = tf.placeholder(dtype=tf.string, shape=[None], name="date_strings")
    features = parse_dates(date_strings)
    return tf.estimator.export.ServingInputReceiver(features, date_strings)
with tf.Session() as sess:
    date_time_list = ["2018-12-31 22:59", "2018-01-23 2:09"]
    date_strings = tf.placeholder(dtype=tf.string, shape=[None], name="date_strings")
    features = parse_dates(date_strings)
    fetches = [features[k] for k in ["year", "month", "day", "hours", "minutes"]]
    year, month, day, hours, minutes = sess.run(fetches, feed_dict={date_strings: date_time_list})
    print("Year =", year)
    print("Month =", month)
    print("Day =", day)
    print("Hours =", hours)
    print("Minutes =", minutes)