Loading data incrementally into an Estimator with the TensorFlow Dataset API raises the following error:
ValueError: column_name: featureCategorical input_tensor dtype must be string or integer. dtype: <dtype: 'float32'>.
The input function I am using loads the data incrementally and yields batches that are fed to the estimator:
import tensorflow as tf

# DEFAULTS (the per-column record defaults) and batch_size are defined elsewhere.
def read_dataset(filename):
    def _input_fn():
        def decode_line(row):
            columns = tf.decode_csv(row, record_defaults=DEFAULTS)
            # "label" must appear in the column names, otherwise pop() raises KeyError
            features = dict(zip(["featureCategorical", "featurNumeric1", "featurNumeric2", "label"], columns))
            label = features.pop('label')
            return features, label

        # Create list of file names that match "glob" pattern (i.e. data_file_*.csv)
        filenames_dataset = tf.data.Dataset.list_files(filename)
        # Read lines from text files
        textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
        # Parse text lines as comma-separated values (CSV)
        dataset = textlines_dataset.map(decode_line)
        # ---> this dataset contains only floats, but feature "featureCategorical" needs to be a string
        num_epochs = None  # repeat indefinitely
        dataset = dataset.shuffle(buffer_size=10 * 500)
        dataset = dataset.repeat(num_epochs).batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()
    return _input_fn
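All features are parsed as float, but some of them are categorical and should therefore be strings.
How can I convert only the categorical features to strings inside the input function? Many thanks!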
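
One possible approach, sketched under assumptions: tf.decode_csv takes the dtype of each output column from the matching entry in record_defaults, so giving the categorical column a string default should make it parse as tf.string while the other columns stay float. The column order, the "NA" default, and the two in-memory sample lines below are assumptions for illustration, not part of the original setup. The sketch uses tf.io.decode_csv and tf.compat.v1.data.get_output_types, which assumes TensorFlow 1.13 or later.

```python
import tensorflow as tf

# Assumed column order, with "label" as the last CSV field.
CSV_COLUMNS = ["featureCategorical", "featurNumeric1", "featurNumeric2", "label"]
# A string default makes the first column parse as tf.string;
# float defaults make the remaining columns parse as tf.float32.
DEFAULTS = [["NA"], [0.0], [0.0], [0.0]]


def decode_line(row):
    # Each output tensor takes the dtype of its record default.
    columns = tf.io.decode_csv(row, record_defaults=DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    label = features.pop("label")
    return features, label


# Two in-memory CSV lines stand in for the real data_file_*.csv files.
lines = tf.data.Dataset.from_tensor_slices(["3,1.5,2.5,0", "7,0.1,0.2,1"])
dataset = lines.map(decode_line)

# Inspect the element dtypes without running the pipeline.
feature_types, label_type = tf.compat.v1.data.get_output_types(dataset)
print(feature_types["featureCategorical"])
print(feature_types["featurNumeric1"])
```

If the defaults cannot be changed, converting the column inside decode_line (for example with tf.cast to tf.int64, or tf.as_string) may be an alternative, since the error message states that integer dtypes are accepted as well.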