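(New to Python, machine learning, and TensorFlow)
I am trying to adapt the TensorFlow Linear Model Tutorial from the official documentation to the Abalone dataset from the UCI Machine Learning Repository. The aim is to predict the number of rings (a proxy for age) of an abalone from the other given data.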
When I run the program below, I get the following:
  File "/home/lawrence/tensorflow3.5/lib/python3.5/site-packages/tensorflow/python/ops/lookup_ops.py", line 220, in lookup
    (self._key_dtype, keys.dtype))
TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int32'>.
The error is thrown at line 220 of lookup_ops.py, and is documented as being raised in the following case:
Raises:
TypeError: when `keys` or `default_value` doesn't match the table data types.
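From debugging parse_csv(), it seems that all tensors are created with the correct types.
Could you explain what is going wrong? I believe I am following the tutorial's code logic, and I cannot figure this out.
Source code: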
import tensorflow as tf
import shutil
_CSV_COLUMNS = [
    'sex', 'length', 'diameter', 'height', 'whole_weight',
    'shucked_weight', 'viscera_weight', 'shell_weight', 'rings'
]
_CSV_COLUMN_DEFAULTS = [['M'], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0]]
_NUM_EXAMPLES = {
    'train': 3000,
    'validation': 1177,
}
def build_model_columns():
    """Builds a set of wide feature columns."""
    # Continuous columns
    sex = tf.feature_column.categorical_column_with_hash_bucket('sex', hash_bucket_size=1000)
    length = tf.feature_column.numeric_column('length', dtype=tf.float32)
    diameter = tf.feature_column.numeric_column('diameter', dtype=tf.float32)
    height = tf.feature_column.numeric_column('height', dtype=tf.float32)
    whole_weight = tf.feature_column.numeric_column('whole_weight', dtype=tf.float32)
    shucked_weight = tf.feature_column.numeric_column('shucked_weight', dtype=tf.float32)
    viscera_weight = tf.feature_column.numeric_column('viscera_weight', dtype=tf.float32)
    shell_weight = tf.feature_column.numeric_column('shell_weight', dtype=tf.float32)
    base_columns = [sex, length, diameter, height, whole_weight,
                    shucked_weight, viscera_weight, shell_weight]
    return base_columns
def build_estimator():
    """Build an estimator appropriate for the given model type."""
    base_columns = build_model_columns()
    return tf.estimator.LinearClassifier(
        model_dir="~/models/albones/",
        feature_columns=base_columns,
        label_vocabulary=_CSV_COLUMNS)
def input_fn(data_file, num_epochs, shuffle, batch_size):
    """Generate an input function for the Estimator."""
    assert tf.gfile.Exists(data_file), (
        '%s not found. Please make sure you have either run data_download.py or '
        'set both arguments --train_data and --test_data.' % data_file)

    def parse_csv(value):
        print('Parsing', data_file)
        columns = tf.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)
        features = dict(zip(_CSV_COLUMNS, columns))
        labels = features.pop('rings')
        return features, labels

    # Extract lines from input files using the Dataset API.
    dataset = tf.data.TextLineDataset(data_file)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=_NUM_EXAMPLES['train'])
    dataset = dataset.map(parse_csv)
    # We call repeat after shuffling, rather than before, to prevent separate
    # epochs from blending together.
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels
def main(unused_argv):
    # Clean up the model directory if present
    shutil.rmtree("/home/lawrence/models/albones/", ignore_errors=True)
    model = build_estimator()
    # Train and evaluate the model every `FLAGS.epochs_per_eval` epochs.
    for n in range(40 // 2):
        model.train(input_fn=lambda: input_fn(
            "/home/lawrence/abalone.data", 2, True, 40))
        results = model.evaluate(input_fn=lambda: input_fn(
            "/home/lawrence/abalone.data", 1, False, 40))
        # Display evaluation metrics
        print('Results at epoch', (n + 1) * 2)
        print('-' * 60)
        for key in sorted(results):
            print('%s: %s' % (key, results[key]))

if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main=main)
Here is the breakdown of the dataset columns from abalone.names:
Name            Data Type   Meas.   Description
----            ---------   -----   -----------
Sex             nominal             M, F, [or] I (infant)
Length          continuous  mm      Longest shell measurement
Diameter        continuous  mm      perpendicular to length
Height          continuous  mm      with meat in shell
Whole weight    continuous  grams   whole abalone
Shucked weight  continuous  grams   weight of meat
Viscera weight  continuous  grams   gut weight (after bleeding)
Shell weight    continuous  grams   after being dried
Rings           integer             +1.5 gives the age in years
Dataset entries appear in this order as comma-separated values, with a new line for each entry.
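As a quick sanity check on that layout, here is a minimal sketch in plain Python (no TensorFlow) that parses one row in the documented column order. The sample row and the helper name `parse_row` are illustrative assumptions, not part of the tutorial.

```python
import csv

CSV_COLUMNS = [
    'sex', 'length', 'diameter', 'height', 'whole_weight',
    'shucked_weight', 'viscera_weight', 'shell_weight', 'rings'
]


def parse_row(line):
    """Parse one comma-separated row into a dict of typed values."""
    values = next(csv.reader([line]))
    if len(values) != len(CSV_COLUMNS):
        raise ValueError(
            'expected %d fields, got %d' % (len(CSV_COLUMNS), len(values)))
    row = dict(zip(CSV_COLUMNS, values))
    # Every column between 'sex' and 'rings' is a continuous measurement.
    for name in CSV_COLUMNS[1:-1]:
        row[name] = float(row[name])
    # 'rings' is the integer label; 'sex' stays a string.
    row['rings'] = int(row['rings'])
    return row


# Illustrative row in the documented column order (assumed, for demonstration).
sample = 'M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15'
row = parse_row(sample)
print(type(row['sex']).__name__,
      type(row['length']).__name__,
      type(row['rings']).__name__)  # prints: str float int
```

The point of the check is that the only string column is `sex`, while the label `rings` is an integer.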
Answer 0 (score: 1)
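You have done almost everything right. The problem is in the definition of the estimator.
The task is to predict the Rings column, which is an integer, so it looks like a regression problem. However, you have decided to treat it as a classification task, which is also valid: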
def build_estimator():
    """Build an estimator appropriate for the given model type."""
    base_columns = build_model_columns()
    return tf.estimator.LinearClassifier(
        model_dir="~/models/albones/",
        feature_columns=base_columns,
        label_vocabulary=_CSV_COLUMNS)
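By default, tf.estimator.LinearClassifier assumes binary classification, i.e. n_classes=2. In your case that is clearly not true; this is the first mistake. You have also set label_vocabulary, which TensorFlow interprets as the set of possible values in the label column. That is why it expects a tf.string dtype: the labels are looked up in a string-keyed table, and the int32 Rings values do not match its key type. Since Rings is an integer, you do not need label_vocabulary at all.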
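To illustrate why the lookup fails, here is a simplified stand-in written in plain Python. It is not TensorFlow's implementation; `VocabularyTable` is a hypothetical class that only mimics the key-type check a string-keyed lookup table performs.

```python
class VocabularyTable:
    """Hypothetical stand-in for a string-keyed label lookup table."""

    def __init__(self, vocabulary):
        # Map each vocabulary entry to a class index.
        self._index = {word: i for i, word in enumerate(vocabulary)}

    def lookup(self, keys):
        # Reject any key whose type does not match the table's key type.
        for key in keys:
            if not isinstance(key, str):
                raise TypeError(
                    'Keys must be str, got %s' % type(key).__name__)
        # Unknown keys map to -1 in this sketch.
        return [self._index.get(key, -1) for key in keys]


# The question passed CSV column names as the label vocabulary...
table = VocabularyTable(['sex', 'length', 'rings'])

# ...but the labels are integer ring counts, so the lookup rejects them.
try:
    table.lookup([15, 7, 9])
except TypeError as err:
    print('TypeError:', err)
```

A vocabulary only makes sense when the label column holds strings (for example class names); integer labels can be used as class ids directly.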
Putting it all together:
def build_estimator():
    """Build an estimator appropriate for the given model type."""
    base_columns = build_model_columns()
    return tf.estimator.LinearClassifier(
        model_dir="~/models/albones/",
        feature_columns=base_columns,
        n_classes=30)
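The value of n_classes has to cover every label: without a vocabulary, the classifier expects integer class ids in the range [0, n_classes). A small plain-Python sketch of that rule follows; `required_n_classes` and the sample labels are hypothetical and only illustrate the check.

```python
def required_n_classes(labels):
    """Return the smallest n_classes such that every label is in [0, n_classes)."""
    labels = list(labels)
    if not labels:
        raise ValueError('labels must not be empty')
    for label in labels:
        if not isinstance(label, int) or label < 0:
            raise ValueError(
                'labels must be non-negative integers, got %r' % (label,))
    # Class ids start at 0, so the largest label needs max + 1 classes.
    return max(labels) + 1


# Made-up ring counts, for illustration only.
sample_rings = [15, 7, 9, 10, 29]
print(required_n_classes(sample_rings))  # prints: 30
```

Running such a check over the real label column before training is one way to confirm that the chosen n_classes is large enough.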
I suggest you also try tf.estimator.LinearRegressor, which will probably be more accurate.
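A rough sketch of the regression variant, assuming the same TensorFlow 1.x API used in the question. The names `build_regressor`, `NUMERIC_FEATURES`, and `rings_to_age` are hypothetical, and TensorFlow is imported lazily so the definitions load even where it is not installed.

```python
NUMERIC_FEATURES = [
    'length', 'diameter', 'height', 'whole_weight',
    'shucked_weight', 'viscera_weight', 'shell_weight'
]


def build_regressor(model_dir):
    """Build a LinearRegressor that predicts rings as a real number."""
    # Lazy import: assumes TensorFlow 1.x is available when this is called.
    import tensorflow as tf

    sex = tf.feature_column.categorical_column_with_hash_bucket(
        'sex', hash_bucket_size=1000)
    numeric = [tf.feature_column.numeric_column(name, dtype=tf.float32)
               for name in NUMERIC_FEATURES]
    # No n_classes or label_vocabulary: the label is one numeric target.
    return tf.estimator.LinearRegressor(
        model_dir=model_dir,
        feature_columns=[sex] + numeric)


def rings_to_age(rings):
    """Convert a ring count to age in years, per abalone.names (+1.5)."""
    return rings + 1.5
```

Under these assumptions it would take the place of build_estimator() in main(). The existing input_fn should be reusable, though parsing rings with a float default ([0.0]) may be the safer choice for a regression target.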