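I am trying to run TensorFlow's DNNClassifier on some log data that contains a mix of categorical and numeric columns. I have created feature columns to specify and bucketize/hash the data for TensorFlow. When I run the code, I get an 'Unable to get element as bytes' internal error. Note: I did not want to drop the NaN values as described in this article, so I converted them to 0 with this code:
train = train.fillna(0, axis=0)
so I am not sure why I still get this error. If I call dropna it works, but I do not want to drop the NaNs because I think they are needed for model training.
The code that produces the error is: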
import tensorflow as tf

# train/valid are pandas DataFrames; train_label/valid_label are the label Series.
def create_train_input_fn():
    return tf.estimator.inputs.pandas_input_fn(
        x=train,
        y=train_label,
        batch_size=32,
        num_epochs=None,
        shuffle=True)

def create_test_input_fn():
    return tf.estimator.inputs.pandas_input_fn(
        x=valid,
        y=valid_label,
        num_epochs=1,
        shuffle=False)

feature_columns = []
end_time = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('end_time', 1000), 10)
feature_columns.append(end_time)
device = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('device', 1000), 10)
feature_columns.append(device)
device_os = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('device_os', 1000), 10)
feature_columns.append(device_os)
device_os_version = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('device_os_version', 1000), 10)
feature_columns.append(device_os_version)
Latency = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('Latency'),
    boundaries=[.000000, .000010, .000100, .001000, .010000, .100000])
feature_columns.append(Latency)
Megacycles = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('Megacycles'),
    boundaries=[0, 50, 100, 200, 300])
feature_columns.append(Megacycles)
Cost = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('Cost'),
    boundaries=[0.000001e-08, 1.000000e-08, 5.000000e-08, 10.000000e-08, 15.000000e-08])
feature_columns.append(Cost)
device_brand = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('device_brand', 1000), 10)
feature_columns.append(device_brand)
device_family = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('device_family', 1000), 10)
feature_columns.append(device_family)
browser_version = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('browser_version', 1000), 10)
feature_columns.append(browser_version)
app = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('app', 1000), 10)
feature_columns.append(app)
ua_parse = tf.feature_column.embedding_column(tf.feature_column.categorical_column_with_hash_bucket('ua_parse', 1000), 10)
feature_columns.append(ua_parse)
estimator = tf.estimator.DNNClassifier(hidden_units=[256, 128, 64],
                                       feature_columns=feature_columns,
                                       n_classes=2,
                                       model_dir='graphs/dnn')
train_input_fn = create_train_input_fn()
estimator.train(train_input_fn, steps=2000)

Answer 0 (score: 0)
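I agree with Thomas Decaux. I ran into exactly the same problem. When I checked, my labels were represented as strings ("yes" and "no") rather than integers (1, 0). After converting the labels to int64, the error no longer occurred.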
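
A minimal sketch of that label conversion, in case it helps. The sample Series and the name label_map are made up for illustration; I am assuming train_label is a pandas Series of "yes"/"no" strings like mine was.

```python
import pandas as pd

# Hypothetical label column standing in for train_label from the question.
train_label = pd.Series(["yes", "no", "yes", "no"], name="label")

# Map each string class to an integer id (label_map is an illustrative name).
label_map = {"no": 0, "yes": 1}
train_label_int = train_label.map(label_map)

# map() yields NaN for any value missing from label_map, so fail early
# instead of silently passing bad labels to the estimator.
if train_label_int.isna().any():
    raise ValueError("found labels that are not in label_map")

# Make the dtype explicit before handing the Series to pandas_input_fn as y.
train_label_int = train_label_int.astype("int64")
print(train_label_int.dtype)     # int64
print(train_label_int.tolist())  # [1, 0, 1, 0]
```

You would then pass train_label_int as y in pandas_input_fn, and do the same for valid_label. DNNClassifier also has a label_vocabulary argument for string labels, but I have not verified that it avoids this particular error, so the explicit int64 conversion is what I can vouch for.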