Question

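Why must all feature columns for tf.estimator.BoostedTreesClassifier be of type bucketized_column or indicator_column?

What is the best way to handle both numeric and categorical data with this classifier?

It seems that numeric data cannot be used. Decision trees would otherwise be perfect, because I don't even need to scale the data.

My code is as follows: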

import tensorflow as tf

def _parse_record():
    # do something
    return {'feature_1': array[0], 'feature_2': array[190.98]}, label

def input_fn():
    # parse record
    return dataset

# numerical_features and cat are lists of feature keys (definitions not shown)
feature_cols = []
for key in numerical_features:
    feature_cols.append(tf.feature_column.numeric_column(key=key))
for key in cat:
    hashed = tf.feature_column.categorical_column_with_hash_bucket(key=key, hash_bucket_size=100)
    feature_cols.append(tf.feature_column.indicator_column(hashed))

classifier = tf.estimator.BoostedTreesClassifier(
    feature_columns=feature_cols,
    n_batches_per_layer=100,
    n_trees=100,
)

classifier.train(input_fn=input_fn)

However, this gives me:

ValueError: For now, only bucketized_column and indicator_column is supported but got: _NumericColumn(key='active_time', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)

Answer 1

Support for numeric features in tf.estimator.BoostedTreesClassifier was only just added in TensorFlow v1.13 (source, commit). The first stable release that includes it is v1.13.1.
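
If upgrading is not an option, one possible workaround on older versions is to wrap each numeric column in tf.feature_column.bucketized_column, which is one of the two column types the error message says is accepted. The sketch below is only an illustration under some assumptions: the helper names (quantile_boundaries, bucket_index, make_bucketized_columns) and the samples_by_key dict are hypothetical, and picking boundaries from sample quantiles is just one reasonable choice, not something the estimator requires. The bucket_index helper mimics the half-open bucket assignment with plain Python so the effect of the boundaries can be inspected without TensorFlow.

```python
import bisect


def quantile_boundaries(values, num_buckets):
    """Return up to num_buckets - 1 strictly increasing cut points.

    Cut points are taken at evenly spaced ranks of the sorted sample
    values; duplicates are dropped because bucketized_column expects
    a strictly sorted list of boundaries.
    """
    if not values or num_buckets < 2:
        raise ValueError("need at least one value and two buckets")
    ordered = sorted(values)
    cuts = []
    for i in range(1, num_buckets):
        cut = float(ordered[(i * len(ordered)) // num_buckets])
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def bucket_index(value, boundaries):
    """Index of the bucket a value falls into.

    Buckets are (-inf, b0), [b0, b1), ..., [b_last, +inf), so a value
    equal to a boundary goes into the bucket to its right.
    """
    return bisect.bisect_right(boundaries, value)


def make_bucketized_columns(samples_by_key, num_buckets=10):
    """Build one bucketized_column per numeric feature key.

    samples_by_key is a hypothetical dict mapping a feature key to a
    list of sample values used only to choose the boundaries.
    """
    # Imported lazily so the helpers above need only the standard library.
    import tensorflow as tf

    columns = []
    for key, samples in samples_by_key.items():
        numeric = tf.feature_column.numeric_column(key=key)
        boundaries = quantile_boundaries(samples, num_buckets)
        columns.append(tf.feature_column.bucketized_column(numeric, boundaries))
    return columns
```

The returned columns could then be appended to feature_cols in place of the plain numeric_column entries. Because the trees split on bucket indices rather than raw values, more buckets give finer splits at the cost of more candidate thresholds.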
