How do I get the feature_column values for raw features outside of an Estimator?

Time: 2018-03-08 11:34:04

Tags: python tensorflow

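Every example of feature_column in the TensorFlow API shows the same pattern: create the raw features in input_fn, build a list of feature_column definitions that describes the desired mapping, and pass that list to an Estimator. At run time the Estimator combines the two and performs the actual feature encoding. How can I do this outside of the Estimator API? I have looked through the TensorFlow source code and come up empty-handed.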

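Here is some source code that demonstrates what I need. I want to combine age_buckets and education into a single encoded feature for the sample, with the result [2, 0].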

import tensorflow as tf

feature_names = ['age', 'education']

label_names = ['>50K', '<=50K']

# One raw sample as a (features, label) tuple.
d = dict(zip(feature_names, [34, 'Bachelors'])), '>50K'

print(d)

with tf.Session() as sess:
    # Numeric age, bucketized into ranges.
    age = tf.feature_column.numeric_column('age')
    age_buckets = tf.feature_column.bucketized_column(
        age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
    # Education as a categorical column backed by a fixed vocabulary.
    education = tf.feature_column.categorical_column_with_vocabulary_list(
        'education', [
            'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
            'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
            '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])
    base_columns = [age_buckets, education]
    # Prints only the column definitions, not the encoded values.
    print(base_columns)
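
For reference, the lookup that these two columns describe can be sketched in plain Python, without TensorFlow. This is only an illustration: `bucket_index`, `vocabulary_index`, and `encode` are hypothetical helper names, and the sketch assumes the left-closed bucket convention documented for `bucketized_column` (boundaries `[0., 1., 2.]` give buckets `(-inf, 0.)`, `[0., 1.)`, `[1., 2.)`, `[2., +inf)`) and the default out-of-vocabulary value of -1. Under those assumptions, age 34 falls in `[30, 35)`, which is bucket index 3, so the expected `[2, 0]` above may be off by one.

```python
from bisect import bisect_right

# Same boundaries and vocabulary as in the question.
BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
VOCABULARY = [
    'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
    'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
    '5th-6th', '10th', '1st-4th', 'Preschool', '12th']


def bucket_index(value, boundaries):
    """Index of the left-closed bucket that contains value (hypothetical helper)."""
    return bisect_right(boundaries, value)


def vocabulary_index(word, vocabulary, default=-1):
    """Position of word in vocabulary, or default when it is absent."""
    try:
        return vocabulary.index(word)
    except ValueError:
        return default


def encode(sample):
    """Encode one raw sample as [age bucket index, education index]."""
    return [bucket_index(sample['age'], BOUNDARIES),
            vocabulary_index(sample['education'], VOCABULARY)]


print(encode({'age': 34, 'education': 'Bachelors'}))  # prints [3, 0]
```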

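1 answer:

Answer 0 (score: 0)

It turns out that, in addition to tf.feature_column.input_layer, you also need tf.train.MonitoredTrainingSession() to initialize the lookup tables that the vocabulary column requires.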

import tensorflow as tf

feature_names = ['age', 'education']

# Each feature value is wrapped in a list so that it has a batch dimension.
d = dict(zip(feature_names, [[34], ['Bachelors']])), '>50K'

print(d[0])
age = tf.feature_column.numeric_column('age')
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_vocabulary_list = [
    'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
    'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
    '5th-6th', '10th', '1st-4th', 'Preschool', '12th']
education = tf.feature_column.categorical_column_with_vocabulary_list(
    'education', vocabulary_list=education_vocabulary_list)
# input_layer accepts only dense columns, so wrap the categorical column.
education_indicator = tf.feature_column.indicator_column(education)
feature_columns = [age_buckets, education_indicator]
print(feature_columns)

# Build the dense, encoded representation of the raw features.
input_layer = tf.feature_column.input_layer(
    features=d[0],
    feature_columns=feature_columns
)

# Keep only the non-zero entries of the dense tensor as a SparseTensor.
zero = tf.constant(0, dtype=tf.float32)
where = tf.not_equal(input_layer, zero)
indices = tf.where(where)
values = tf.gather_nd(input_layer, indices)
sparse = tf.SparseTensor(indices, values, input_layer.shape)


# MonitoredTrainingSession initializes the vocabulary lookup tables.
with tf.train.MonitoredTrainingSession() as sess:

    print(input_layer)
    print(sess.run(input_layer))
    print(sess.run(sparse))
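
To make the shape of the result easier to picture, here is a plain-Python sketch of what the dense row and its sparse form look like for this one sample. It is not the TensorFlow implementation: `one_hot`, `dense_row`, and `to_sparse` are hypothetical helpers, and the sketch assumes that input_layer concatenates the columns in name order (the bucketized age column first, then the education indicator), giving 11 bucket slots followed by 16 vocabulary slots.

```python
def one_hot(index, depth):
    """Return a list of length depth with 1.0 at index and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def dense_row(age_bucket, education_index, num_buckets=11, vocab_size=16):
    """Concatenate the one-hot age bucket and the one-hot education index."""
    return one_hot(age_bucket, num_buckets) + one_hot(education_index, vocab_size)


def to_sparse(batch):
    """Mirror the not_equal / where / gather_nd steps on a list of rows."""
    indices = [[r, c]
               for r, row in enumerate(batch)
               for c, value in enumerate(row)
               if value != 0.0]
    values = [batch[r][c] for r, c in indices]
    dense_shape = [len(batch), len(batch[0])]
    return indices, values, dense_shape


# Age 34 is in bucket 3 and 'Bachelors' is vocabulary index 0.
batch = [dense_row(3, 0)]
indices, values, dense_shape = to_sparse(batch)
print(indices)      # prints [[0, 3], [0, 11]]
print(values)       # prints [1.0, 1.0]
print(dense_shape)  # prints [1, 27]
```

Under these assumptions, the two non-zero positions identify the age bucket (column 3) and the education entry (column 11, which is offset 11 plus vocabulary index 0).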