Multi-hot encoding a space-separated categorical variable with TensorFlow's indicator_column

Time: 2019-02-05 11:33:40

Tags: python tensorflow machine-learning artificial-intelligence data-science

import tensorflow as tf
feature_names = ['education']

d = dict(zip(feature_names, [["Bachelors","11th"]]))
print(d)
education_vocabulary_list = [
    'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
    'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
    '5th-6th', '10th', '1st-4th', 'Preschool', '12th'] 
education = tf.feature_column.categorical_column_with_vocabulary_list('education', vocabulary_list=education_vocabulary_list)
education_indicator = tf.feature_column.indicator_column(education)
feature_columns = [education_indicator]
print(feature_columns)

input_layer = tf.feature_column.input_layer(
    features=d,
    feature_columns=feature_columns
)


with tf.train.MonitoredTrainingSession() as sess:
    print(input_layer)
    print(sess.run(input_layer))

With the example above, I get the following output:

[<tf.Tensor 'input_layer/concat:0' shape=(2, 16) dtype=float32>][array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
   [0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]],
  dtype=float32)]

I expected the output to be the dense tensor described in the documentation (https://www.tensorflow.org/api_docs/python/tf/feature_column/indicator_column).

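Basically, I want to multi-hot encode a space-separated categorical variable. I will then feed that column, along with other features, to a DNNClassifier for model training.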
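
To make the target concrete, here is a minimal plain-Python sketch of the encoding I am after for a single example. It is not TensorFlow code and not how indicator_column works internally. The helper name `multi_hot` is hypothetical, and skipping tokens that are not in the vocabulary is an assumption made only for this illustration.

```python
# Illustration only: plain Python, not TensorFlow's implementation.
education_vocabulary_list = [
    'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
    'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
    '5th-6th', '10th', '1st-4th', 'Preschool', '12th']


def multi_hot(value, vocabulary):
    """Hypothetical helper: encode one space-separated string as one multi-hot row."""
    index = {token: i for i, token in enumerate(vocabulary)}
    row = [0.0] * len(vocabulary)
    for token in value.split():
        # Assumption: tokens missing from the vocabulary are silently skipped.
        if token in index:
            row[index[token]] = 1.0
    return row


# One example carrying two categories should give ONE row with two 1s,
# not two separate one-hot rows.
row = multi_hot("Bachelors 11th", education_vocabulary_list)
print(row)
```

In other words, for the input "Bachelors 11th" I want a single row of length 16 with a 1 at the position of each category that is present.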

How can I multi-hot encode a space-separated categorical variable using TensorFlow's indicator_column?

0 answers:

No answers yet.