Feeding a list feature (multi-hot) to tf.estimator in TensorFlow

Time: 2018-10-22 22:45:57

Tags: python tensorflow machine-learning neural-network deep-learning

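Some of my feature columns have the data type list, and the lists can differ in length. I want to encode such a column as a multi-hot categorical feature and feed it to tf.estimator. I tried the following, but it fails with the error Unable to get element as bytes. I believe this is common practice in deep learning, especially in recommender systems such as wide and deep models. I found a related question here, but it does not show how to feed the data to an estimator.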

import pandas as pd
import tensorflow as tf

OUTDIR = "./data"

data = {"x": [["a", "c"], ["a", "b"], ["b", "c"]], "y": ["x", "y", "z"]}
df = pd.DataFrame(data)

Y = df["y"]
X = df.drop("y", axis=1)

indicator_features = [
    tf.feature_column.indicator_column(
        categorical_column=tf.feature_column.categorical_column_with_vocabulary_list(
            key="x", vocabulary_list=["a", "b", "c"]
        )
    )
]

model = tf.estimator.LinearClassifier(
    feature_columns=indicator_features, model_dir=OUTDIR
)

training_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=X, y=Y, batch_size=64, shuffle=True, num_epochs=None
)

model.train(input_fn=training_input_fn)

This produces the following error:

  

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'testalg', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Error reported to Coordinator: , Unable to get element as bytes.
INFO:tensorflow:Saving checkpoints for 0 into testalg/model.ckpt.
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1321     try:
-> 1322       return fn(*args)
   1323     except errors.OpError as e:

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1306       return self._call_tf_sessionrun(
-> 1307           options, feed_dict, fetch_list, target_list, run_metadata)
   1308

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1408           self._session, options, feed_dict, fetch_list, target_list,
-> 1409           run_metadata)
   1410     else:

InternalError: Unable to get element as bytes.

During handling of the above exception, another exception occurred:

InternalError                             Traceback (most recent call last)
 in ()
     44
     45
---> 46 model.train(input_fn=training_input_fn)

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in train(self, input_fn, hooks, steps, max_steps, saving_listeners)
    364
    365     saving_listeners = _check_listeners_type(saving_listeners)
--> 366     loss = self._train_model(input_fn, hooks, saving_listeners)
    367     logging.info('Loss for final step: %s.', loss)
    368     return self

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _train_model(self, input_fn, hooks, saving_listeners)
   1117       return self._train_model_distributed(input_fn, hooks, saving_listeners)
   1118     else:
-> 1119       return self._train_model_default(input_fn, hooks, saving_listeners)
   1120
   1121   def _train_model_default(self, input_fn, hooks, saving_listeners):

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _train_model_default(self, input_fn, hooks, saving_listeners)
   1133       return self._train_with_estimator_spec(estimator_spec, worker_hooks,
   1134                                              hooks, global_step_tensor,
-> 1135                                              saving_listeners)
   1136
   1137   def _train_model_distributed(self, input_fn, hooks, saving_listeners):

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _train_with_estimator_spec(self, estimator_spec, worker_hooks, hooks, global_step_tensor, saving_listeners)
   1334       loss = None
-> 1336         _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
   1337     return loss
   1338

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in __exit__(self, exception_type, exception_value, traceback)
    687     if exception_type in [errors.OutOfRangeError, StopIteration]:
    688       exception_type = None
--> 689     self._close_internal(exception_type)
    690     # __exit__ should return True to suppress an exception.
    691     return exception_type is None

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in _close_internal(self, exception_type)
    724       if self._sess is None:
    725         raise RuntimeError('Session is already closed.')
--> 726       self._sess.close()
    727     finally:
    728       self._sess = None

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in close(self)
    973       try:
--> 974         self._sess.close()
    975       except _PREEMPTION_ERRORS:
    976         pass

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py in close(self)
   1116       self._coord.join(
   1117           stop_grace_period_secs=self._stop_grace_period_secs,
-> 1118           ignore_live_threads=True)
   1119     finally:
   1120       try:

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
--> 389         six.reraise(*self._exc_info_to_raise)
    390       elif stragglers:

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/estimator/inputs/queues/feeding_queue_runner.py in _run(self, sess, enqueue_op, feed_fn, coord)
     92         try:
     93           feed_dict = None if feed_fn is None else feed_fn()
---> 94           sess.run(enqueue_op, feed_dict=feed_dict)
     95         except (errors.OutOfRangeError, errors.CancelledError):
     96           # This exception indicates that a queue was closed.

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    898     try:
--> 900                          run_metadata_ptr)
    901       if run_metadata:
    902         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1133     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1134       results = self._do_run(handle, final_targets, final_fetches,
-> 1135                              feed_dict_tensor, options, run_metadata)
   1136     else:
   1137       results = []

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1314     if handle is None:
   1315       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1316                            run_metadata)
   1317     else:
   1318       return self._do_call(_prun_fn, handle, feeds, fetches)

/home/yinan.li1/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1333         except KeyError:
   1334           pass
-> 1335       raise type(e)(node_def, op, message)
   1336
   1337   def _extend_graph(self):

InternalError: Unable to get element as bytes.

1 Answer:

Answer 0 (score: 0)

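I think one of the problems you are running into is that the column type in pandas is actually object rather than string. If you convert it into separate string columns, you will get rid of this error. Keep in mind that "The basic TensorFlow tf.string dtype allows you to build tensors of byte strings", so you will get an error whenever this column stores objects (here, Python lists) rather than strings.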
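A quick way to check this is to look at what the cells of the column actually hold. The snippet below is only a minimal sketch on the toy data from the question; the variable name cell_types is an illustrative assumption, not part of the original code.

```python
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame({"x": [["a", "c"], ["a", "b"], ["b", "c"]], "y": ["x", "y", "z"]})

# The list-valued column is stored with the generic "object" dtype.
print(df["x"].dtype)

# Inspect the Python type of the first cell in each column:
# "x" holds list objects, "y" holds plain strings.
cell_types = {col: type(df[col].iloc[0]).__name__ for col in df.columns}
print(cell_types)
```

A column whose cells are Python lists cannot be converted into a tensor of byte strings, which is consistent with the Unable to get element as bytes message.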

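The code below gets past the error above, but it does not fully solve your problem. The variable length of the lists still has to be handled, for example by padding them to a common length, and indicator_column may have trouble with the resulting missing values.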

# Split the list column into one string column per position
# (this works here because every list has exactly two elements).
X2 = pd.DataFrame(X["x"].values.tolist(), columns=["x1", "x2"])

# One categorical column per position, all sharing the same vocabulary.
feat1 = tf.feature_column.categorical_column_with_vocabulary_list(
    key="x1", vocabulary_list=["a", "b", "c"]
)
feat2 = tf.feature_column.categorical_column_with_vocabulary_list(
    key="x2", vocabulary_list=["a", "b", "c"]
)
indicator_features = [
    tf.feature_column.indicator_column(categorical_column=feat1),
    tf.feature_column.indicator_column(categorical_column=feat2),
]

# Feed the split string columns instead of the original list column.
training_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=X2, y=Y, batch_size=64, shuffle=True, num_epochs=None
)
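For lists of different lengths, the padding mentioned above could look roughly like the following. This is only a sketch in plain pandas: the helper name pad_list_column, the empty-string padding value, and the x1, x2, ... column names are illustrative assumptions, not something from the question.

```python
import pandas as pd


def pad_list_column(series, pad_value="", prefix="x"):
    """Expand a column of variable-length lists into fixed-width string columns."""
    # The longest list determines how many columns are needed.
    width = int(series.map(len).max())
    # Right-pad every list with pad_value up to that width.
    rows = [list(items) + [pad_value] * (width - len(items)) for items in series]
    columns = ["{}{}".format(prefix, i + 1) for i in range(width)]
    return pd.DataFrame(rows, columns=columns, index=series.index)


# Hypothetical ragged data: the lists have 2, 1 and 3 elements.
ragged = pd.Series([["a", "c"], ["b"], ["a", "b", "c"]])
padded = pad_list_column(ragged)
print(padded)
```

Each padded column can then get its own categorical column, as in the code above. As far as I know, the feature column documentation says that empty strings in a dense string tensor are treated as missing values and dropped, but this is worth verifying on your TensorFlow version. Also note that one indicator column per position yields several separate one-hot blocks rather than a single merged multi-hot vector.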