tf.decode_csv() error: "Unquoted fields cannot have quotes/CRLFs inside"

Time: 2016-02-20 22:02:51

Tags: python tensorflow

I have a CSV file, blah.txt, that looks like this:

1,2
3,4

I can read the CSV as follows:

import tensorflow as tf
sess = tf.InteractiveSession()
csv_train= tf.read_file('blah.txt')
csv_train.eval()

which outputs:

Out[5]: '1,2\n3,4'

I tried to decode the CSV as follows:

col1,col2 = tf.decode_csv(csv_train,
                          record_defaults=[tf.constant([],dtype=tf.int32),
                                           tf.constant([],dtype=tf.int32)])

Now, when I run col1.eval(), I get this error:

W tensorflow/core/common_runtime/executor.cc:1102] 0x7ff203f17240 Compute status: Invalid argument: Unquoted fields cannot have quotes/CRLFs inside
     [[Node: DecodeCSV_6 = DecodeCSV[OUT_TYPE=[DT_INT32, DT_INT32], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile, Const_12, Const_13)]]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-17-dc904e64a78b>", line 1, in <module>
    col1.eval()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 465, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3097, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 315, in run
    return self._run(None, fetches, feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 511, in _run
    feed_dict_string)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _do_run
    target_list)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 586, in _do_call
    e.code)
InvalidArgumentError: Unquoted fields cannot have quotes/CRLFs inside
     [[Node: DecodeCSV_6 = DecodeCSV[OUT_TYPE=[DT_INT32, DT_INT32], field_delim=",", _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile, Const_12, Const_13)]]
Caused by op u'DecodeCSV_6', defined at:
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 488, in <module>
    pydevconsole.StartServer(pydev_localhost.get_localhost(), int(port), int(client_port))
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 334, in StartServer
    process_exec_queue(interpreter)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevconsole.py", line 209, in process_exec_queue
    more = interpreter.addExec(code_fragment)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_console_utils.py", line 201, in addExec
    more = self.doAddExec(code_fragment)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_ipython_console.py", line 42, in doAddExec
    res = bool(self.interpreter.addExec(codeFragment.text))
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydev_ipython_console_011.py", line 435, in addExec
    self.ipython.run_cell(line, store_history=True)
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2902, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3006, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 3066, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-0556330e1530>", line 3, in <module>
    tf.constant([],dtype=tf.int32)])
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_parsing_ops.py", line 38, in decode_csv
    field_delim=field_delim, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2040, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1087, in __init__
    self._traceback = _extract_stack()

How can I decode this CSV?

1 answer:

Answer 0 (score: 3)

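The tf.read_file() op reads the entire contents of the given file into a single string, whereas the tf.decode_csv() op expects each element of its input to be a single record (i.e. one line). You therefore need something that reads the file one line at a time, which tf.TextLineReader supports.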
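To make the distinction concrete, here is a minimal plain-Python sketch of the same idea. It is not TensorFlow's implementation: decode_record is a hypothetical helper that, like the error in the question, refuses a "record" that still contains line breaks, and it assumes every field is an integer.

```python
import csv

def decode_record(record):
    """Parse one CSV record (a single line) into a list of ints.

    Hypothetical helper for illustration only: it rejects input that
    still contains a line break, because that input is more than one record.
    """
    if "\n" in record or "\r" in record:
        raise ValueError("record contains a line break; split the file into lines first")
    return [int(field) for field in next(csv.reader([record]))]

# The whole-file string that csv_train.eval() returned in the question.
contents = "1,2\n3,4"

# Treating the whole file as a single record fails...
try:
    decode_record(contents)
    whole_file_error = None
except ValueError as err:
    whole_file_error = str(err)

# ...while decoding one line at a time works.
rows = [decode_record(line) for line in contents.splitlines()]
print(rows)  # [[1, 2], [3, 4]]
```

The point is only that the decoder sees one line per call; in TensorFlow that job is done by the reader rather than by splitting a string by hand.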

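Using a reader is slightly more complicated than using a simple op, because it is designed for reading large multi-file datasets and gives you a lot of flexibility in how the files are chosen. You can see the tutorial for a complete explanation, but the following example code should help you get started: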

# Read the file once.
filenames = tf.train.string_input_producer(["blah.txt"], num_epochs=1)
reader = tf.TextLineReader()
_, line = reader.read(filenames)
col1, col2 = tf.decode_csv(line,
                           record_defaults=[tf.constant([],dtype=tf.int32),
                                            tf.constant([],dtype=tf.int32)])

Now col1 and col2 represent single values. If you evaluate them, you will get the contents of the next line:

# N.B. These must be called before evaluating the inputs.
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess)

print sess.run([col1, col2])  # ==> 1, 2
print sess.run([col1, col2])  # ==> 3, 4

If you want to batch the columns, you can use tf.train.batch().
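As a rough picture of what batching the columns means, the following plain-Python sketch reads blah.txt one line at a time and groups consecutive records into per-column lists. It does not use TensorFlow and makes no claim about how tf.train.batch() works internally; read_records and batch_columns are hypothetical names, and keeping a short final batch is a choice made for this sketch only.

```python
import os
import tempfile

def read_records(path):
    """Yield one parsed record per line as a (col1, col2) pair of ints."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                a, b = line.split(",")
                yield int(a), int(b)

def batch_columns(records, batch_size):
    """Group consecutive records into per-column lists of up to batch_size items.

    A final short batch is kept (an assumption of this sketch).
    """
    col1_batch, col2_batch = [], []
    for col1, col2 in records:
        col1_batch.append(col1)
        col2_batch.append(col2)
        if len(col1_batch) == batch_size:
            yield col1_batch, col2_batch
            col1_batch, col2_batch = [], []
    if col1_batch:
        yield col1_batch, col2_batch

# Recreate blah.txt from the question in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "blah.txt")
with open(path, "w") as f:
    f.write("1,2\n3,4")

batches = list(batch_columns(read_records(path), batch_size=2))
print(batches)  # [([1, 3], [2, 4])]
```

With a batch size of 2, the two rows of the file become one batch per column: the first column holds 1 and 3, the second holds 2 and 4.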