Question

I am using TensorFlow 0.10.0rc0 on Ubuntu 14.04, with CUDA driver version 7.5 and cuDNN 4.

I have a simple CSV file containing a single record like this:

"field with
newline",0

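The line break was added by pressing Enter in Vim on Ubuntu. I can read this file with the pandas read_csv function, and the text field comes back containing a single \n character.

But when I try to read it in TensorFlow, I get the following error: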

tensorflow.python.framework.errors.InvalidArgumentError: Quoted field has to end with quote followed by delim or end

My TensorFlow code for reading the CSV uses this function to read a single row:

import tensorflow as tf

def read_single_example(filename_queue, skip_header_lines, record_defaults, feature_index, label_index):
    # TextLineReader emits one physical line of the file per read.
    reader = tf.TextLineReader(skip_header_lines=skip_header_lines)
    key, value = reader.read(filename_queue)
    # Parse that single line into one tensor per CSV column.
    record = tf.decode_csv(
        value,
        record_defaults=record_defaults)
    features, label = record[feature_index], record[label_index]
    return features, label

If I read the file with pandas and replace all newlines with spaces, the TensorFlow code parses the CSV successfully.

But it would be very useful if newlines could be handled within the TensorFlow CSV pipeline itself.
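For reference, the preprocessing workaround described above can be sketched roughly as follows. This is only an illustration: it uses Python's standard csv module as a stand-in for the pandas step, and the helper name flatten_newlines is hypothetical, not part of pandas or TensorFlow.

```python
import csv
import io

def flatten_newlines(src, dst):
    """Copy CSV rows from src to dst, replacing line breaks inside fields with spaces.

    Hypothetical helper for illustration; src and dst are text file objects.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        cleaned_row = [
            field.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
            for field in row
        ]
        writer.writerow(cleaned_row)

# The sample record from the question: one record spread over two physical lines.
raw = '"field with\nnewline",0\n'

out = io.StringIO()
flatten_newlines(io.StringIO(raw, newline=""), out)
cleaned = out.getvalue()  # now one physical line per record
```

After this step every record occupies exactly one physical line, which is what a line-based reader expects. With real files, open both with newline="" so the csv module sees the line endings unchanged.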

Answer 1

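The problem here is that TextLineReader splits the file on newlines before the CSV decoder gets to parse it. With tf.data, you can use tf.contrib.data.CsvDataset (tf.data.experimental.CsvDataset in later releases), which parses this file correctly according to RFC 4180.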
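The difference between "split into lines first, then parse" and "parse the whole stream" can be shown without TensorFlow. The sketch below uses Python's standard csv module purely as an analogy; it is not TensorFlow's implementation, and the function names parse_line_by_line and parse_whole_stream are made up for this example.

```python
import csv
import io

# The sample record from the question.
raw = '"field with\nnewline",0\n'

def parse_line_by_line(text):
    """Split on newlines first, then parse each line on its own.

    This mirrors the order of operations of a line reader feeding a CSV decoder.
    """
    results = []
    for line in text.splitlines():
        try:
            results.append(next(csv.reader([line], strict=True)))
        except csv.Error as exc:
            # The first physical line ends inside an open quoted field.
            results.append("error: %s" % exc)
    return results

def parse_whole_stream(text):
    """Let the CSV parser decide where records end, as RFC 4180 describes."""
    return list(csv.reader(io.StringIO(text, newline=""), strict=True))

line_first = parse_line_by_line(raw)  # first entry is an error, second is a broken row
stream = parse_whole_stream(raw)      # one record: ['field with\nnewline', '0']
```

The line-first path never sees the closing quote on the same line as the opening one, which is the same situation the error message in the question describes.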

Answer 2

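In my experience, TensorFlow's CSV reader is very strict about RFC 4180.

Making sure your file uses CRLF at the end of each line, as well as inside quoted fields, should allow it to be processed.

Note: this is what I have been using so far. I have not tried the 0.10 release candidates.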
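If you want to try this suggestion, one possible way to normalize a file to CRLF is sketched below, using Python's standard csv module. The helper name to_crlf is hypothetical, and whether TensorFlow 0.10 then accepts the converted file is not verified here.

```python
import csv
import io

def to_crlf(src, dst):
    """Rewrite CSV rows so record endings and in-field line breaks are both CRLF.

    Hypothetical helper for illustration; src and dst are text file objects.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\r\n")
    for row in reader:
        # Collapse any existing CRLF first so it is not doubled to \r\r\n.
        writer.writerow(
            [field.replace("\r\n", "\n").replace("\n", "\r\n") for field in row]
        )

raw = '"field with\nnewline",0\n'

out = io.StringIO(newline="")  # newline="" keeps \r\n exactly as written
to_crlf(io.StringIO(raw, newline=""), out)
converted = out.getvalue()
```

With real files, open both with newline="" so that Python does not translate the line endings on the way in or out.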

TensorFlow CSV decoding error
