Question

How can I use tf.TextLineReader() and tf.decode_csv to decode a CSV file with long rows (for example, each row has so many items that listing them one by one for output is impractical)?

The typical usage is:

a, b, c, d, e = tf.decode_csv(value, record_defaults=record_defaults)

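When a row contains thousands of items, it is not possible to assign them one by one like (a, b, c, d, e) above. Can all the items be decoded into a list or something similar?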

Answer 1

Suppose you have 1800 columns of data. You can use this as the record defaults:

record_defaults=[[1]]*1800

and then use

all_columns = tf.decode_csv(value, record_defaults=record_defaults)

to read them.
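One detail about the `[[1]]*1800` idiom: it repeats a reference to the same inner list rather than building 1800 separate lists. That is harmless when the defaults are only read, as they are here, but a list comprehension avoids surprises if the defaults are edited later. A minimal sketch in plain Python (the helper name `make_record_defaults` is hypothetical, not part of TensorFlow):

```python
def make_record_defaults(num_columns, default=1):
    """Return one independent rank-1 default per column: [[1], [1], ...]."""
    return [[default] for _ in range(num_columns)]

shared = [[1]] * 1800                      # 1800 references to one inner list
independent = make_record_defaults(1800)   # 1800 separate inner lists

print(len(shared), len(independent))       # both have 1800 entries
print(shared[0] is shared[1])              # True: the same object repeated
print(independent[0] is independent[1])    # False: distinct objects
```

Either list can be passed as `record_defaults`; the two compare equal element by element.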

Answer 2

Well, tf.decode_csv returns a list, so you can simply do the following:

record_defaults = [[1], [1], [1], [1], [1]]
all_columns = tf.decode_csv(value, record_defaults=record_defaults)
all_columns
Out: [<tf.Tensor 'DecodeCSV:0' shape=() dtype=int32>,
 <tf.Tensor 'DecodeCSV:1' shape=() dtype=int32>,
 <tf.Tensor 'DecodeCSV:2' shape=() dtype=int32>,
 <tf.Tensor 'DecodeCSV:3' shape=() dtype=int32>,
 <tf.Tensor 'DecodeCSV:4' shape=() dtype=int32>
]

Then you can evaluate it as usual:

sess = tf.Session() 
sess.run(all_columns)
Out: [1, 1, 1, 1, 1]

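Note that you need to pass rank-1 record_defaults (each default wrapped in its own list, such as [1]); otherwise you may run into problems.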
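Since the result is a plain Python list of scalar tensors, it can also be packed into one tensor per row with `tf.stack`. The sketch below is not from the answer itself: it assumes TensorFlow 2.x with eager execution, where the same op is exposed as `tf.io.decode_csv`, and it uses a made-up 8-column line in place of a real file.

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution enabled

num_columns = 8  # stand-in for a much wider row, e.g. 1800
line = ",".join(str(i) for i in range(num_columns))  # "0,1,2,...,7"

# One rank-1 integer default per column.
record_defaults = [[0] for _ in range(num_columns)]

# decode_csv returns a list with one scalar tensor per column.
columns = tf.io.decode_csv(line, record_defaults=record_defaults)

# Pack the per-column scalars into a single tensor of shape (num_columns,).
row = tf.stack(columns)
print(len(columns))
print(row.numpy().tolist())
```

Stacking only works when all columns share a dtype; with mixed dtypes the columns would need to be grouped or cast first.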

Answer 3

This is how I mix different dtypes in record_defaults:

record_defaults = [tf.constant(.1, dtype=tf.float32) for count in range(100)] # 100 fp32 features
record_defaults.extend([tf.constant(1, dtype=tf.int32) for count in range(2)]) # 2 int32 features
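The same mix can probably be written with plain Python literals, since tf.decode_csv takes each column's dtype from its default, and a Python float default is converted to float32 while an int default is converted to int32. A sketch in plain Python, with the column counts taken from the code above and the helper name `mixed_record_defaults` hypothetical:

```python
def mixed_record_defaults(num_float, num_int):
    """Float defaults first, then int defaults, each as a rank-1 list."""
    floats = [[0.1] for _ in range(num_float)]
    ints = [[1] for _ in range(num_int)]
    return floats + ints

record_defaults = mixed_record_defaults(100, 2)
print(len(record_defaults))                      # 102 columns in total
print(record_defaults[0], record_defaults[-1])   # first float default, last int default
```

The order of the defaults has to match the order of the columns in the file.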
