Read text and convert it to an integer array

Time: 2018-08-10 23:16:22

Tags: tensorflow

I have a CSV file in which every line looks like this:

"0,0,0,0,0,1,0,0,0,20,0,17,0,0"

I tried to read the data with this function:

    import tensorflow as tf

    def decode_csv(line):
        line_split = tf.string_split([line], ',')
        features = tf.string_to_number(line_split.values[:-1], tf.int32)
        label = tf.string_to_number(line_split.values[-1], tf.int32)
        return features, label

    dataset = tf.data.TextLineDataset("Documents/t1.csv").skip(1).map(decode_csv)
    dataset = dataset.shuffle(buffer_size=2).repeat(-1).batch(2)
    dataset_init = dataset.make_initializable_iterator()
    x, y = dataset_init.get_next()

I want each line to be converted into

    [0,0,0,0,0,1,0,0,0,20,0,17,0]

for x and

    [0]

for y.
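For reference, the intended result can be sketched in plain Python, without TensorFlow. The helper name `parse_line` is hypothetical, and the sketch assumes the surrounding double quotes are literally part of each line in the file:

```python
def parse_line(line):
    """Split one quoted CSV line into (features, label) as Python ints."""
    # Drop surrounding whitespace and double quotes, then split on commas.
    fields = line.strip().strip('"').split(",")
    values = [int(field) for field in fields]
    # Every value except the last is a feature; the last value is the label.
    return values[:-1], [values[-1]]


sample = '"0,0,0,0,0,1,0,0,0,20,0,17,0,0"'
x, y = parse_line(sample)
print(x)  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 20, 0, 17, 0]
print(y)  # [0]
```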

I get this error message:

    InvalidArgumentError (see above for traceback): StringToNumberOp could not correctly convert string: "0
    [[Node: StringToNumber = StringToNumber[out_type=DT_FLOAT](strided_slice)]]
    [[Node: IteratorGetNext_25 = IteratorGetNext[output_shapes=[[?,?], [?]], output_types=[DT_FLOAT, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Iterator_25)]]

2 answers:

Answer 0: (score: 1)

It looks like you need to remove the double quotes from the string.
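The `"0` at the end of the error message suggests that the first token still carries the opening quote. A small plain-Python sketch (an illustration only, not the TensorFlow code path; `is_int` is a hypothetical helper) shows which tokens of the sample line cannot be parsed as integers:

```python
sample = '"0,0,0,0,0,1,0,0,0,20,0,17,0,0"'
tokens = sample.split(",")


def is_int(token):
    """Return True if the token parses as an integer."""
    try:
        int(token)
        return True
    except ValueError:
        return False


# Only the first and last tokens keep a quote character attached.
bad_tokens = [token for token in tokens if not is_int(token)]
print(bad_tokens)  # ['"0', '0"']
```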

Try this:

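    import tensorflow as tf

    def decode_csv(line):
        # `line` is a string tensor, so Python's str.strip() is not available on it;
        # remove the double quotes with a TensorFlow string op instead.
        line = tf.regex_replace(line, '"', '')
        line_split = tf.string_split([line], ',')
        # Every field except the last is a feature; the last field is the label.
        features = tf.string_to_number(line_split.values[:-1], tf.int32)
        label = tf.string_to_number(line_split.values[-1], tf.int32)
        return features, label

    dataset = tf.data.TextLineDataset("Documents/t1.csv").skip(1).map(decode_csv)
    dataset = dataset.shuffle(buffer_size=2).repeat(-1).batch(2)
    dataset_init = dataset.make_initializable_iterator()
    x, y = dataset_init.get_next()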

Answer 1: (score: 0)

Using the idea from @agillgilla, I was able to get it working with this:

    import tensorflow as tf

    def decode_csv(line):
        # py_func hands the value to Python as bytes: decode it, then strip the outer quotes.
        line = tf.py_func(lambda x: x.decode("utf-8").strip('"'), [line], tf.string)
        line_split = tf.string_split([line], ',')
        # Every field except the last is a feature; the last field is the label.
        features = tf.string_to_number(line_split.values[:-1], tf.int32)
        label = tf.string_to_number(line_split.values[-1], tf.int32)
        return features, label

    dataset = tf.data.TextLineDataset("Documents/t1.csv").skip(1).map(decode_csv)
    dataset = dataset.shuffle(buffer_size=2).repeat(-1).batch(2)
    dataset_init = dataset.make_initializable_iterator()
    x, y = dataset_init.get_next()
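As the `.decode` call in the lambda implies, the function given to `tf.py_func` receives the line as Python `bytes` rather than `str`. A minimal sketch of that step in isolation, using a hypothetical helper name `strip_quotes` and assuming UTF-8 input:

```python
def strip_quotes(raw):
    """Mimic the lambda above: bytes in, text without the outer quotes out."""
    return raw.decode("utf-8").strip('"')


print(strip_quotes(b'"0,0,1,20"'))  # 0,0,1,20
```

Calling `.strip('"')` directly on the `bytes` value would raise a `TypeError` in Python 3, so the decode step (or a bytes argument such as `b'"'`) is needed.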