TensorFlow cannot convert a string to a number

Time: 2019-02-18 11:50:12

Tags: python tensorflow

I am using tensorflow-1.12, and when I load data from a CSV through tf.data.Dataset, I cannot convert the cell values from strings to numbers. My CSV looks like:

"string_col1","col1","col2", ...
"some value","23.502482","53.5", ...

I only want to use the numeric columns (col1, col2, etc.) as input, so I have a function that drops the first column:

import tensorflow as tf

def slice_and_transform_to_float(line):
    line_splitted = tf.string_split([line], ",")
    str_data = tf.convert_to_tensor(line_splitted.values, dtype=tf.string)
    str_data = tf.slice(str_data, [1], [col_size])
    return tf.string_to_number(str_data, out_type=tf.float32) # here is a problem


map_func = lambda line: slice_and_transform_to_float(line)
dataset = tf.data.Dataset.from_tensor_slices(train_files)
dataset = dataset.map(map_func, num_parallel_calls=4)

sess = tf.Session()
iterator = dataset.make_initializable_iterator()
sess.run([tf.global_variables_initializer(), iterator.initializer])
next_iter = iterator.get_next()
next_rows = sess.run(next_iter) # here we have exception


When I try to run it, I get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: StringToNumberOp could not correctly convert string: "23.502482"
     [[{{node StringToNumber}} = StringToNumber[out_type=DT_FLOAT](Slice)]]
     [[node IteratorGetNext (defined at script.py:100)  = IteratorGetNext[output_shapes=[[?,8]], output_types=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

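It looks like I have a string value that is a number, but TensorFlow has trouble converting it to a float. I tried integer values and tf.float64, but nothing changed. Do you know what might be wrong?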

I searched for similar questions, but only found cases where someone wanted to convert a literal "string" to a number.

2 answers:

Answer 0 (score: 2)

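The problem is that you are passing numeric strings with their surrounding quotes still attached, and those cannot be parsed as numbers. You can, for example, strip the quotes: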

import tensorflow as tf

def slice_and_transform_to_float(line):
    line_splitted = tf.strings.split([line], ",")
    str_data = tf.convert_to_tensor(line_splitted.values, dtype=tf.string)
    str_data = tf.slice(str_data, [1], [2])  # Fixed that to 2 for the example
    str_len = tf.strings.length(str_data)
    str_unquoted = tf.strings.substr(str_data, tf.ones_like(str_len), str_len - 2)
    return tf.strings.to_number(str_unquoted, out_type=tf.float32)

with tf.Graph().as_default(), tf.Session() as sess:
    print(sess.run(slice_and_transform_to_float('"some value","23.502482","53.5"')))
    # [23.502481 53.5     ]
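
The same failure can be reproduced without TensorFlow. The following is a minimal plain-Python sketch, not the TensorFlow code path: the helper name parse_quoted_float is hypothetical, and it assumes every field is wrapped in exactly one pair of double quotes, as in the CSV from the question. It mirrors the substr call above by dropping the first and last character before parsing.

```python
def parse_quoted_float(field):
    """Drop the first and last character (assumed to be quotes), then parse."""
    return float(field[1:-1])


raw = '"23.502482"'

# Parsing the field with its quotes still attached fails,
# much like StringToNumber does in the question.
try:
    float(raw)
    parsed_with_quotes = True
except ValueError:
    parsed_with_quotes = False

print(parsed_with_quotes)       # False
print(parse_quoted_float(raw))  # 23.502482
```

Slicing by position only works when the quotes are really there; on an unquoted field such as 23.582 it would cut off the first and last digit.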

Answer 1 (score: 0)

Because lines in a CSV file can sometimes mix quoted and unquoted values, for example:

"col1", "col2", "col3", ...
23.582, "53.5",  12   , ...

I changed your solution as follows:

import tensorflow as tf

def slice_and_transform_to_float(line):
    line_splitted = tf.string_split([line], ",")
    str_data = tf.convert_to_tensor(line_splitted.values, dtype=tf.string)
    str_data = tf.slice(str_data, [0], [2]) # Fixed that to 2 for the example
    # regex_replace is applied element-wise to the whole tensor, so map_fn is not needed
    str_data = tf.regex_replace(str_data, '"', "")
    return tf.string_to_number(str_data, out_type=tf.float32)

Now it no longer matters whether a value contains quotes.
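
As a rough plain-Python analogue of the regex replacement above, the sketch below removes every double quote from a field before parsing, so quoted and unquoted values are handled the same way. The function name fields_to_floats is hypothetical, and the naive split on commas assumes no field contains a comma inside its quotes.

```python
import re


def fields_to_floats(line, delimiter=","):
    """Split a CSV line naively, remove double quotes, and parse each field."""
    fields = line.split(delimiter)
    # Remove quote characters wherever they appear, then trim whitespace.
    return [float(re.sub('"', "", field).strip()) for field in fields]


print(fields_to_floats('23.582, "53.5",  12   '))  # [23.582, 53.5, 12.0]
```

Unlike slicing off the first and last character, this does not depend on every field being quoted.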