My .csv file contains integer values and may contain NA values to indicate missing data.
Example file:
-9882,-9585,-9179
-9883,-9587,NA
-9882,-9585,-9179
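I tried reading it with:
import tensorflow as tf
# filename_queue is assumed to be created as in the answer below, e.g. tf.train.string_input_producer([filename])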
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read_up_to(filename_queue, 1)
record_defaults = [[0], [0], [0]]
data, ABL_E, ABL_N = tf.decode_csv(value, record_defaults=record_defaults)
Later, on the second iteration, sess.run(_) throws the following error:
InvalidArgumentError (see above for traceback): Field 5 in record 32400 is not a valid int32: NA
Is there a way to make TensorFlow interpret the string "NA" as NaN (or a similar value) when reading a CSV?
Answer:
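I recently ran into the same problem. I solved it by reading the CSV as strings, replacing every occurrence of "NA" with some valid value, and then converting the result to float: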
# Set up reading from CSV files
filename_queue = tf.train.string_input_producer([filename])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
NUM_COLUMNS = XX # Specify number of expected columns
# Read values as string, set "NA" for missing values.
record_defaults = [["NA"]] * NUM_COLUMNS
# The question's file is comma-separated; use field_delim="\t" for tab-separated files.
decoded = tf.decode_csv(value, record_defaults=record_defaults, field_delim=",")
# Replace every occurrence of "NA" with "-1"
no_nan = tf.where(tf.equal(decoded, "NA"), ["-1"] * NUM_COLUMNS, decoded)
# Convert to float; string_to_number already returns a single 1-D tensor, so tf.stack is not needed.
float_row = tf.string_to_number(no_nan, tf.float32)
In the long run, though, I plan to switch to TFRecords, because reading CSV is too slow for my needs.
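The three steps in the answer (read every field as a string, substitute the "NA" marker, convert to float) can be sketched in plain Python with the standard csv module, which makes the logic easy to check without a TensorFlow 1.x setup. This is only an illustration of the technique, not TensorFlow code; `parse_rows` and `SAMPLE` are hypothetical names. The default fill value "nan" reflects the question's wish for NaN, while `fill="-1"` reproduces the answer's sentinel. The output is float because integer types have no NaN.

```python
import csv
import io
import math

# Hypothetical sample mirroring the example file in the question.
SAMPLE = """-9882,-9585,-9179
-9883,-9587,NA
-9882,-9585,-9179
"""


def parse_rows(text, missing="NA", fill="nan", delimiter=","):
    """Parse CSV text, mapping the `missing` marker to `fill`, and return float rows."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Step 1: csv.reader yields every field as a string.
        # Step 2: swap the missing-value marker for a string float() can parse.
        cleaned = [fill if field == missing else field for field in record]
        # Step 3: convert the cleaned strings to floats.
        rows.append([float(field) for field in cleaned])
    return rows


rows = parse_rows(SAMPLE)
print(rows[1])                 # [-9883.0, -9587.0, nan]
print(math.isnan(rows[1][2]))  # True
```

Substituting a string before conversion mirrors the `tf.where` plus `tf.string_to_number` pair in the answer; the choice between NaN and a sentinel such as -1 depends on whether downstream code can handle NaN.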