Question

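I'm trying to parse a CSV file that contains date strings (in the format "2018-03-30 09:30:05").

These should be converted into one-hot encoded features for day/hour/minute/second.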
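For concreteness, here is a minimal pure-Python sketch of that target encoding. It runs outside TensorFlow, so it corresponds to the preprocessing route rather than the in-graph one. It assumes that "day" means day of the week (7 classes), and the names `FIELD_SIZES`, `one_hot` and `date_one_hot` are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Assumed vocabulary size per field; "weekday" stands in for "day" (an assumption).
FIELD_SIZES = {"weekday": 7, "hour": 24, "minute": 60, "second": 60}


def one_hot(index, size):
    """Return a list of `size` floats with 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def date_one_hot(date_str):
    """Encode 'YYYY-MM-DD HH:MM:SS' as concatenated one-hot vectors."""
    dt = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
    values = {
        "weekday": dt.weekday(),  # Monday == 0 ... Sunday == 6
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
    }
    features = []
    for name, size in FIELD_SIZES.items():  # dicts keep insertion order (Python 3.7+)
        features.extend(one_hot(values[name], size))
    return features


feats = date_one_hot("2018-03-30 09:30:05")
print(len(feats))  # 151 = 7 + 24 + 60 + 60
```

The result is a flat list of plain floats, so it can be passed to `float_list.value.extend(...)` when building a `tf.train.Example`.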

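An obvious approach would be to use pandas and store the result in a separate file or an HDF store.

But to simplify the workflow (and take advantage of the GPU), I'd like to do this directly in TensorFlow.

Assuming the date string is at position -2, I thought something like tf.int32(tf.substr(row[-2],0,4)) should give me the year, but it raises TypeError: 'DType' object is not callable.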

import tensorflow as tf  # TensorFlow 1.x API (tf.python_io, tf.substr)

with tf.python_io.TFRecordWriter("train_sample_sorted.tfrecords") as tf_writer:
    i = 0
    for row in myArray:  # each row: numeric features..., date string, label
        i += 1
        if i % 10000 == 0:
            print(row[-2])
        #timefeatures = int(row[-2][0:4])  ## TypeError: Value must be iterable
        #timefeatures = tf.int32(tf.substr(row[-2], 0, 4))  ## TypeError: 'DType' object is not callable
        features, label = row[:-2], row[-1]
        example = tf.train.Example()
        example.features.feature["features"].float_list.value.extend(features)
        example.features.feature["timefeatures"].float_list.value.extend(timefeatures)
        example.features.feature["label"].int64_list.value.append(label)
        tf_writer.write(example.SerializeToString())

What is the best practice for handling date strings as input features? Is there a way to preprocess them?

Thanks

Answer 1

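The first version, int( row[ -2 ][ 0 : 4 ] ), fails for two reasons: indexing cannot be used to take a substring of a string tensor, and even if that worked, it would still fail because you cannot convert the result to an int that way.

The second version, tf.int32( tf.substr( row[ -2 ], 0, 4 ) ), is almost there. It extracts the substring just fine, but to convert a string into a number you have to use tf.string_to_number; you cannot simply cast a string tensor to a number. (tf.int32 is a DType object, not a conversion function, which is why calling it raises TypeError: 'DType' object is not callable.)

Without access to the data you're using I can't test this, but this should work: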

tf.string_to_number( tf.substr( row[ -2 ], 0, 4 ), out_type = tf.int32 )
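Because the format "YYYY-MM-DD HH:MM:SS" is fixed-width, every field sits at a fixed (pos, len) offset, which is exactly what tf.substr takes. The following pure-Python sketch mirrors substr plus string_to_number with slicing and int(), mainly to document those offsets; `FIELD_OFFSETS` and `parse_fields` are hypothetical names, and this is not TensorFlow code.

```python
# (pos, len) of each field in "YYYY-MM-DD HH:MM:SS", in the same order
# as the pos/len arguments of tf.substr.
FIELD_OFFSETS = {
    "year": (0, 4),
    "month": (5, 2),
    "day": (8, 2),
    "hour": (11, 2),
    "minute": (14, 2),
    "second": (17, 2),
}


def parse_fields(date_str):
    """Pure-Python mirror of substr(s, pos, len) followed by a string-to-int conversion."""
    return {name: int(date_str[pos:pos + length])
            for name, (pos, length) in FIELD_OFFSETS.items()}


print(parse_fields("2018-03-30 09:30:05"))
# {'year': 2018, 'month': 3, 'day': 30, 'hour': 9, 'minute': 30, 'second': 5}
```

The same pairs plug into the call above, e.g. tf.string_to_number( tf.substr( row[ -2 ], 11, 2 ), out_type = tf.int32 ) for the hour. In TensorFlow 2.x these ops live under tf.strings.substr and tf.strings.to_number.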

Deriving features from date strings in TensorFlow
