Question

I have been trying to use tf.data.Dataset to batch and split time series data in a stateful way, but I cannot get it to work. Here is a link to the documentation: https://www.tensorflow.org/api_docs/python/tf/data/Dataset

This is what I am trying to achieve:

import tensorflow as tf
import numpy as np

Suppose we have data with shape = (10, 100, 1): 10 data samples, each with 100 time steps and 1 feature.

x = np.reshape(np.arange(1000), (10, 100, 1))
print(x.shape)
print(x[0, :, 0])
print(x[1, :, 0])

Out:

(10, 100, 1)
[000 001 ... 098 099]
[100 101 ... 198 199]

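Here each of the ranges 0-99, 100-199, ..., 900-999 is one time series, so values within a range belong together, but 150 and 250, for example, are unrelated.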

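Now we want to train an RNN / GRU / LSTM with batches of shape (2, 25, 1). This means we have to split the sample dimension into 5 parts and the time dimension into 4 parts.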
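The split counts follow directly from the shapes. A minimal sketch of that bookkeeping, where the variable names are only illustrative and not part of any library:

```python
# Shape bookkeeping for the example above (names are illustrative).
n_samples, n_steps, n_features = 10, 100, 1
batch_size, window = 2, 25

# Both divisions must be exact for np.split to accept them.
assert n_samples % batch_size == 0 and n_steps % window == 0

n_groups = n_samples // batch_size   # 5 groups of 2 series each
n_windows = n_steps // window        # 4 windows of 25 time steps each
n_batches = n_groups * n_windows     # 20 batches of shape (2, 25, 1)
print(n_groups, n_windows, n_batches)
```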

x = np.split(x, 5, axis=0)
print(len(x), x[0].shape)

Out:

5 (2, 100, 1)

and...

x = [np.split(item, 4, axis=1) for item in x]
print(len(x),len(x[0]), x[0][0].shape)

Out:

5 4 (2, 25, 1)

Now the first batch should be:

    [  0   1 ...  23  24]
    [100 101 ... 123 124]

The second batch should be:

    [ 25  26 ...  48  49]
    [125 126 ... 148 149]

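because this ensures that training is stateful: each row of a batch continues the same series as the corresponding row of the previous batch.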

The following...

print(np.squeeze(x[0][0]))
print(np.squeeze(x[0][1]))

Out:

[[  0   1 ...  23  24]
 [100 101 ... 123 124]]
[[ 25  26 ...  48  49]
 [125 126 ... 148 149]]

...can be achieved with:

x = np.concatenate([np.concatenate(item, axis=0) for item in x], axis=0)
dataset = tf.data.Dataset.from_tensor_slices(x)
final_dataset = dataset.batch(2)

so that

for x in final_dataset:
    print(np.squeeze(x.numpy()))
    print('###')

Out:

[[  0   1 ...  23  24]
 [100 101 ... 123 124]]
###
[[ 25  26 ...  48  49]
 [125 126 ... 148 149]]
###
[[ 50  51 ...  73  74]
 [150 151 ... 173 174]]
###
[[ 75  76 ...  98  99]
 [175 176 ... 198 199]]
###
[[200 201 ... 223 224]
 [300 301 ... 323 324]]
###
...

The first "state" ends after 199.
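For reference, the same row ordering can also be written as a single reshape and transpose in NumPy. This is only a sketch under the assumption that both dimensions divide evenly; `stateful_order` is a hypothetical helper name, not a NumPy or TensorFlow function.

```python
import numpy as np

def stateful_order(x, batch_size, window):
    """Reorder (samples, steps, features) into consecutive stateful windows."""
    n_samples, n_steps, n_features = x.shape
    n_groups = n_samples // batch_size
    n_windows = n_steps // window
    # Axes: (group, sample-in-batch, window, step-in-window, feature)
    x = x.reshape(n_groups, batch_size, n_windows, window, n_features)
    # Move the window axis before the sample axis, so that each pair of
    # consecutive rows holds the same window of two different series.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window, n_features)

x = np.reshape(np.arange(1000), (10, 100, 1))
rows = stateful_order(x, batch_size=2, window=25)
print(rows.shape)
print(rows[0, :3, 0], rows[1, :3, 0], rows[2, :3, 0])
```

Batching `rows` in pairs should then give the same batches as the split-and-concatenate version above.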

My question is: is there a way to do the same thing, but starting from here:

dataset = tf.data.Dataset.from_tensor_slices(np.reshape(np.arange(1000), (10, 100, 1)))

Furthermore:

If the data has shape = (13, 100, 1), the size of axis 0 is prime, so tf / np.split will not work.

If the data has shape = (10, 101, 1), the size of axis 1 is prime, so tf / np.split will not work.
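To make the failure concrete, the sketch below combines both awkward sizes into one array of shape (13, 101, 1) and shows that np.split rejects an uneven division. The trimming step afterwards is only one possible workaround and assumes that dropping the leftover samples and time steps is acceptable, which the question does not state.

```python
import numpy as np

x = np.zeros((13, 101, 1))
batch_size, window = 2, 25

# np.split insists on an exact division and raises ValueError otherwise.
try:
    np.split(x, 5, axis=0)
except ValueError as err:
    print("np.split failed:", err)

# Assumption: dropping the remainder is acceptable. Trim each axis down
# to the largest multiple of the batch size / window length.
n_samples = (x.shape[0] // batch_size) * batch_size   # 12
n_steps = (x.shape[1] // window) * window             # 100
trimmed = x[:n_samples, :n_steps]
print(trimmed.shape)
```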

Splitting time series data into a stateful dataset with the TensorFlow data API

0 answers: