Tensorflow.dataset sliding window pipeline performance

Date: 2019-05-23 14:08:55

Tags: performance tensorflow time-series tensorflow-datasets sliding-window

I am implementing a tf.data input pipeline for a time-series classification problem. The dataset itself is a time-ordered set of more than 2M records with 50 features each; for any given time point t, the prediction model takes a k-sized window of features t-k, t-k+1, ..., t to make a single class prediction for t+1. Training consists of 256 full sliding passes over the dataset.
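
To illustrate only the window-to-target mapping described above, here is a minimal NumPy sketch with toy data; the names features and class_labels and the value of k are hypothetical, not part of the original setup:

import numpy as np

k = 4                                         # toy window size
features = np.random.rand(100, 50)            # toy data: 100 time steps x 50 features
class_labels = np.random.randint(0, 3, 100)   # toy data: one class label per time step

# The window for time point t is rows t-k, ..., t, and the training target
# is the class label at t + 1.
t = 10
window_x = features[t - k : t + 1]            # shape (k + 1, 50)
target_y = class_labels[t + 1]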

For a set of fewer than 500,000 records and a window size of 128, the obvious solution is to prepare a redundant pre-embedded array of size 500000x128x50 and randomly sample along its first dimension for each batch; however, this does not help at the actual data size.
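
For reference, a hedged sketch of that pre-embedding approach with toy sizes (the real case would be roughly 500000x128x50):

import numpy as np

window = 128
n_records, n_features = 1000, 50              # toy sizes instead of ~500 000 records
data = np.random.rand(n_records, n_features).astype(np.float32)

# Redundant embedding: one window per valid end position,
# shape (n_records - window + 1, window, n_features).
embedded = np.stack([data[i:i + window] for i in range(n_records - window + 1)])

# A training batch is then a random sample along the first dimension:
batch_size = 512
idx = np.random.randint(0, embedded.shape[0], size=batch_size)
batch_x = embedded[idx]                       # shape (batch_size, window, n_features)

At full scale this array is roughly 128 times larger than the raw data, which is why the approach only works for the smaller subsets.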

I have put together the streaming pipeline shown in the code below.

Test runs show that iterating a single epoch over a smaller set of 100K records takes about 50 seconds on my desktop, which is roughly 6x slower than iterating over the pre-embedded array.

I would appreciate comments and suggestions for improving this pipeline. Perhaps my whole window-reshaping graph approach is flawed and there is an efficient alternative that achieves the same result?

UPDATE: changing part of the code as shown below gives roughly a 20% speedup (I cannot really figure out why):

import tensorflow as tf
import numpy as np
import time

tf.reset_default_graph()

time_embed = 128
batch_size = 512
num_features = 50
data_size = 100000

shuffle_buffer_size = batch_size * 100

# Input placeholders (ingest numpy arrays):
input_data = tf.placeholder(tf.float32, shape=[None, num_features])
labels = tf.placeholder(tf.float32, shape=[None,])

input_tensors = {'x': input_data, 'y': labels}

# Dictionary of datasets:
ds_struct = {key: tf.data.Dataset.from_tensor_slices(tz) for key, tz in input_tensors.items()}


# Make time embedding windows:
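# Dataset.window(size, shift, stride, drop_remainder) returns a dataset of nested
# window datasets: with shift=1 and stride=1 every run of time_embed consecutive
# elements becomes its own sub-dataset, and drop_remainder=True drops incomplete
# trailing windows. flat_map(lambda x: x.batch(time_embed)) then collapses each
# sub-dataset back into a single (time_embed, ...) tensor.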
ds_windowed = {
    key: ds.window(time_embed, 1, 1, drop_remainder=True).flat_map(lambda x: x.batch(time_embed))
    for key, ds in ds_struct.items()
}

# Zip into single dataset:
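# Dataset.zip accepts a dict of datasets and yields dict elements with the same
# keys, so each element pairs a (time_embed, num_features) feature window with
# the matching (time_embed,) window of labels.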
ds = tf.data.Dataset.zip(ds_windowed)

# Note that shapes of tensors are set to be dynamic except last dimension:
print('structured embedded dataset:\n', ds)

ds = ds.shuffle(shuffle_buffer_size)
ds = ds.batch(batch_size)

# Make an initializable iterator; re-initialising it with different feed values (train, eval, test) reuses the same input_tensors graph:
iterator = ds.make_initializable_iterator()

batch = iterator.get_next()
print('batch:\n', batch)

# For many-to-one prediction, discard all but the last embedded target;
# also reshape x explicitly to recover a static shape that can be passed to an estimator:
truncated_batch = {
    'y': batch['y'][:, -1],
    'x': tf.reshape(batch['x'],[-1, time_embed, batch['x'].shape[-1]])
}

print('final batch:\n', truncated_batch)
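
For completeness, a hedged sketch of how this graph could be driven in a TF 1.x session to time one epoch; the arrays train_x and train_y below are random stand-ins for real data:

# Hedged usage sketch (TF 1.x); train_x / train_y are hypothetical stand-in arrays.
train_x = np.random.rand(data_size, num_features).astype(np.float32)
train_y = np.random.randint(0, 2, size=data_size).astype(np.float32)

with tf.Session() as sess:
    start = time.time()
    # Initialising the iterator binds concrete numpy arrays to the placeholders,
    # so the same graph can be reused for train / eval / test splits.
    sess.run(iterator.initializer, feed_dict={input_data: train_x, labels: train_y})
    n_batches = 0
    while True:
        try:
            sess.run(truncated_batch)
            n_batches += 1
        except tf.errors.OutOfRangeError:
            break
    print('one epoch: %d batches in %.1f s' % (n_batches, time.time() - start))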

0 Answers

There are no answers yet.