I am implementing a tf.data input pipeline for a time-series classification problem. The dataset itself consists of chronologically ordered records with 50 features each, more than 2M records in total. For any given time t, the predictive model ingests a window of size k of the features at t-k, t-k+1, ..., t to make a single class prediction for t+1.
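To make the windowing convention concrete, here is a minimal pure-Python sketch. It assumes each window holds exactly k consecutive records ending at time t and that the target is the label at t+1; the helper name `sliding_windows` and the toy data are made up for illustration and are not part of the pipeline.

```python
def sliding_windows(records, labels, k):
    """Yield (window, target) pairs.

    Assumption: a window is the k consecutive records ending at time t,
    and the target is the label at time t + 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # The last usable t is len(records) - 2, because the target sits at t + 1.
    for t in range(k - 1, len(records) - 1):
        yield records[t - k + 1 : t + 1], labels[t + 1]


# Toy data: 6 time steps with a single feature each (hypothetical values).
records = [[float(i)] for i in range(6)]
labels = [0, 1, 0, 1, 0, 1]

pairs = list(sliding_windows(records, labels, k=3))
for window, target in pairs:
    print(window, "->", target)
```

With 6 records and k=3 this yields 3 pairs, since the first two time steps have no full window and the last one has no t+1 label.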
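Training makes 256 full sweeps over the dataset.
For a set of fewer than 500,000 records and a window size of 128, the obvious solution is to prepare a redundant embedded matrix of size 500000x128x50 and sample randomly over the first dimension for every batch; that does not help at the actual data size, though.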
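A rough back-of-the-envelope sketch of why the pre-embedded matrix stops being practical. It assumes float32 values (4 bytes each, matching the tf.float32 placeholder used in the pipeline) and takes 2,000,000 records as a lower bound for the full dataset; the function name `embedded_bytes` is made up for illustration.

```python
def embedded_bytes(num_records, window, num_features, bytes_per_value=4):
    """Bytes needed to store every full window explicitly (float32 assumed)."""
    num_windows = num_records - window + 1
    return num_windows * window * num_features * bytes_per_value


small = embedded_bytes(500_000, 128, 50)      # the "small" case from the post
large = embedded_bytes(2_000_000, 128, 50)    # lower bound for the full dataset
raw = 2_000_000 * 50 * 4                      # the same data without redundancy

print(f"500K records, embedded: {small / 1e9:.1f} GB")
print(f"2M records, embedded:   {large / 1e9:.1f} GB")
print(f"2M records, raw:        {raw / 1e9:.1f} GB")
```

Under these assumptions the embedded matrix is roughly 128 times larger than the raw array, which is the redundancy the windowing pipeline is meant to avoid.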
I have worked out the pipeline listed at the end of this post.
A single epoch runs through a small set of 100K records.
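Test runs show that iterating over a single epoch takes about 50 seconds on my desktop, which is roughly 6x slower than iterating over a pre-embedded array.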
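For reference, a comparison like this can be made with a small timing helper along the following lines. This is only a sketch: `time_one_pass` is a hypothetical name, and the generator of dummy batches stands in for whatever actually produces batches (a session loop over the iterator, or slices of a pre-embedded array).

```python
import time


def time_one_pass(batches):
    """Return (elapsed_seconds, batch_count) for one full pass over an iterable."""
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
    return time.perf_counter() - start, count


# Stand-in data source: 10 dummy batches of 512 items each (made-up numbers).
dummy_batches = (list(range(512)) for _ in range(10))

elapsed, n_batches = time_one_pass(dummy_batches)
print(f"{n_batches} batches in {elapsed:.6f} s")
```

Timing both data sources with the same helper keeps the comparison limited to the input path itself.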
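I would appreciate comments and suggestions for improving this pipeline. Maybe my whole window-reshape-map approach is flawed and there are efficient alternatives that give the same result?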
Update: changing part of the code gave a speedup of about 20% (I can't really figure out why).
The pipeline code:
import tensorflow as tf
import numpy as np
import time
tf.reset_default_graph()
time_embed = 128
batch_size = 512
num_features = 50
data_size = 100000
shuffle_buffer_size = batch_size * 100
# Input placeholders (ingest numpy arrays):
input_data = tf.placeholder(tf.float32, shape=[None, num_features])
labels = tf.placeholder(tf.float32, shape=[None,])
input_tensors = {'x': input_data, 'y': labels}
# Dictionary of datasets:
ds_struct = {key: tf.data.Dataset.from_tensor_slices(tz) for key, tz in input_tensors.items()}
# Make time embedding windows:
ds_windowed = {
    key: ds.window(size=time_embed, shift=1, stride=1, drop_remainder=True).flat_map(lambda x: x.batch(time_embed))
    for key, ds in ds_struct.items()
}
# Zip into single dataset:
ds = tf.data.Dataset.zip(ds_windowed)
# Note that shapes of tensors are set to be dynamic except last dimension:
print('structured embedded dataset:\n', ds)
ds = ds.shuffle(shuffle_buffer_size)
ds = ds.batch(batch_size)
# Make reinitialisable iterator to feed different data (train, eval, test) into input_tensors:
iterator = ds.make_initializable_iterator()
batch = iterator.get_next()
print('batch:\n', batch)
# For many-to-one prediction, discard all but the last embedded target.
# We also need to explicitly reshape x to get a static shape sufficient to pass to some estimator:
truncated_batch = {
    'y': batch['y'][:, -1],
    'x': tf.reshape(batch['x'], [-1, time_embed, batch['x'].shape[-1]])
}
print('final batch:\n', truncated_batch)
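As a sanity check on what one epoch of this pipeline should produce, the window and batch counts can be worked out by hand. The sketch below assumes the settings from the listing (data_size = 100000, time_embed = 128, batch_size = 512), a window shift of 1 with drop_remainder=True, and the default behaviour of batch(), which keeps a short final batch; `epoch_layout` is a made-up helper name.

```python
import math


def epoch_layout(data_size, time_embed, batch_size):
    """Return (num_windows, num_batches, last_batch_size) for one epoch."""
    # window(shift=1, drop_remainder=True) keeps only full-length windows.
    num_windows = data_size - time_embed + 1
    # batch() without drop_remainder keeps the smaller final batch.
    num_batches = math.ceil(num_windows / batch_size)
    last_batch_size = num_windows - (num_batches - 1) * batch_size
    return num_windows, num_batches, last_batch_size


layout = epoch_layout(data_size=100_000, time_embed=128, batch_size=512)
print(layout)  # (99873, 196, 33)
```

So under these assumptions every full batch of 'x' has shape (512, 128, 50), and the last batch of the epoch holds only 33 windows.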