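I would like to try SKFLOW recurrent neural networks on some time series data that contains real values, for a binary classification problem. Each row of my data contains 57 features (variables), and I would like to look at the previous two samples and the next two samples to make a prediction for each row.
My data looks like this:
sample-2: f1, f2, f3, f4, ... f57, sample-1: f1, f2, f3, f4, ... f57, current sample: f1, f2, f3, f4, ... f57, sample+1: f1, f2, f3, f4, ... f57, sample+2: f1, f2, f3, f4, ... f57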
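A minimal NumPy sketch of how such five-sample windows could be assembled from a plain [n_rows, 57] array. The helper name `make_windows` and the toy data are hypothetical, and rows near the edges (which lack two neighbours on one side) are simply dropped here, which is only one possible choice:

```python
import numpy as np

def make_windows(rows, before=2, after=2):
    """Stack each row with its neighbours: [n_windows, before + 1 + after, n_features]."""
    rows = np.asarray(rows, dtype=np.float32)
    width = before + 1 + after
    n_windows = rows.shape[0] - width + 1
    if n_windows <= 0:
        raise ValueError("need at least %d rows" % width)
    # Window i covers rows i .. i + width - 1; its "current sample" is row i + before.
    return np.stack([rows[i:i + width] for i in range(n_windows)])

# Hypothetical toy data: 8 rows of 57 features each.
rows = np.arange(8 * 57, dtype=np.float32).reshape(8, 57)
windows = make_windows(rows)
print(windows.shape)  # (4, 5, 57)
```

With this layout the label for window i would be the label of row i + 2, so a label array would need the same trimming (for example `labels[2:-2]`).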
I started from the SKFLOW example RNN for text classification.
MAX_DOCUMENT_LENGTH = 10
vocab_processor = skflow.preprocessing.VocabularyProcessor(MAX_DOCUMENT_LENGTH)
X_train = np.array(list(vocab_processor.fit_transform(X_train)))
X_test = np.array(list(vocab_processor.transform(X_test)))
n_words = len(vocab_processor.vocabulary_)
print('Total words: %d' % n_words)
### Models
EMBEDDING_SIZE = 50
# Customized function to transform batched X into embeddings
def input_op_fn(X):
    # Convert indexes of words into embeddings.
    # This creates embeddings matrix of [n_words, EMBEDDING_SIZE] and then
    # maps word indexes of the sequence into [batch_size, sequence_length,
    # EMBEDDING_SIZE].
    word_vectors = skflow.ops.categorical_variable(X, n_classes=n_words,
        embedding_size=EMBEDDING_SIZE, name='words')
    # Split into list of embedding per word, while removing doc length dim.
    # word_list results to be a list of tensors [batch_size, EMBEDDING_SIZE].
    word_list = skflow.ops.split_squeeze(1, MAX_DOCUMENT_LENGTH, word_vectors)
    return word_list
# Single direction GRU with a single layer
classifier = skflow.TensorFlowRNNClassifier(rnn_size=EMBEDDING_SIZE,
    n_classes=15, cell_type='gru', input_op_fn=input_op_fn,
    num_layers=1, bidirectional=False, sequence_length=None,
    steps=1000, optimizer='Adam', learning_rate=0.01, continue_training=True)
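It looks like I should only need to modify input_op_fn to make this work, but I'm not sure how to correctly convert my numpy array into tensors for skflow.TensorFlowRNNClassifier. This is what the text classification example looks like: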
>>> word_vectors.get_shape()
TensorShape([Dimension(560000), Dimension(10), Dimension(50)])
>>> len(word_list)
10
If I'm interpreting the text problem correctly, then for my problem it would look like this: TensorShape([Dimension(#rows), Dimension(57), Dimension(3)])
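For comparison, the comments in the text example describe its tensor as [batch_size, sequence_length, EMBEDDING_SIZE], which split_squeeze turns into one [batch_size, EMBEDDING_SIZE] tensor per step. The NumPy-only sketch below mirrors that bookkeeping under one assumption that is not confirmed anywhere above: that the five-sample window plays the role of the sequence axis and the 57 features play the role of the embedding. The sizes and variable names are hypothetical.

```python
import numpy as np

# Hypothetical sizes, shrunk from the text example's (560000, 10, 50).
n_rows, seq_len, n_features = 6, 5, 57
windows = np.zeros((n_rows, seq_len, n_features), dtype=np.float32)

# NumPy analogue of split_squeeze(1, seq_len, windows): split along axis 1,
# then drop that axis from each piece.
step_list = [np.squeeze(piece, axis=1)
             for piece in np.split(windows, seq_len, axis=1)]

print(len(step_list))       # 5
print(step_list[0].shape)   # (6, 57)
```

Under that reading, `len(step_list)` corresponds to `len(word_list)` in the text example, with one entry per time step.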
Answer 0 (score: 2)
Check out this unit test for RNN.
Say this is the numeric data:
data = np.array(list([[2, 1, 2, 2, 3],
                      [2, 2, 3, 4, 5],
                      [3, 3, 1, 2, 1],
                      [2, 4, 5, 4, 1]]), dtype=np.float32)
labels = np.array(list([1, 0, 1, 0]), dtype=np.float32)
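data has shape (4, 5), where 4 is the batch_size and 5 is the sequence_length. You would then want to have tf.split(1, 5, X) in input_op_fn(). Hope this helps. You are welcome to submit a PR adding an example that handles this case.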
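To see what that split produces, here is a NumPy analogue rather than the TensorFlow call itself. The answer's tf.split(1, 5, X) uses the argument order of TensorFlow releases from that period (split dimension, number of splits, value); `np.split` takes the array first and is used here only because it runs without a TensorFlow session:

```python
import numpy as np

data = np.array([[2, 1, 2, 2, 3],
                 [2, 2, 3, 4, 5],
                 [3, 3, 1, 2, 1],
                 [2, 4, 5, 4, 1]], dtype=np.float32)

# Split along axis 1 (the sequence axis) into 5 pieces, one per time step.
steps = np.split(data, 5, axis=1)

print(len(steps))        # 5
print(steps[0].shape)    # (4, 1)
```

Each element of `steps` holds one time step for the whole batch, which is the list-of-tensors form that the text example's input_op_fn returns.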