How do I correctly shape time-series data for an RNN?

Asked: 2019-03-10 08:49:05

Tags: python arrays numpy tensorflow neural-network

I have started a simple project in Python with TensorFlow to predict stock market prices using a recurrent network. Here is my code so far:

n_steps = 30
n_inputs = 1
n_neurons = 100
n_outputs = 1

X = tf.placeholder(tf.float32, [1, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),
    output_size = n_outputs
)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

learning_rate = 0.001

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()
n_iterations = numStocks
batch_size = 1

def priceArrayToRNNFormat(priceArray):
    list = []
    print(priceArray)
    for price in priceArray:
        list.append(price)
    return np.array(list)

with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        dataOrig = [allStocksDict[list(allStocksDict.keys())[iteration]]]
        data = priceArrayToRNNFormat(dataOrig)
        print(data)
        X_batch = data
        y_batch = data
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE", mse)

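For reference, allStocksDict is just a dictionary in which each key is a stock ticker symbol and each value is a 30-element array of that stock's prices over time. When I run the code, I get the following output: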

[['14.9400', '15.0000', '14.8800', '14.6900', '14.6300', '15.0000', '14.9400', '15.1300', '15.5600', '15.3100', '15.3800', '14.6900', '15.0000', '15.1300', '14.6300', '14.0600', '14.1300', '14.9400', '14.4400', '13.6300', '13.0000', '12.3800', '12.5000', '12.6300', '13.0000', '12.6900', '13.1300', '13.1900', '13.0600', '12.9400']]
[['14.9400' '15.0000' '14.8800' '14.6900' '14.6300' '15.0000' '14.9400'
  '15.1300' '15.5600' '15.3100' '15.3800' '14.6900' '15.0000' '15.1300'
  '14.6300' '14.0600' '14.1300' '14.9400' '14.4400' '13.6300' '13.0000'
  '12.3800' '12.5000' '12.6300' '13.0000' '12.6900' '13.1300' '13.1900'
  '13.0600' '12.9400']]
Traceback (most recent call last):
  File "/home/john/Python/StockProject/monthlyRnn1.py", line 127, in <module>
    sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
  File "/home/john/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/john/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 30) for Tensor 'Placeholder:0', which has shape '(1, 30, 1)'

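I have tried feeding the list directly without converting it to an array, and I have also tried skipping the conversion to a vector before converting it to an array, but the error persists in both cases. Any help is much appreciated.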

1 Answer:

Answer 0: (score: 0)

One possible solution is:

def priceArrayToRNNFormat(priceArray):
    # Convert the price strings to float32 and add the trailing feature axis,
    # so the result matches the placeholder shape (1, n_steps, n_inputs).
    return np.reshape(np.asarray(priceArray, dtype=np.float32), (1, n_steps, n_inputs))

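A nested list is also acceptable as the fed value, as long as its nesting matches the placeholder shape. Another option is to transpose priceArray and rewrap it as a list of mini-batches.
That said, the first option, np.reshape(), is simple and fast.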
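
To check the shapes without running the TensorFlow graph, here is a minimal NumPy-only sketch of both options. The snake_case function name and the generated sample prices are hypothetical stand-ins for the question's allStocksDict values; the only assumption carried over from the question is that each stock arrives as a single-row list of 30 price strings.

```python
import numpy as np

n_steps = 30   # time steps per series, as in the question
n_inputs = 1   # one feature (the price) per time step

def price_array_to_rnn_format(price_array):
    """Return a float32 array of shape (1, n_steps, n_inputs)."""
    arr = np.asarray(price_array, dtype=np.float32)  # parses the strings, shape (1, 30)
    return arr.reshape(1, n_steps, n_inputs)         # adds the feature axis

# Hypothetical sample: one stock, 30 prices stored as strings.
prices = [["%.4f" % (10 + 0.1 * i) for i in range(n_steps)]]

batch = price_array_to_rnn_format(prices)
print(batch.shape, batch.dtype)

# Alternative: transpose (1, 30) -> (30, 1), then wrap it in a batch axis.
alt = np.asarray(prices, dtype=np.float32).T[np.newaxis, ...]
print(alt.shape)
```

Both arrays have the shape (1, 30, 1) that the placeholder X expects, and they hold the same values. Reshaping works here only because the batch contains a single series; with several series per batch, the leading dimension would have to be the batch size and the placeholder would need a matching (or None) first dimension.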