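I have adapted an Actor Critic network that I found online. It learns to trade stock options based on past stock price data and some technical indicators that I add to the data frame before training.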
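In my original model I used 8 features and a 50-period lookback window, so my state is a final array of shape [-1, 405] that is fed into the self.X placeholder (8 x 50 = 400 values, plus 5 extras such as the equity of the current trade). The architecture of the model is shown below: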
from collections import deque

import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.layers)


class Actor:
    def __init__(self, name, input_size, output_size, size_layer):
        with tf.variable_scope(name):
            # [-1, 405]: the state, i.e. all features * the 50-period lookback window, plus the extras
            self.X = tf.placeholder(tf.float32, (None, input_size))
            # Feed the state into a hidden layer of size_layer (256) units
            feed_actor = tf.layers.dense(self.X, size_layer, activation=tf.nn.relu)
            # Split the hidden layer into two equal halves along axis 1
            tensor_action, tensor_validation = tf.split(feed_actor, 2, 1)
            feed_action = tf.layers.dense(tensor_action, output_size)
            feed_validation = tf.layers.dense(tensor_validation, 1)
            # Combine the two heads: validation + (action - mean(action))
            self.logits = feed_validation + tf.subtract(
                feed_action, tf.reduce_mean(feed_action, axis=1, keep_dims=True))


class Critic:
    def __init__(self, name, input_size, output_size, size_layer, learning_rate):
        with tf.variable_scope(name):
            self.X = tf.placeholder(tf.float32, (None, input_size))  # [-1, 405]
            self.Y = tf.placeholder(tf.float32, (None, output_size))
            self.REWARD = tf.placeholder(tf.float32, (None, 1))
            feed_critic = tf.layers.dense(self.X, size_layer, activation=tf.nn.relu)
            tensor_action, tensor_validation = tf.split(feed_critic, 2, 1)
            feed_action = tf.layers.dense(tensor_action, output_size)
            feed_validation = tf.layers.dense(tensor_validation, 1)
            feed_critic = feed_validation + tf.subtract(
                feed_action, tf.reduce_mean(feed_action, axis=1, keep_dims=True))
            # Add the values fed in through self.Y to the rectified state branch
            feed_critic = tf.nn.relu(feed_critic) + self.Y
            feed_critic = tf.layers.dense(feed_critic, size_layer // 2, activation=tf.nn.relu)
            self.logits = tf.layers.dense(feed_critic, 1)
            # Mean squared error between the fed reward and the critic's output
            self.cost = tf.reduce_mean(tf.square(self.REWARD - self.logits))
            # Adam optimizer minimizes the cost function
            self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)


class Agent:
    LEARNING_RATE = 0.0000001
    BATCH_SIZE = 32
    LAYER_SIZE = 256
    OUTPUT_SIZE = 5  # Buy Call, Sell Call, Hold, Buy Put, Sell Put
    EPSILON = 0.5
    DECAY_RATE = 0.005
    MIN_EPSILON = 0.05
    GAMMA = 0.95
    MEMORIES = deque()
    MEMORY_SIZE = 500
    COPY = 1000
    T_COPY = 0
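...

It learned to trade well across many different datasets. Since then I have added more technical indicators to each data frame and increased the lookback to 60 periods, so my new state is an array of shape [-1, 2165], roughly 5 times the size of the original. What I am finding now is that the current model struggles to find profitable trades during training, and I think this is because the problem has become more complex.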
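To make the size change concrete, here is a minimal sketch of the arithmetic behind the two state sizes and the resulting first dense layer. The helper names `state_size` and `first_layer_params` are hypothetical and not part of the model code. The count of 36 features for the new setup is inferred from (2165 - 5) / 60 and assumes the 5 extra values are unchanged.

```python
def state_size(n_features, lookback, n_extras=5):
    """Length of the flattened state: feature window plus the extra values."""
    return n_features * lookback + n_extras


def first_layer_params(input_size, layer_size=256):
    """Weights plus biases of a dense layer mapping input_size -> layer_size."""
    return input_size * layer_size + layer_size


# Original setup: 8 features, 50-period lookback, 5 extras
old_size = state_size(8, 50)

# New setup: 36 features is an inferred value (assumes the 5 extras are unchanged)
new_size = state_size(36, 60)

old_params = first_layer_params(old_size)
new_params = first_layer_params(new_size)

print("state sizes:", old_size, new_size)
print("first-layer parameters:", old_params, new_params)
print("growth factor:", round(new_params / old_params, 2))
```

Under these assumptions the state grows from 405 to 2165 values, and the first 256-unit layer of each network grows from 103,936 to 554,496 trainable parameters, roughly 5.3 times as many, while the layer size and learning rate stay the same.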
My question is:
Thank you very much!