I have a problem with timing how long each batch takes. When I run on the CPU, an iteration initially takes about 10x longer than on the GPU, but the time stays consistent. When I run on the GPU, however, the per-iteration time keeps increasing. Here is the sample code:
from functools import wraps
from time import time

import numpy as np
import tensorflow as tf

def feedforward_model(graph, input_x_placeholder):
    with graph.as_default():
        # nn_layer (defined elsewhere) takes the input nodes and outputs y
        y = nn_layer(input_x_placeholder)
    return y

def train_model(graph, y_true_placeholder, predicted_y):
    with graph.as_default():
        cross_entropy = tf.losses.sparse_softmax_cross_entropy(
            labels=y_true_placeholder, logits=predicted_y)
        # learning_rate is defined elsewhere
        train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
    return train_step, cross_entropy

def timeit(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = time()
        result = f(*args, **kw)
        te = time()
        delta = te - ts
        print('func:%r took: %2.4f sec' % (f.__name__, delta))
        return result
    return wrap

g = tf.Graph()
with g.as_default():
    # placeholder block
    train_x = tf.placeholder(tf.float32)
    train_y = tf.placeholder(tf.int32)

def feed_dict():
    # data batching block: x and y are generated in batches elsewhere
    x = np.array()
    y = np.array()
    return {train_x: x, train_y: y}

with tf.Session(graph=g, config=tf.ConfigProto(allow_soft_placement=True,
                                               log_device_placement=True)) as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    with g.as_default():
        pred_y = feedforward_model(g, train_x)
        train_step, cross_entropy = train_model(g, train_y, pred_y)

    @timeit
    def train_run():
        _, loss = sess.run([train_step, cross_entropy], feed_dict=feed_dict(),
                           options=run_options, run_metadata=run_metadata)

    for i in range(1000):
        train_run()
I removed every stage I could that has to communicate with the CPU, i.e. the tf.summary logging. I can't think of any reason why the same graph runs with consistent times on the CPU, yet on the GPU the runtime starts increasing after roughly 200 iterations.
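For context, the summary logging I stripped out was roughly the standard merge_all/FileWriter pattern sketched below; the tag name and log directory here are just placeholders, not my actual code:
with g.as_default():
    tf.summary.scalar('cross_entropy', cross_entropy)
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter('/tmp/logdir', g)
# ...and inside train_run():
#     _, loss, summary = sess.run([train_step, cross_entropy, merged], feed_dict=feed_dict())
#     writer.add_summary(summary, i)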
I found the problem. The last piece of logging I had not removed was
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
Now that I have removed both of these from session.run, the time per iteration on the GPU has become stable. I suspect that with every call to session.run the FULL_TRACE gets longer, so the time increases roughly linearly. But why does the GPU runtime only start increasing at around iteration 200? And why does it not affect runs on the CPU?
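Concretely, the fix was just dropping the two trace arguments from the timed run call; a minimal sketch of the updated loop (same fetches as before, nothing else changed):
@timeit
def train_run():
    # no options/run_metadata, so no FULL_TRACE is collected on every step
    _, loss = sess.run([train_step, cross_entropy], feed_dict=feed_dict())

for i in range(1000):
    train_run()
If I do need a trace later, I will probably pass run_options and run_metadata only on an occasional step instead of on every iteration.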