Tensorflow: why does my code run slower and slower?

Asked: 2017-11-28 09:48:06

Tags: python performance tensorflow

I'm new to tensorflow. The following code runs successfully, without any errors. For the first 10 lines of output, computation is fast and the output (defined in the last line) flies by line after line. However, as the iterations increase, the computation becomes slower and slower, and eventually intolerable. So I would like to know whether there are any modifications that could speed this up.

Here is a brief description of the code: it applies a single-hidden-layer neural network to a dataset. It aims to find the best values of the parameters rate[0] and rate[1], which weight the loss function. At each training step, one tuple is fed to the model, and the accuracy on that tuple is evaluated immediately (in the real world this kind of data arrives as a stream).

import tensorflow as tf
import numpy as np

n_hidden=50
n_input=37
n_output=2
data_raw=np.genfromtxt(r'data.csv',delimiter=",",dtype=None)
data_info=np.genfromtxt(r'data2.csv',delimiter=",",dtype=None)

def pre_process(row):
    # Build the 37-dimensional input vector: one-hot encodings of four
    # categorical fields (24 + 7 + 3 + 2 values) followed by one numeric field.
    ans = []
    temp = [0 for i in range(24)]
    temp[int(row[0])] = 1
    ans.extend(temp)
    temp = [0 for i in range(7)]
    temp[int(row[1]) - 1] = 1
    ans.extend(temp)
    temp = [0 for i in range(3)]
    temp[int(row[3])] = 1
    ans.extend(temp)
    temp = [0 for i in range(2)]
    temp[int(row[4])] = 1
    ans.extend(temp)
    ans.extend([int(row[5])])
    return np.array(ans)

x=tf.placeholder(tf.float32, shape=[1,n_input])
y_=tf.placeholder(tf.float32,shape=[n_output])
y_r=tf.placeholder(tf.float32,shape=[n_output])
W1=tf.Variable(tf.random_uniform([n_input, n_hidden]))
b1=tf.Variable(tf.zeros([n_hidden]))
W2=tf.Variable(tf.zeros([n_hidden,n_output]))
b2=tf.Variable(tf.zeros([n_output]))

logits_1 = tf.matmul(x, W1) + b1
relu_layer= tf.nn.relu(logits_1)
logits_2 = tf.matmul(relu_layer, W2) + b2

correct_prediction = tf.equal(tf.argmax(logits_2,1), tf.argmax(y_,0))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

rate=[0,0]
for i in range(-100,200,10):
    rate[0]=i
    for j in range(-100,i,10):
        rate[1]=j
        loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits_2)*[rate[0],rate[1]])
#       loss2=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_r, logits=logits_2)*[rate[2],rate[3]])
#       loss=loss1+loss2
        train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        data_line=1

        accur=0
        local_local=0
        remote_remote=0
        local_remote=0
        remote_local=0
        total=0
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for i in range(200):
#               print(int(data_raw[data_line][0]),data_info[i][0])
                if i>100:
                    total+=1
                if int(data_raw[data_line][0])==data_info[i][0]:
                    sess.run(train_step,feed_dict={x:pre_process(data_info[i]).reshape(1,-1),y_:[1,0],y_r:[0,1]})
#                   print(sess.run(logits_2,{x:pre_process(data_info[i]).reshape(1,-1), y_: #[1,0]}))
                    data_line+=1
                    if data_line==len(data_raw):
                        break
                    if i>100:
                        acc=accuracy.eval(feed_dict={x: pre_process(data_info[i]).reshape(1,-1), y_: [1,0], y_r:[0,1]})
                        local_local+=acc
                        local_remote+=1-acc
                        accur+=acc
                else:
                    sess.run(train_step,feed_dict={x:pre_process(data_info[i]).reshape(1,-1),y_:[0,1], y_r:[1,0]})
#                   print(sess.run(logits_2,{x: pre_process(data_info[i]).reshape(1,-1), y_: #[0,1]}))
                    if i>100:
                        acc=accuracy.eval(feed_dict={x: pre_process(data_info[i]).reshape(1,-1), y_: [0,1], y_r:[1,0]})
                        remote_remote+=acc
                        remote_local+=1-acc
                        accur+=acc

        print("correctness: (%.3d,%.3d): \t%.2f   %.2f   %.2f   %.2f   %.2f" % (rate[0],rate[1],accur/total,local_local/total,local_remote/total,remote_local/total,remote_remote/total))

2 Answers:

Answer 0 (score: 4):

Although GPhilo's answer addresses the question of why running this code becomes slower and slower, that solution will in fact still create the computation graph again and again, which is not good.

The following two lines of code (also mentioned by GPhilo) keep adding operations to the graph on every iteration:

loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits( \
                    labels=y_, logits=logits_2)*[rate[0],rate[1]])
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
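
To see the effect directly, here is a minimal self-contained sketch (my own illustration, not code from the question) that counts the operations in the default graph; any graph-building call placed inside a loop makes the count grow on every pass:

import tensorflow as tf

tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[1])
for step in range(3):
    # Each pass adds new nodes to the default graph, just like the
    # loss/minimize lines above do in the original code.
    loss = tf.reduce_sum(x * 2.0)
    print("step %d: %d ops in graph"
          % (step, len(tf.get_default_graph().get_operations())))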

As I see it, you have two values, rate[0] and rate[1], that need to be supplied to your graph. Why not feed these two values through a placeholder, and define the graph only once? Once you have started running a Session, you should not add any more operations to the graph. Also, you should not instantiate a new Session for every iteration.

Check this modified code (only the important parts):

#  To clear previously created graph (if any) present in memory.
tf.reset_default_graph()   
x=tf.placeholder(tf.float32, shape=[1,n_input])
y_=tf.placeholder(tf.float32,shape=[n_output])
y_r=tf.placeholder(tf.float32,shape=[n_output])

# Add these two placeholders (Assuming they are single float value)
rate0 = tf.placeholder(tf.float32, shape = []) 
rate1 = tf.placeholder(tf.float32, shape = [])

W1=tf.Variable(tf.random_uniform([n_input, n_hidden]))
....
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Bring this code outside from loop (Note replacement of rate[0] with placeholder)
loss=tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_, \
            logits=logits_2) * [rate0, rate1])
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Instantiate session only once.
with tf.Session() as sess:
     sess.run(tf.global_variables_initializer())

     # Move the subsequent looping code inside.
     rate=[0,0]
     for i in range(-100,200,10):
        rate[0]=i

After this modification, whenever your Session runs train_step, you need to supply these two extra placeholders in the feed_dict.

For example:

sess.run(train_step,feed_dict={x:pre_process(data_info[i]).reshape(1,-1),
         y_:[1,0],y_r:[0,1], rate0: rate[0], rate1: rate[1]})

This way, you do not create a graph for every iteration, and in fact this code will be faster than GPhilo's solution.
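
As an extra safeguard (my own suggestion, not part of the answer above), you can also finalize the graph once the variables are initialized; TensorFlow will then raise an error if anything accidentally adds new operations during training. A minimal sketch:

import tensorflow as tf

tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=[1])
loss = tf.reduce_sum(x * 2.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Freeze the graph: any later attempt to add an op now raises a
    # RuntimeError instead of silently making the graph grow.
    sess.graph.finalize()
    print(sess.run(loss, feed_dict={x: [3.0]}))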

Answer 1 (score: 2):

Every time you run train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss), you add (quite a lot of) operations to your graph, which grows bigger and bigger as your program loops. The bigger the graph, the slower the execution.

Put your model definition inside the loop's body, and call tf.reset_default_graph() every time you begin a new iteration:

rate=[0,0]
for i in range(-100,200,10):
    rate[0]=i
    for j in range(-100,i,10):
        tf.reset_default_graph()
        x=tf.placeholder(tf.float32, shape=[1,n_input])
        y_=tf.placeholder(tf.float32,shape=[n_output])
        y_r=tf.placeholder(tf.float32,shape=[n_output])
        W1=tf.Variable(tf.random_uniform([n_input, n_hidden]))
        b1=tf.Variable(tf.zeros([n_hidden]))
        W2=tf.Variable(tf.zeros([n_hidden,n_output]))
        b2=tf.Variable(tf.zeros([n_output]))

        logits_1 = tf.matmul(x, W1) + b1
        relu_layer= tf.nn.relu(logits_1)
        logits_2 = tf.matmul(relu_layer, W2) + b2

        correct_prediction = tf.equal(tf.argmax(logits_2,1), tf.argmax(y_,0))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        rate[1]=j
        #...
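
As a quick sanity check (my addition, assuming the loop body is otherwise unchanged), you can time each configuration; with the graph reset per iteration, the time per pass should stay roughly constant instead of growing:

import time
import tensorflow as tf

for k in range(3):
    t0 = time.time()
    tf.reset_default_graph()
    # Stand-in for the per-iteration model build and training session.
    x = tf.placeholder(tf.float32, shape=[1])
    loss = tf.reduce_sum(x * 2.0)
    with tf.Session() as sess:
        sess.run(loss, feed_dict={x: [1.0]})
    print("iteration %d: %.3f s" % (k, time.time() - t0))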