I've been stuck on this problem for a while, so I'm hoping someone can shed some light on it...
The problem: I'm trying the tf.data input pipeline for the first time, using it to train a small model (a 3-layer NN). When training starts, Task Manager shows a constant 70-80% load on the CPU (Ryzen 1700), while the GPU (GTX 1080) sits at only 1-5% usage, even though I explicitly place the model on the GPU.
I know the model is actually loaded onto the GPU, because GPU memory usage goes up when the model is created.
As far as I know, tf.data runs all of the preprocessing functions on the CPU, so I suspect I'm misusing some part of the tf.data API and that this is what's causing the performance problem.
Also, reading the device placement logs, the model and the optimizer are placed on the GPU, while the tf.data ops live on the CPU.
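For reference, this is roughly how I produced those placement logs, using the standard log_device_placement session flag in TF 1.x (session setup simplified here):

# Enable device placement logging; TensorFlow prints each op's
# assigned device to stderr when the graph is placed.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())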
Here is the code that defines the tf.data pipeline and the model itself. It reads the data from CSV, applies a map function, and then splits the dataset in two: one part for training and one for testing.
import tensorflow as tf

def batchFormatting(key, fare_amount, pickup_date, pick_long, pick_lat, drop_long, drop_lat, pass_count):
    '''
    Parses a raw CSV record and extracts the hour/day/month
    fields from the pickup timestamp string.
    '''
    # Slice the timestamp at the precomputed offsets and convert to numbers
    parsed_date = tf.string_to_number(tf.substr(pickup_date,
                                                pos=date_data_indexes,
                                                len=date_data_lenght))
    hour = parsed_date[0]
    day = parsed_date[1]
    month = parsed_date[2]
    # Stack all features into a single input vector
    batch = tf.stack([hour, day, month, pick_long,
                      pick_lat, drop_long,
                      drop_lat, tf.to_float(pass_count)])
    return batch, fare_amount
dataset = tf.contrib.data.CsvDataset(PARAMS['TRAIN_DIRECTION'], default_data, header = True).take(10000)
dataset = dataset.shuffle(20)
dataset = dataset.map(batchFormatting, num_parallel_calls = PARAMS['PARALLEL_NUM'])
# Split the dataset into train and test datasets
test_dataset = dataset.take(PARAMS['TEST_BATCH_SIZE'])
train_dataset = dataset.skip(PARAMS['TEST_BATCH_SIZE'])
# Configure batching and epochs, then prefetch the batches to the GPU
train_dataset = train_dataset.batch(PARAMS['TRAIN_BATCH_SIZE'])
train_dataset = train_dataset.repeat(PARAMS['EPOCHS'])
train_dataset = train_dataset.apply(tf.contrib.data.prefetch_to_device(PARAMS['TRANING_DEVICE']))
test_dataset = test_dataset.batch(PARAMS['TEST_BATCH_SIZE'])
test_dataset = test_dataset.repeat()
test_dataset = test_dataset.apply(tf.contrib.data.prefetch_to_device(PARAMS['TRANING_DEVICE']))
train_iterator = train_dataset.make_initializable_iterator()
train_batch, train_labels = train_iterator.get_next()
test_iterator = test_dataset.make_initializable_iterator()
test_batch, test_labels = test_iterator.get_next()
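# Debugging aid (a sketch only, not part of the final script; the names
# debug_sess / first_batch / first_labels are illustrative): pull one
# batch to verify the shapes and dtypes the pipeline actually produces.
with tf.Session() as debug_sess:
    debug_sess.run(train_iterator.initializer)
    first_batch, first_labels = debug_sess.run([train_batch, train_labels])
    print(first_batch.shape, first_labels.shape)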
# Definition of the model (3-layer neural network)
with tf.device(PARAMS['TRANING_DEVICE']):
    with tf.name_scope('Weights'):
        w1 = tf.Variable(tf.random_normal([8, 8], mean=1, stddev=0.5, dtype=tf.float32), name='Weight1')
        w2 = tf.Variable(tf.random_normal([8, 13], mean=1, stddev=0.5, dtype=tf.float32), name='Weight2')
        w3 = tf.Variable(tf.random_normal([13, 1], mean=1, stddev=0.5, dtype=tf.float32), name='Weight3')
    with tf.name_scope('Biases'):
        b1 = tf.Variable(tf.random_normal([8], mean=0, stddev=0.5, dtype=tf.float32), name='Bias1')
        b2 = tf.Variable(tf.random_normal([13], mean=0, stddev=0.5, dtype=tf.float32), name='Bias2')
        b3 = tf.Variable(tf.random_normal([1], mean=0, stddev=0.5, dtype=tf.float32), name='Bias3')
    with tf.name_scope('Model'):
        l1 = tf.sigmoid(tf.matmul(train_batch, w1) + b1, name='Layer1')
        l2 = tf.sigmoid(tf.matmul(l1, w2) + b2, name='Layer2')
        prediction = tf.sigmoid(tf.matmul(l2, w3) + b3, name='Layer3PREDICTION')
    with tf.name_scope('Testing'):
        l1_test = tf.sigmoid(tf.matmul(test_batch, w1) + b1, name='Layer1Test')
        l2_test = tf.sigmoid(tf.matmul(l1_test, w2) + b2, name='Layer2Test')
        test_prediction = tf.sigmoid(tf.matmul(l2_test, w3) + b3, name='Layer3PREDICTIONTest')
    with tf.name_scope('ErrorFunc'):
        error_avg = tf.sqrt(tf.reduce_mean(tf.square(prediction - train_labels)))
        test_error = tf.sqrt(tf.reduce_mean(tf.square(test_prediction - test_labels)))
    with tf.name_scope('Optimization'):
        optimizer = tf.train.AdamOptimizer(PARAMS['LEARNING_RATE']).minimize(error_avg)
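For completeness, the training loop itself is just a plain TF 1.x session loop, roughly like this (a simplified sketch; logging and evaluation trimmed):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_iterator.initializer)
    while True:
        try:
            # One optimization step; the input batch comes from the tf.data pipeline
            _, train_error = sess.run([optimizer, error_avg])
        except tf.errors.OutOfRangeError:
            # dataset.repeat(EPOCHS) is exhausted, so training is done
            break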
Also, please feel free to point out any other mistakes or bad practices you spot in the code! I'm here to learn.