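We have a TensorFlow application that we feed through a queue in batches of 250. After switching to VarLenFeature (instead of FixedLenFeature), we started seeing a memory leak during training: memory usage keeps growing. We train our model on GPU machines.
Here is the decoding code: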
    @staticmethod
    def decode(serialized_example):
        features = tf.parse_example(
            serialized_example,
            # Defaults are not specified since both keys are required.
            features={
                # target_features
                RECS: tf.VarLenFeature(tf.float32),
                CLICK: tf.FixedLenFeature([], tf.float32)
            })
        return features
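We then convert the sparse tensors to dense ones with:
    tf.identity(tf.sparse_tensor_to_dense(tensor), name=key)
After that we loop over batches taken from a TensorFlow queue.
Here is the code that creates the queue: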
    @staticmethod
    def create_queue(tensors, capacity, shuffle=False, min_after_dequeue=None, seed=None,
                     enqueue_many=False, shapes=None, shared_name=None, name=None):
        tensor_list = _as_tensor_list(tensors)
        with ops.name_scope(name, "shuffle_batch_queue", list(tensor_list)):
            tensor_list = _validate(tensor_list)
            tensor_list, sparse_info = _store_sparse_tensors(
                tensor_list, enqueue_many, tf.constant(True))
            map_op = [x.map_op for x in sparse_info]
            types = _dtypes([tensor_list])
            shapes = _shapes([tensor_list], shapes, enqueue_many)
            queue = data_flow_ops.RandomShuffleQueue(
                capacity=capacity, min_after_dequeue=min_after_dequeue, seed=seed,
                dtypes=types, shapes=shapes, shared_name=shared_name)
        return queue, sparse_info, map_op
The enqueue operation is:
    @staticmethod
    def enqueue(queue, tensors, num_threads, enqueue_many=False, name=None, map_op=None):
        tensor_list = _as_tensor_list(tensors)
        with ops.name_scope(name, "shuffle_batch_equeue", list(tensor_list)):
            tensor_list = _validate(tensor_list)
            tensor_list, sparse_info = _store_sparse_tensors(
                tensor_list, enqueue_many, tf.constant(True), map_op)
            _enqueue(queue, tensor_list, num_threads, enqueue_many, tf.constant(True))
        return queue, sparse_info
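Answer 0 (score: 1)
Can you provide a minimal example? For instance, if you just call the example parsing repeatedly through multiple session.run calls, without any queues, do you still see the memory leak?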
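One way to run that check is to sample the process's memory while calling a single step in a loop. This is only a rough sketch: `sample_memory` and `peak_rss` are hypothetical helper names, the `resource` module is Unix-only, and `step` stands in for something like `lambda: sess.run(parsed_features)` in your own program.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def sample_memory(step, iterations, every=100, probe=peak_rss):
    """Call step() `iterations` times and record probe() every `every` calls."""
    samples = [probe()]  # baseline reading before the first step
    for i in range(1, iterations + 1):
        step()
        if i % every == 0:
            samples.append(probe())
    return samples


if __name__ == "__main__":
    # Stand-in for the real step; it leaks about 1 MB per call on purpose.
    leaked = []
    print(sample_memory(lambda: leaked.append(b"x" * 1_000_000), 50, every=10))
```

If the readings keep climbing with only the parsing op in the loop, the queue code is probably not the culprit; if they level off, the queue wrappers become the main suspect.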
I ask for a queue-free example because _store_sparse_tensors is hidden in that file for a reason: if you misuse it, you will leak memory. All callers of this function must therefore be very careful to use it correctly. Every sparse tensor that is stored via _store_sparse_tensors must be restored via _restore_sparse_tensors. If it is not, you will leak memory.
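To illustrate why an unmatched store leaks, here is a toy model in plain Python. It assumes the helpers behave like a side table keyed by handles, which is an assumption about the mechanism; `ToySparseStore` is a hypothetical class and not TensorFlow code.

```python
import itertools


class ToySparseStore:
    """Toy stand-in for a handle-based side table of sparse values."""

    def __init__(self):
        self._entries = {}
        self._next_handle = itertools.count()

    def store(self, value):
        """Keep `value` in the table and return a small handle to enqueue instead."""
        handle = next(self._next_handle)
        self._entries[handle] = value
        return handle

    def restore(self, handle):
        """Return the stored value and free its slot in the table."""
        return self._entries.pop(handle)

    def __len__(self):
        return len(self._entries)


store = ToySparseStore()

# Balanced use: every stored value is restored, so the table ends up empty.
for batch in range(3):
    handle = store.store([batch] * 4)
    store.restore(handle)
balanced = len(store)

# Unbalanced use: values are stored but never restored, so entries pile up.
for batch in range(3):
    store.store([batch] * 4)
leaked = len(store)
```

In the balanced loop the table is empty at the end, while the unbalanced loop leaves one entry per batch behind, which over a long training run looks like steadily growing memory.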
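I am considering replacing this wrapper with the DT_VARIANT storage format, but for now I would advise against using these functions yourself. Instead, you can use the new tf.contrib.data library (soon to become tf.data) to do what you want!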
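As a rough sketch of that approach, the pipeline below shuffles, batches and parses examples without any hand-built queues. It makes several assumptions that are not in the question: the input is a set of TFRecord files, the feature keys are named "recs" and "click" (the real RECS and CLICK constants are not shown), and the installed TensorFlow exposes the `tf.data`, `tf.io` and `tf.sparse` names (older releases spell some of these `tf.contrib.data`, `tf.parse_example` and `tf.sparse_tensor_to_dense`). `build_dataset` is a hypothetical helper name.

```python
def build_dataset(filenames, batch_size=250):
    """Build a shuffled, batched dataset of parsed examples."""
    # Imported here so the sketch can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Hypothetical key names standing in for the question's RECS and CLICK.
    feature_spec = {
        "recs": tf.io.VarLenFeature(tf.float32),
        "click": tf.io.FixedLenFeature([], tf.float32),
    }

    def parse_batch(serialized):
        parsed = tf.io.parse_example(serialized, feature_spec)
        # Densify the variable-length feature; shorter rows are zero-padded.
        parsed["recs"] = tf.sparse.to_dense(parsed["recs"])
        return parsed

    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.shuffle(buffer_size=10 * batch_size)
    dataset = dataset.batch(batch_size)
    return dataset.map(parse_batch)
```

Batching before parsing lets parse_example work on a whole batch of serialized strings at once, and the library handles the sparse values itself, so there is no store and restore pair to keep balanced by hand.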