Question

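I'm new to machine learning and Python! I want my code to predict the objects in my images, which in my case are mostly cars. When I start the script it runs smoothly, but after 20 or so pictures it hangs the system because of a memory leak. I want the script to run through my whole database, which contains far more than 20 pictures.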

I tried the pympler tracker to see which objects take up the most memory.

This is the code I'm running to predict the objects in the pictures:

from imageai.Prediction import ImagePrediction
import os
import urllib.request
import mysql.connector
from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

mydb = mysql.connector.connect(
  host="localhost",
  user="phpmyadmin",
  passwd="anshu",
  database="python_test"
)
counter = 0
mycursor = mydb.cursor()

sql = "SELECT id, image_url FROM `used_cars` " \
      "WHERE is_processed = '0' AND image_url IS NOT NULL LIMIT 1"
mycursor.execute(sql)
result = mycursor.fetchall()



def dl_img(url, filepath, filename):
    fullpath = filepath + filename
    urllib.request.urlretrieve(url,fullpath)

for eachfile in result:
    id = eachfile[0]
    print(id)
    filename = "image.jpg"
    url = eachfile[1]
    filepath = "/home/priyanshu/PycharmProjects/untitled/images/"
    print(filename)
    print(url)
    print(filepath)
    dl_img(url, filepath, filename)

    execution_path = "/home/priyanshu/PycharmProjects/untitled/images/"

    prediction = ImagePrediction()
    prediction.setModelTypeAsResNet()
    prediction.setModelPath(os.path.join(execution_path, "/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
    prediction.loadModel()

    predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "image.jpg"), result_count=1)
    for eachPrediction, eachProbability in zip(predictions, probabilities):
        per = 0.00
        label = ""
        print(eachPrediction, " : ", eachProbability)
        label = eachPrediction
        per = eachProbability

    print("Label: " + label)
    print("Per:" + str(per))
    counter = counter + 1
    print("Picture Number: " + str(counter))

    sql1 = "UPDATE used_cars SET is_processed = '1' WHERE id = '%s'" % id
    sql2 = "INSERT into label (used_car_image_id, object_label, percentage) " \
           "VALUE ('%s', '%s', '%s') " % (id, label, per)
    print("done")

    mycursor.execute(sql1)
    mycursor.execute(sql2)

    mydb.commit()
    tracker.print_diff()

This is the output I get for a single picture. After a few iterations the script consumes all of the RAM. What should I do to stop the leak?

seat_belt  :  12.617655098438263
Label: seat_belt
Per:12.617655098438263
Picture Number: 1
done
types |    objects |   total size
<class 'tuple |      130920 |     11.98 MB
<class 'dict |       24002 |      6.82 MB
<class 'list |       56597 |      5.75 MB
<class 'int |      175920 |      4.70 MB
<class 'str |       26047 |      1.92 MB
<class 'set |         740 |    464.38 KB
<class 'tensorflow.python.framework.ops.Tensor |        6515 |    356.29 KB
<class 'tensorflow.python.framework.ops.Operation._InputList |        6097 |    333.43 KB
<class 'tensorflow.python.framework.ops.Operation |        6097 |    333.43 KB
<class 'SwigPyObject |        6098 |    285.84 KB
<class 'tensorflow.python.pywrap_tensorflow_internal.TF_Output |        4656 |    254.62 KB
<class 'tensorflow.python.framework.traceable_stack.TraceableObject |        3309 |    180.96 KB
<class 'tensorflow.python.framework.tensor_shape.Dimension |        1767 |     96.63 KB
<class 'tensorflow.python.framework.tensor_shape.TensorShapeV1 |        1298 |     70.98 KB
<class 'weakref |         807 |     63.05 KB

Answer 1

Take a look at this article: Tracing python memory leaks

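Also note that the garbage collection module (gc) can have debug flags set; look at its set_debug function. In addition, look at this code by Gnibbler for determining the types of objects that have been created after a call.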

Answer 2

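Here the model is loaded inside the for loop, once for every image. It should be loaded outside the loop, so that it is not re-created on every iteration and the program's memory use does not keep growing. The code should look like this: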

# Imports, the database connection, dl_img(), result and counter are the same as in the question.
execution_path = "/home/priyanshu/PycharmProjects/untitled/images/"

# Build and load the model once, before the loop.
prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath(os.path.join(execution_path, "/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
prediction.loadModel()

for eachfile in result:
    id = eachfile[0]
    print(id)
    filename = "image.jpg"
    url = eachfile[1]
    filepath = "/home/priyanshu/PycharmProjects/untitled/images/"
    print(filename)
    print(url)
    print(filepath)
    dl_img(url, filepath, filename)

    # Reuse the already loaded model for every image.
    predictions, probabilities = prediction.predictImage(os.path.join(execution_path, "image.jpg"), result_count=1)
    # Defaults, in case no prediction is returned.
    per = 0.00
    label = ""
    for eachPrediction, eachProbability in zip(predictions, probabilities):
        print(eachPrediction, " : ", eachProbability)
        label = eachPrediction
        per = eachProbability

    print("Label: " + label)
    print("Per:" + str(per))
    counter = counter + 1
    print("Picture Number: " + str(counter))

    sql1 = "UPDATE used_cars SET is_processed = '1' WHERE id = '%s'" % id
    sql2 = "INSERT into label (used_car_image_id, object_label, percentage) " \
           "VALUE ('%s', '%s', '%s') " % (id, label, per)
    print("done")

    mycursor.execute(sql1)
    mycursor.execute(sql2)

    mydb.commit()
    tracker.print_diff()
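To see why moving the load out of the loop matters, here is a small self-contained sketch that does not use ImageAI or TensorFlow at all. FakeModel and _GLOBAL_GRAPH are hypothetical stand-ins: the assumption is that the framework keeps a global reference to everything each load builds (the growing Tensor and Operation counts in the question's output are consistent with that), so every reload adds memory that is never released.

```python
import tracemalloc

# Hypothetical stand-in for a framework-level registry that keeps
# every model ever built alive, even after the local variable is gone.
_GLOBAL_GRAPH = []


class FakeModel:
    """Hypothetical model: allocates 'weights' and registers itself globally."""

    def __init__(self):
        self.weights = [0.0] * 50_000
        _GLOBAL_GRAPH.append(self)

    def predict(self, item):
        return item % 3


def run_reloading(items):
    """Pattern from the question: a new model for every item."""
    for item in items:
        model = FakeModel()
        model.predict(item)


def run_load_once(items):
    """Pattern from this answer: one model, reused for every item."""
    model = FakeModel()
    for item in items:
        model.predict(item)


def measure(func, items):
    """Return (bytes still allocated after func, number of models kept alive)."""
    _GLOBAL_GRAPH.clear()
    tracemalloc.start()
    func(items)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, len(_GLOBAL_GRAPH)


if __name__ == "__main__":
    print("reloading:", measure(run_reloading, range(20)))
    print("load once:", measure(run_load_once, range(20)))
```

In this sketch the reloading variant keeps one model alive per item, while the load-once variant keeps a single model no matter how many items it processes. The same measure helper, based on the standard tracemalloc module, can be pointed at a few iterations of your real loop to check that memory stays flat.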
