Django, queues and threads - memory is not released

Time: 2017-06-27 08:30:16

Tags: python django multithreading

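A user visits http://example.com/url/, which calls page_parser in views.py. page_parser creates an instance of class Foo from script.py. Every time http://example.com/url/ is visited, I see memory usage go up. My guess is that the garbage collector is not collecting the instantiated class Foo. Any ideas why this happens?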

Here is the code:

views.py:

from django.http import HttpResponse
from script import Foo
from script import urls

# When user visits http://example.com/url/ I run `page_parser`
def page_parser(request):
    Foo(urls)
    return HttpResponse("alldone")

script.py:

import requests

from queue import Queue
from threading import Thread


class Newthread(Thread):
    def __init__(self, queue, result):
        Thread.__init__(self)
        self.queue = queue
        self.result = result

    def run(self):
        while True:
            url = self.queue.get()
            data = requests.get(url) # Download image at url
            self.result.append(data)
            self.queue.task_done()


class Foo:
    def __init__(self, urls):
        self.result = list()
        self.queue = Queue()
        self.startthreads()
        for url in urls:
            self.queue.put(url)
        self.queue.join()

    def startthreads(self):
        for x in range(3):
            worker = Newthread(queue=self.queue, result=self.result)
            worker.daemon = True
            worker.start()


urls = [
    "https://static.pexels.com/photos/106399/pexels-photo-106399.jpeg",
    "https://static.pexels.com/photos/164516/pexels-photo-164516.jpeg",
    "https://static.pexels.com/photos/206172/pexels-photo-206172.jpeg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg",
    "https://static.pexels.com/photos/106399/pexels-photo-106399.jpeg",
    "https://static.pexels.com/photos/164516/pexels-photo-164516.jpeg",
    "https://static.pexels.com/photos/206172/pexels-photo-206172.jpeg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg",
    "https://static.pexels.com/photos/106399/pexels-photo-106399.jpeg",
    "https://static.pexels.com/photos/164516/pexels-photo-164516.jpeg",
    "https://static.pexels.com/photos/206172/pexels-photo-206172.jpeg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg"]

1 answer:

Answer 0 (score: 0)

There are several moving parts involved, but I think what happens is the following:

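  1. The WSGI process is not killed after each request, so objects can persist between requests.
  2. You create 3 new threads but never let them rejoin the main thread, for example once the queue is empty.
  3. Because the reference count of Foo.queue never reaches zero (the threads are still alive, waiting for new queue items), it cannot be garbage collected.
  4. So you keep creating new threads and new Foo instances, and none of them can be freed.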
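One possible way to address points 2 and 3 is to give each worker a way to leave its loop and then join it. The following is a minimal sketch under stated assumptions, not a drop-in replacement: `_STOP`, `fetch`, `Worker` and `download_all` are hypothetical names, and `fetch` stands in for `requests.get` so the example runs without network access.

```python
import threading
from queue import Queue

_STOP = object()  # hypothetical sentinel that tells a worker to exit


def fetch(url):
    # Stand-in for requests.get(url); an assumption so the sketch runs offline.
    return "data for " + url


class Worker(threading.Thread):
    def __init__(self, queue, result):
        super().__init__(daemon=True)
        self.queue = queue
        self.result = result

    def run(self):
        while True:
            url = self.queue.get()
            if url is _STOP:
                self.queue.task_done()
                break  # leave the loop so the thread can finish
            self.result.append(fetch(url))
            self.queue.task_done()


def download_all(urls, num_workers=3):
    queue, result = Queue(), []
    workers = [Worker(queue, result) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for url in urls:
        queue.put(url)
    for _ in workers:
        queue.put(_STOP)  # one sentinel per worker
    for worker in workers:
        worker.join()  # wait until every thread has actually exited
    return result
```

Once the workers have exited, nothing keeps the queue or the result list alive beyond the caller's own references, so they can be reclaimed normally.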

I am no expert on queue.Queue, but my theory can be verified if you see the number of threads in the WSGI process go up by 3 per request (for example using top(1)).
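The thread count can also be checked from inside Python with `threading.active_count()`. The sketch below reproduces the shape of Foo without the network call and measures how many threads each instantiation leaves behind; `LeakyFoo` and `threads_added_by` are hypothetical names introduced only for this illustration.

```python
import threading
from queue import Queue


class LeakyFoo:
    """Same shape as Foo in script.py, minus the network call."""

    def __init__(self, items):
        self.result = []
        self.queue = Queue()
        for _ in range(3):
            threading.Thread(target=self._work, daemon=True).start()
        for item in items:
            self.queue.put(item)
        self.queue.join()

    def _work(self):
        while True:  # never exits: blocks in get() once the queue is empty
            item = self.queue.get()
            self.result.append(item)
            self.queue.task_done()


def threads_added_by(factory):
    # Count how many threads are still alive after factory() returns.
    before = threading.active_count()
    factory()
    return threading.active_count() - before


print(threads_added_by(lambda: LeakyFoo(["a", "b"])))
```

In a Django view, logging `threading.active_count()` before returning the response would presumably show the same growth from one request to the next.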

As a side note, this is a side effect of your class design. You do everything in __init__, which should really only assign instance attributes.
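As an illustration of that design point, here is a minimal sketch in which `__init__` only stores state and a separate method does the work. It swaps the hand-written queue and threads for the standard library's `concurrent.futures.ThreadPoolExecutor`, which is not what the question uses; `Downloader`, `run` and the injected `fetch` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class Downloader:
    def __init__(self, urls, fetch, max_workers=3):
        # Only store state here; no threads are started and no work is done.
        self.urls = list(urls)
        self.fetch = fetch
        self.max_workers = max_workers

    def run(self):
        # Leaving the with-block shuts the pool down and waits for its threads.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # map() returns results in the same order as self.urls.
            return list(pool.map(self.fetch, self.urls))
```

Under these assumptions, the view could call something like `Downloader(urls, requests.get).run()`, and no worker threads would outlive the request.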