Memory leak when calling web.client.getPage in a loop

Time: 2011-05-21 20:14:30

Tags: python memory-leaks twisted

I have a page that I refresh periodically with this script:

from twisted.web.client import getPage
from twisted.internet import reactor, task

def getData():
    dgp = getPage('http://www.google.com/')
    dgp.addCallback(dataLoadOK)
    dgp.addErrback(dataLoadError)

def dataLoadOK(value):
    print value

def dataLoadError(error):
    print error

loop = task.LoopingCall(getData)
loop.start(10, now=True)
reactor.run()

But with this approach I get a memory leak. Can anyone help me find it?

EDIT: I tried using the garbage collection python module and got this output:

GARBAGE OBJECTS:
:: <HTTPClientFactory: http://www.google.com/>
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.web.client' from '/usr/lib/python2.7/site-packages/twisted/web/client.pyc'>

:: {'status': '200', 'cookies': {'PREF': 'ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI', 'NID': '47=LxM9fbBBN-bVIeuLPOfvO-fgXOKw1n2suyZ2...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: InsensitiveDict({})
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.python.util' from '/usr/lib/python2.7/site-packages/twisted/python/util.pyc'>

:: {'preserve': 1, 'data': {}}
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: <Deferred at 0x29e2cf8 current result: None>
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.internet.defer' from '/usr/lib/python2.7/site-packages/twisted/internet/defer.pyc'>

:: {'_chainedTo': None, 'called': True, '_canceller': None, 'callbacks': [], 'result': None, '_runningCallbacks': False}
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: <<class 'twisted.internet.tcp.Client'> to ('www.google.com', 80) at 2445090>
        type: <class 'twisted.internet.tcp.Client'>
referrers: 3
    is class: True
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'>
    line num: 681
        line: class Client(BaseClient):
        line:     """A TCP client."""
        line: 
        line:     def __init__(self, host, port, bindAddress, connector, reactor=None):
        line:         # BaseClient.__init__ is invoked later
        line:         self.connector = connector
        line:         self.addr = (host, port)
        line: 
        line:         whenDone = self.resolveAddress
        line:         err = None
        line:         skt = None
        line: 
        line:         try:
        line:             skt = self.createInternetSocket()
        line:         except socket.error, se:
        line:             err = error.ConnectBindError(se[0], se[1])
        line:             whenDone = None
        line:         if whenDone and bindAddress is not None:
        line:             try:
        line:                 skt.bind(bindAddress)
        line:             except socket.error, se:
        line:                 err = error.ConnectBindError(se[0], se[1])
        line:                 whenDone = None
        line:         self._finishInit(whenDone, skt, err, reactor)
        line: 
        line:     def getHost(self):
        line:         """Returns an IPv4Address.
        line: 
        line:         This indicates the address from which I am connecting.
        line:         """
        line:         return address.IPv4Address('TCP', *(self.socket.getsockname() + ('INET',)))
        line: 
        line:     def getPeer(self):
        line:         """Returns an IPv4Address.
        line: 
        line:         This indicates the address that I am connected to.
        line:         """
        line:         return address.IPv4Address('TCP', *(self.realAddress + ('INET',)))
        line: 
        line:     def __repr__(self):
        line:         s = '<%s to %s at %x>' % (self.__class__, self.addr, unsignedID(self))
        line:         return s

:: {'_tempDataBuffer': [], 'disconnected': 1, 'dataBuffer': '', '_tempDataLen': 0, 'realAddress': ('74.125.225.81', 80), 'connector': <twisted.internet.tcp.Connect...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: []
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: {'x-xss-protection': ['1; mode=block'], 'set-cookie': ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 0...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

:: ['-1']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['private, max-age=0']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['text/html; charset=ISO-8859-1']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['PREF=ID=d894e510f2ebe263:FF=0:TM=1306053252:LM=1306053252:S=ebpb4ZebRUu_EhiI; expires=Tue, 21-May-2013 08:34:12 GMT; path=/; domain=.google.com', 'NID=47=LxM9...
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['gws']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: ['1; mode=block']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: []
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: <twisted.internet.tcp.Connector instance at 0x29e2cb0>
        type: <type 'instance'>
referrers: 3
    is class: True
    module: <module 'twisted.internet.tcp' from '/usr/lib/python2.7/site-packages/twisted/internet/tcp.pyc'>

:: ['Sun, 22 May 2011 08:34:12 GMT']
        type: <type 'list'>
referrers: 3
    is class: True
    module: None

:: {'reactor': <twisted.internet.selectreactor.SelectReactor object at 0x288bd10>, 'state': 'disconnected', 'factoryStarted': 0, 'bindAddress': None, 'factory': <H...
        type: <type 'dict'>
referrers: 3
    is class: True
    module: None

So I can see some unclosed references inside Twisted's functions. How can I avoid them?

2 answers:

Answer 0 (score: 3)

Try some of the strategies recommended in related questions. However, you may not have a memory leak at all, just memory fragmentation.

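It looks like the "Python memory leak detector" has a pretty serious bug: it enables DEBUG_LEAK, which prevents all cycles from being collected. DEBUG_LEAK includes DEBUG_SAVEALL, so every unreachable object is appended to gc.garbage instead of being freed. In other words, the detector itself creates lots and lots of leaks. If you just add some code to your example that reports the contents of gc.garbage without enabling DEBUG_LEAK, it stays empty. (gc.garbage is populated if any objects really do leak, even when you don't enable any gc debugging flags.)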
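A minimal sketch of the effect described above, written for Python 3 with only the standard library. `Node`, `make_cycle` and `garbage_after_cycle` are hypothetical helpers made up for illustration. It uses `gc.DEBUG_SAVEALL`, one of the bits that `gc.DEBUG_LEAK` turns on, so it reproduces the same behavior without DEBUG_LEAK's extra output on stderr.

```python
import gc


class Node:
    """Hypothetical object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that point at each other: once this function returns,
    # only the cyclic garbage collector can free them.
    a, b = Node(), Node()
    a.other, b.other = b, a


def garbage_after_cycle(flags):
    """Create one unreachable cycle, run a full collection with the given
    gc debug flags, and count how many Node objects end up in gc.garbage."""
    gc.collect()
    del gc.garbage[:]
    old_flags = gc.get_debug()
    gc.set_debug(flags)
    try:
        make_cycle()
        gc.collect()
        return sum(1 for obj in gc.garbage if isinstance(obj, Node))
    finally:
        # Restore the previous flags and drop the saved objects.
        gc.set_debug(old_flags)
        del gc.garbage[:]


print(garbage_after_cycle(0))                 # cycle freed normally
print(garbage_after_cycle(gc.DEBUG_SAVEALL))  # cycle parked in gc.garbage
```

With no flags the cycle is simply freed and gc.garbage stays empty; with DEBUG_SAVEALL (and therefore DEBUG_LEAK) both objects are kept alive in gc.garbage even though nothing actually leaked.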

Answer 1 (score: 2)

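There may be a problem with the way you schedule your looping call. You don't return the Deferred from getData, so the calls can pile up. LoopingCall only waits for the previous iteration to finish when the function it calls returns a Deferred; otherwise it simply calls the function again every 10 seconds.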

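If retrieving the web page takes longer than 10 seconds, getData is called a second time before the first call has finished. If you're hitting a site that tries to throttle you (and google.com does), then the more requests pile up, the more it will slow you down. Each pending attempt holds on to some memory, which looks like a leak.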

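If this is the problem (though you should use the techniques Jean-Paul suggested to find out what the problem actually is), you can fix it by adding "return dgp" at the end of the getData function.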