Question

    import time

    # char, hash_url, Url (a Django model) and words are defined elsewhere
    # in the project.
    links_list = char.getLinks(words)
    for source_url in links_list:
        try:
            print 'Downloading URL: ' + source_url
            urldict = hash_url(source_url)
            source_url_short = urldict['url_short']
            source_url_hash = urldict['url_short_hash']
            # Only download URLs that are not already stored.
            if Url.objects.filter(source_url_short = source_url_short).count() == 0:
                try:
                    htmlSource = getSource(source_url)
                except Exception:
                    htmlSource = '-'
                    print '\thtmlSource got an error...'
                new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
                new_u.save()
                time.sleep(3)
            else:
                print '\tAlready in database'
        except Exception:
            print '\tError with downloading URL..'
            time.sleep(3)


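    import random
    import urllib2

    def getSource(theurl, unicode = 1, moved = 0):
        # agents (a list of User-Agent strings) is defined elsewhere.
        if moved == 1:
            theurl = urllib2.urlopen(theurl).geturl()  # follow redirects
        urlReq = urllib2.Request(theurl)
        urlReq.add_header('User-Agent', random.choice(agents))
        urlResponse = urllib2.urlopen(urlReq)
        htmlSource = urlResponse.read()
        urlResponse.close()
        # Raises UnicodeDecodeError if the page is not valid UTF-8;
        # otherwise returns the same bytes unchanged.
        htmlSource = htmlSource.decode('utf-8').encode('utf-8')
        return htmlSource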

Basically, this code takes a list of URLs, downloads each one, and saves it to the database. That's all it does.

Answer 1

Perhaps your process is using too much memory, and the server (perhaps a shared host) simply kills it for exceeding your memory quota.
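One way to test this theory is to log the process's peak resident memory while the loop runs and compare it with the host's quota. A minimal sketch using the standard-library `resource` module, which is Unix only (it is not available on Windows); the helper name `peak_rss_mb` is hypothetical. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in megabytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


if __name__ == "__main__":
    # Example: allocate something sizeable, then report the peak so far.
    data = [str(i) for i in range(200000)]
    print("peak RSS: %.1f MB" % peak_rss_mb())
```

Printing this every few URLs inside the download loop would show whether memory climbs toward the quota just before the process dies.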

Here you make a call that could use a lot of memory:

    links_list = char.getLinks(words)
    for source_url in links_list:
        ...

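It looks like you may be building the entire list in memory and only then working through the items. It would be better to use an iterator, which retrieves one object at a time. But this is a guess, because it's hard to tell from your code what char.getLinks does.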
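To illustrate the difference, here is a minimal sketch. Since `char.getLinks` isn't shown, both functions below (`get_links_list` and `iter_links`) are hypothetical stand-ins that pull URLs out of a sequence of page strings: the first builds the whole list up front, while the generator yields one URL at a time, so only the current item has to be held in memory.

```python
import re

# Hypothetical stand-ins for char.getLinks; the real implementation isn't shown.
URL_RE = re.compile(r"https?://\S+")


def get_links_list(pages):
    """Eager version: collects every URL into one list before returning."""
    links = []
    for page in pages:
        links.extend(URL_RE.findall(page))
    return links


def iter_links(pages):
    """Lazy version: yields URLs one at a time as the caller asks for them."""
    for page in pages:
        for match in URL_RE.finditer(page):
            yield match.group(0)


if __name__ == "__main__":
    pages = ["see http://a.example/1 and http://a.example/2", "http://b.example/"]
    # The calling loop looks the same with either version.
    for source_url in iter_links(pages):
        print(source_url)
```

The caller's `for` loop doesn't change. The saving is only real if the input (`pages` here) is itself produced lazily, for example read from a file or a database cursor.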

If you are running Django in debug mode, memory usage will also keep increasing, as Mark suggested.

Answer 2

If you're doing this in Django, make sure DEBUG is False; otherwise Django keeps a record of every SQL query it runs.

See the Django FAQ.
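Concretely, this is a one-line change in the project's `settings.py`. A sketch, assuming a standard Django project layout. If `DEBUG` has to stay on while the job runs, Django's `django.db.reset_queries()` clears the recorded query list, and calling it periodically inside the loop is one workaround:

```python
# settings.py (sketch; assumes a standard Django project layout)

# With DEBUG = True, Django appends every SQL query it runs to
# django.db.connection.queries (unbounded in older Django releases),
# so a long-running loop that saves thousands of Url objects grows in memory.
DEBUG = False

# Workaround if DEBUG must stay True: clear the log inside the loop, e.g.
#     from django.db import reset_queries
#     ...
#     new_u.save()
#     reset_queries()
```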

Answer 3

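The easiest way to check is to open Task Manager (on Windows, or the equivalent on other platforms) and watch the memory usage of the Python process. If it stays constant, there is no memory leak. If it keeps growing, you have a leak somewhere and need to debug it.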
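The same check can be done from inside Python. A minimal sketch, assuming Python 3.4+ for the standard-library `tracemalloc` module; `measure_growth`, `leaky` and `clean` are hypothetical names used only for illustration. It runs a piece of work several times and records how much Python-allocated memory is still alive after each round: a steady climb suggests a leak, a flat line suggests none.

```python
import gc
import tracemalloc


def measure_growth(work, rounds=5):
    """Run work() several times; return traced memory (bytes) after each round."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(rounds):
            work()
            gc.collect()  # free garbage so only live objects are counted
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


_kept = []


def leaky():
    # Simulates a leak: each call keeps ~100 KB alive in a module-level list.
    _kept.append(bytearray(100000))


def clean():
    # Allocates the same amount but lets it go out of scope immediately.
    bytearray(100000)


if __name__ == "__main__":
    print("leaky:", measure_growth(leaky))
    print("clean:", measure_growth(clean))
```

In the question's case, `work` would be one iteration of the download loop.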

Answer 4

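Maybe you should get a job server such as beanstalkd and process the URLs one job at a time.

The job server will requeue the jobs that fail, allowing the rest to complete. If needed, you can also run several clients at the same time (even on multiple machines).

The design is simpler, easier to understand and test, more fault-tolerant, retryable, more scalable, and so on...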
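beanstalkd itself needs a running server and a client library, so the sketch below only illustrates the pattern, using Python's standard-library `queue.Queue` as a stand-in (this is not beanstalkd's API). Each job is one URL; the worker takes one job at a time, and a failed job goes back on the queue until a hypothetical `max_attempts` limit is reached, so one bad URL doesn't stop the rest. `process_all` and `fetch` are illustrative names.

```python
import queue


def process_all(urls, fetch, max_attempts=3):
    """Process URLs one job at a time, re-queueing failures.

    Returns (done, failed): URLs that succeeded, and URLs that still
    failed after max_attempts tries.
    """
    jobs = queue.Queue()
    for url in urls:
        jobs.put((url, 1))  # (job payload, attempt number)

    done, failed = [], []
    while not jobs.empty():
        url, attempt = jobs.get()
        try:
            fetch(url)
            done.append(url)
        except Exception:
            if attempt < max_attempts:
                jobs.put((url, attempt + 1))  # retry later, behind other jobs
            else:
                failed.append(url)
        finally:
            jobs.task_done()
    return done, failed


if __name__ == "__main__":
    calls = {}

    def flaky_fetch(url):
        # Fails on the first try for "b" and always fails for "c".
        calls[url] = calls.get(url, 0) + 1
        if url == "c" or (url == "b" and calls[url] == 1):
            raise IOError("download failed")

    print(process_all(["a", "b", "c"], flaky_fetch))
```

This runs in a single process; with a real job server the queue lives outside the worker, which is what lets several workers share it.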

Is my code leaking memory (python)?

4 answers: