Question

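The code below produces the desired result, but my problem is that the process slows down with every append. For the first 100 pages or so, each page takes under a second, but by the 500s a page can take up to 3 seconds, and in the 1000s each page takes about 5 seconds. Are there any suggestions for making this more efficient, or an explanation of why it behaves this way?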

import datetime
import itertools

import lxml.html

rows = []

for pageno in itertools.count(start=1):

    start = datetime.datetime.now()

    url = 'http://example.com/'
    parse = lxml.html.parse(url)

    try:

        # Drop every <center> element from the tree and release its contents.
        for x in parse.xpath('//center'):
            x.getparent().remove(x)

            x.clear()
            while x.getprevious() is not None:
                del x.getparent()[0]

        for n in parse.xpath('//tr[@class="rt"]'):
            link = n.find('td/form/p/a')
            # Compare by truthiness: `is not ''` tests identity, not equality,
            # and c.text is None for cells without leading text.
            rows.append([n.find('td/a').text.strip(),
                         link.text.strip(),
                         link.attrib['title'].strip()]
                        + [c.text.strip() for c in n if c.text and c.text.strip()])

            # Free the processed row and the siblings that precede it.
            n.clear()
            while n.getprevious() is not None:
                del n.getparent()[0]

    except Exception:

        print('Page ' + str(pageno) + ' Does Not Exist')

    print('{0} Pages Complete: {1}'.format(pageno, datetime.datetime.now() - start))

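I have tried a number of approaches, such as disabling the garbage collector and writing each list to a file as a row instead of appending it to one large list. I look forward to learning more from any suggestions or answers.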
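For reference, the "write each row to a file instead of appending" variant could look roughly like the sketch below, with per-page timing and traced memory recorded so that any growth is visible. This is only an illustration under assumptions: `fetch_rows` is a hypothetical stand-in for the lxml parsing step, and the output goes to an in-memory buffer rather than a real file. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held inside C libraries may not show up in its numbers.

```python
import csv
import io
import time
import tracemalloc


def fetch_rows(pageno):
    """Hypothetical stand-in for the lxml parsing step; yields one list per table row."""
    for i in range(3):
        yield ['page %d' % pageno, 'row %d' % i]


def scrape(pages, out):
    """Write each page's rows straight to `out`; return (page, seconds, traced bytes) per page."""
    writer = csv.writer(out)
    stats = []
    tracemalloc.start()
    for pageno in pages:
        start = time.perf_counter()
        for row in fetch_rows(pageno):
            writer.writerow(row)  # the row is written out, not kept in a growing list
        elapsed = time.perf_counter() - start
        current, _peak = tracemalloc.get_traced_memory()
        stats.append((pageno, elapsed, current))
    tracemalloc.stop()
    return stats


# In-memory file for the demo; a real run would pass open(path, 'w', newline='').
buffer = io.StringIO()
stats = scrape(range(1, 4), buffer)
for pageno, elapsed, current in stats:
    print('{0} Pages Complete: {1:.6f}s, {2} bytes traced'.format(pageno, elapsed, current))
```

If the traced byte count stays flat while the per-page time still climbs, that would suggest the slowdown comes from somewhere other than the Python-level list.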

Python loop gets progressively slower with each iteration

0 Answers: