How to scrape several websites using PyQt4, scope change?

Posted: 2016-02-10 09:47:44

Tags: python pyqt pyqt4 python-3.5

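I want to scrape two JavaScript-driven websites for links, using PyQt4.QtWebKit to render the pages and then extract the links I need. The code works for one page or URL, but it stops after printing the links from the first website (and keeps running until I force it to quit). It seems the scope stays inside the event loop of the Render class. How can I make the program leave that scope, continue with the for loop, and render the second website? Calling exit() in the _loadFinished method just quits the program after the first iteration. Maybe the Python application has to be closed and reopened to render the next page, which is not possible because the application is opened/reopened outside the program?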

import sys  
from PyQt4.QtGui import *  
from PyQt4.QtCore import *  
from PyQt4.QtWebKit import *
from PyQt4 import QtGui
from lxml import html 

class Render(QWebPage):
    def __init__(self, url):

        self.frame = None
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))

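    def _loadFinished(self, result):
        # Slot for loadFinished: parse the rendered HTML of the main frame
        self.frame = self.mainFrame()
        result = self.frame.toHtml()
        formatted_result = str(result)
        tree = html.fromstring(formatted_result)
        # hrefs of <a> elements whose parent <div> sits inside another <div>; keep the first four
        archive_links = tree.xpath('//div/div/a/@href')[0:4]
        print(archive_links)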


urls = ['http://pycoders.com/archive/', 'http://www.pythonjobshq.com']

def main(urls):

    app = QtGui.QApplication(sys.argv)
    for url in urls:
        r = Render(url)
    #s = Render(urls[1])  # The pages can be rendered in parallel, but rendering more than a handful of pages at the same time is a bad idea
    sys.exit(app.exec_())  # Runs the Qt event loop until quit()/exit() is called

if __name__ == '__main__':
    main(urls)
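One common way to structure this (a hedged sketch, not a tested PyQt4 solution) is to keep a single QApplication and a single event loop for the whole run and chain the loads: when one page's loadFinished handler has finished scraping, it starts loading the next URL, and after the last URL it calls quit() so that app.exec_() returns. To stay runnable without Qt, the sketch below models only that control flow in plain Python. SequentialRenderer, FakeEventLoop, start_load, handle_html and scrape are all hypothetical names, and FakeEventLoop merely stands in for QApplication's event loop.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Tuple


class SequentialRenderer:
    """Render URLs one after another inside ONE event loop (sketch).

    Hypothetical stand-ins, not PyQt4 API:
      start_load(url)        -- begins an asynchronous load; when it is done,
                                the event loop must call on_load_finished(html).
      handle_html(url, html) -- the per-page scraping step.
      quit()                 -- leaves the event loop (QApplication.quit in Qt).
    """

    def __init__(self, urls: Iterable[str],
                 start_load: Callable[[str], None],
                 handle_html: Callable[[str, str], None],
                 quit: Callable[[], None]) -> None:
        self._queue = deque(urls)
        self._start_load = start_load
        self._handle_html = handle_html
        self._quit = quit
        self.current_url: Optional[str] = None

    def start(self) -> None:
        self._load_next()

    def _load_next(self) -> None:
        if not self._queue:
            # Last page handled: stop the loop so exec_() can return.
            self._quit()
            return
        self.current_url = self._queue.popleft()
        self._start_load(self.current_url)

    def on_load_finished(self, html: str) -> None:
        # Plays the role of the loadFinished slot: scrape, then chain the next load.
        self._handle_html(self.current_url, html)
        self._load_next()


class FakeEventLoop:
    """Tiny stand-in for QApplication's event loop, for demonstration only."""

    def __init__(self) -> None:
        self._pending = deque()
        self._running = False

    def post(self, callback: Callable, *args) -> None:
        self._pending.append((callback, args))

    def exec_(self) -> int:
        self._running = True
        # A real Qt loop keeps waiting for events if quit() is never called;
        # this fake simply returns once its queue is empty.
        while self._running and self._pending:
            callback, args = self._pending.popleft()
            callback(*args)
        return 0

    def quit(self) -> None:
        self._running = False


def scrape(urls: List[str], pages: dict) -> List[Tuple[str, str]]:
    """Drive SequentialRenderer with FakeEventLoop; `pages` maps url -> html."""
    loop = FakeEventLoop()
    results: List[Tuple[str, str]] = []
    renderer = SequentialRenderer(
        urls,
        # Simulate an async load: the "finished" callback runs later, from the loop.
        start_load=lambda url: loop.post(renderer.on_load_finished, pages[url]),
        handle_html=lambda url, html: results.append((url, html)),
        quit=loop.quit,
    )
    renderer.start()
    loop.exec_()
    return results


if __name__ == "__main__":
    urls = ['http://pycoders.com/archive/', 'http://www.pythonjobshq.com']
    fake_pages = {u: "<html>%s</html>" % u for u in urls}
    for url, page in scrape(urls, fake_pages):
        print(url, len(page))
```

Mapped back onto the code in the question, the idea would be a single Render (QWebPage) object that holds the list of URLs: start_load corresponds to self.mainFrame().load(QUrl(url)), the _loadFinished slot reads self.mainFrame().toHtml(), extracts the links and then loads the next URL, and after the last URL it calls QApplication.instance().quit() instead of sys.exit(). The for loop over Render(url) objects then disappears, because the queue drives the iteration from inside the one event loop.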

Thanks for any help!

0 answers