I am trying to scrape several web pages whose URLs I know in advance. Using the fairly standard code found here, this is what I have:
import sys  # needed for sys.argv below

from lxml import html
import requests
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *


class Render(QWebPage):
    def __init__(self, url):
        # A new QApplication is created for every Render instance
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Blocks until _loadFinished calls quit()
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep the loaded frame, then stop the event loop
        self.frame = self.mainFrame()
        self.app.quit()
Running this in a for loop over the URLs works for the first two, then crashes with the following output:
QObject::connect: Cannot connect
(null)::configurationAdded(QNetworkConfiguration) to
QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
QObject::connect: Cannot connect
(null)::configurationRemoved(QNetworkConfiguration) to
QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
QObject::connect: Cannot connect
(null)::configurationChanged(QNetworkConfiguration) to
QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to
QNetworkConfigurationManager::onlineStateChanged(bool)
QObject::connect: Cannot connect (null)::configurationUpdateComplete() to
QNetworkConfigurationManager::updateCompleted()
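After reading about similar problems, my (vague) understanding is that the QApplication may not be shutting down properly from one iteration to the next. I tried pausing the script for 5-20 seconds on each iteration, with no effect, and I could not find any other suggestion that applies. Any help with this would be appreciated.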
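One workaround that might be worth trying, offered as a minimal sketch rather than a confirmed fix: if the suspicion about QApplication is right, rendering each URL in its own Python process would give every page a fresh QApplication. The names `CHILD_SCRIPT`, `render_in_subprocess`, and `render_all` are hypothetical, and the child script here is only a stand-in that echoes the URL so the sketch runs without PyQt4; in real use it is assumed to contain the `Render` class and print `frame.toHtml()`.

```python
import subprocess
import sys

# Stand-in child script (assumption): a real version would define Render,
# build Render(url), and write the rendered HTML to stdout.
CHILD_SCRIPT = """
import sys
url = sys.argv[1]
sys.stdout.write("rendered:" + url)
"""


def render_in_subprocess(url, script=CHILD_SCRIPT):
    """Run the script in a fresh interpreter and return what it printed."""
    # Each call starts a new process, so no Qt state survives between URLs.
    return subprocess.check_output(
        [sys.executable, "-c", script, url],
        universal_newlines=True,
    )


def render_all(urls):
    """Render every URL in its own process, keyed by URL."""
    return {url: render_in_subprocess(url) for url in urls}
```

Calling `render_all` with the list of known URLs would replace the for loop; the cost is one interpreter start-up per page, which seems acceptable for a handful of URLs.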