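I am trying to scrape several web pages using Python with PyQt4 and Beautiful Soup.
Because of how my program is organised, a main script, "program.py", calls functions from other scripts, each of which runs a different analysis with Beautiful Soup.
A simplified version of the main program.py looks like this: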
program.py :
import script1
import script2
script1.function1(urlA)
script2.function2(urlB)
script1.py and script2.py look like this:
script1.py :
import sys  # needed for sys.argv below
import requests
import re
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

def function1(url):
    r = Render(url)
    soup = BeautifulSoup(unicode(r.frame.toHtml()))
    # Do many things with soup.
    # Nothing related to PyQt4 further in this script
My script2 has exactly the same structure, but does other things on a different URL.
script2.py :
import sys  # needed for sys.argv below
import requests
import re
from bs4 import BeautifulSoup
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *

class Render(QWebPage):
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        self.app.exec_()

    def _loadFinished(self, result):
        self.frame = self.mainFrame()
        self.app.quit()

def function2(url):
    r = Render(url)
    soup = BeautifulSoup(unicode(r.frame.toHtml()))
    # Do many other things with soup
    # Nothing related to PyQt4 further in this script
With script1.py, everything works: function1 and its analysis run successfully.
But script2.py fails, and I get the following errors:
QObject::connect: Cannot connect (null)::configurationAdded(QNetworkConfiguration) to QNetworkConfigurationManager::configurationAdded(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationRemoved(QNetworkConfiguration) to QNetworkConfigurationManager::configurationRemoved(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::configurationChanged(QNetworkConfiguration) to QNetworkConfigurationManager::configurationChanged(QNetworkConfiguration)
QObject::connect: Cannot connect (null)::onlineStateChanged(bool) to QNetworkConfigurationManager::onlineStateChanged(bool)
QObject::connect: Cannot connect (null)::configurationUpdateComplete() to QNetworkConfigurationManager::updateCompleted()
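I spent some time searching for this problem and found that PyQt4 apparently cannot load multiple pages in the same instance.
The catch is that I need PyQt4 to render the JavaScript before the page content is loaded into Beautiful Soup.
So I think I need to add something like "self.app.quit()" at the end of function1 in script1, so that function2 in script2 can also render a page with PyQt4. But I cannot get it to work.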
Answer 0 (score: 0)
How about this:
r = Render(url)
soup = BeautifulSoup(unicode(r.frame.toHtml()))
r.app.quit()
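If that does not help, another thing to try is sharing a single QApplication between both scripts instead of constructing a new one for every page. This is only a sketch, and it assumes the errors come from creating a second QApplication in the same process. The helper names get_app and render are made up for illustration, and the code would live in a hypothetical shared module that script1.py and script2.py both import:

```python
import sys

_app = None  # module-level reference so the QApplication is never garbage collected


def get_app():
    """Return the one QApplication for this process, creating it on first use."""
    global _app
    # Imported lazily so that merely importing this module does not load Qt.
    from PyQt4.QtGui import QApplication
    if _app is None:
        # Reuse an application created elsewhere, otherwise create our own.
        _app = QApplication.instance() or QApplication(sys.argv)
    return _app


def render(url):
    """Load url in QtWebKit, wait for loadFinished, and return the HTML as text."""
    from PyQt4.QtCore import QEventLoop, QUrl
    from PyQt4.QtWebKit import QWebPage

    get_app()  # a QApplication must exist before any QWebPage is created
    page = QWebPage()
    loop = QEventLoop()  # local event loop, so the application itself is never quit
    page.loadFinished.connect(loop.quit)
    page.mainFrame().load(QUrl(url))
    loop.exec_()
    # type(u"") is unicode on Python 2 and str on Python 3.
    return type(u"")(page.mainFrame().toHtml())
```

function1 and function2 could then each build their soup with BeautifulSoup(render(url)), and neither script would need its own Render class. I have not run this against your pages, so treat it as a starting point rather than a verified fix.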