PyQt4 does not return the full page content

Asked: 2016-05-03 19:09:33

Tags: python qt web-scraping pyqt4

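I am trying to scrape user reviews from allrecipes.com. Because allrecipes builds the page with JavaScript, requests does not work, but PyQt4 should. The data I want is inside article elements with class='profile-review-card'.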

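When I inspect the returned content, it contains more than requests gave me, but it is still missing the part I want. Why am I still not getting the full page?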

import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from bs4 import BeautifulSoup

class Render(QWebPage):
  def __init__(self, url):
    self.app = QApplication(sys.argv)
    QWebPage.__init__(self)
    self.loadFinished.connect(self._loadFinished)
    self.mainFrame().load(QUrl(url))
    self.app.exec_()

  def _loadFinished(self, result):
    self.frame = self.mainFrame()
    self.app.quit()

url = 'http://allrecipes.com/cook/2010/reviews/'
# use pyqt4 to render it
r = Render(url)
# pull the page content
result = r.frame.toHtml()
# use BeautifulSoup to search through the content
x = BeautifulSoup(result, 'html.parser')
# search for the reviewed recipes; a comes back as an empty list,
# so the data I want is not in the rendered HTML
a = x.find_all('article', class_="profile-review-card")
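To rule out a parsing problem on the BeautifulSoup side, it can help to check the rendered HTML string directly. The following is only a minimal sketch using the standard library and assumes Python 3; `ReviewCardCounter` and `count_review_cards` are hypothetical names introduced for illustration, not part of PyQt4 or bs4.

```python
from html.parser import HTMLParser


class ReviewCardCounter(HTMLParser):
    """Counts <article> start tags whose class list contains class_name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "article":
            return
        # The class attribute may be absent or valueless (None).
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_review_cards(html, class_name="profile-review-card"):
    """Return how many matching <article> elements appear in html."""
    parser = ReviewCardCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count
```

If this returns 0 for the string produced by `r.frame.toHtml()`, the review cards were never added to the DOM, which points at the rendering step rather than at the `find_all` call.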

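Update: I have figured out why it is not returning the content it should. When a browser loads the allrecipes user reviews page, a loading icon is shown before the reviews appear. I found that PyQt4 gets stuck at this loading stage.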

[Screenshots: the page while loading, and the page once loaded]
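If the reviews are injected by JavaScript after `loadFinished` has already fired, one thing to try is to keep the Qt event loop running and poll until the target element appears, instead of quitting immediately. The sketch below shows only the polling part and assumes Python 3; `wait_until` is a hypothetical helper, and the commented usage assumes a `QApplication` named `app` and a `QWebPage` named `page` already exist. It will not help if QtWebKit cannot run the site's scripts at all.

```python
import time


def wait_until(predicate, pump=None, timeout=30.0, interval=0.1):
    """Poll predicate() until it returns a true value or the timeout expires.

    pump, if given, is called before every check. With Qt this would be
    something like app.processEvents, so that network and script activity
    can make progress between checks.
    Returns True if the predicate became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if pump is not None:
            pump()
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with PyQt4 (not executed here):
#   page.mainFrame().load(QUrl(url))
#   found = wait_until(
#       lambda: not page.mainFrame()
#                       .findFirstElement('article.profile-review-card')
#                       .isNull(),
#       pump=app.processEvents)
#   html = page.mainFrame().toHtml() if found else None
```

A timeout is used so that a page that never finishes loading, as described above, ends with `False` instead of hanging the script.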

When I open a PyQt4 browser window and load the page with:

url = 'http://allrecipes.com/cook/22/reviews/'
import sys
from PyQt4.QtWebKit import QWebView
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl

app = QApplication(sys.argv)
browser = QWebView()
browser.load(QUrl(url))
browser.show()
app.exec_()

it gets stuck on this:

[Screenshot: PyQt4 browser window stuck on the loading icon]

Clearly PyQt4 cannot load the page properly! How can I fix this?

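Not sure whether this is related, but I noticed that when I load an allrecipes recipe page, it works correctly yet prints Internet plug-in loading errors. Could some missing plug-ins be the cause?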

2016-05-06 19:06:39.520 Python[17348:178927] Error loading /Library/Internet Plug-Ins/DirectorShockwave.plugin/Contents/MacOS/DirectorShockwave:  dlopen(/Library/Internet Plug-Ins/DirectorShockwave.plugin/Contents/MacOS/DirectorShockwave, 262): no suitable image found.  Did find:
        /Library/Internet Plug-Ins/DirectorShockwave.plugin/Contents/MacOS/DirectorShockwave: mach-o, but wrong architecture
2016-05-06 19:06:39.547 Python[17348:178927] Error loading /Library/Internet Plug-Ins/Silverlight.plugin/Contents/MacOS/agcore:  dlopen(/Library/Internet Plug-Ins/Silverlight.plugin/Contents/MacOS/agcore, 262): no suitable image found.  Did find:
        /Library/Internet Plug-Ins/Silverlight.plugin/Contents/MacOS/agcore: mach-o, but wrong architecture
2016-05-06 19:06:39.548 Python[17348:178927] Error loading /Library/Internet Plug-Ins/OVSHelper.plugin/Contents/MacOS/OVSHelper:  dlopen(/Library/Internet Plug-Ins/OVSHelper.plugin/Contents/MacOS/OVSHelper, 262): no suitable image found.  Did find:
        /Library/Internet Plug-Ins/OVSHelper.plugin/Contents/MacOS/OVSHelper: mach-o, but wrong architecture
Vector smash protection is enabled.
objc[17348]: Class MacCocoaSocketServerHelperRtc is implemented in both /Library/Internet Plug-Ins/googletalkbrowserplugin.plugin/Contents/MacOS/googletalkbrowserplugin and /Library/Internet Plug-Ins/o1dbrowserplugin.plugin/Contents/MacOS/o1dbrowserplugin. One of the two will be used. Which one is undefined.
2016-05-06 19:06:39.646 Python[17348:178927] Error loading /Library/Internet Plug-Ins/DivX Web Player.plugin/Contents/MacOS/DivX Web Player:  dlopen(/Library/Internet Plug-Ins/DivX Web Player.plugin/Contents/MacOS/DivX Web Player, 262): no suitable image found.  Did find:
        /Library/Internet Plug-Ins/DivX Web Player.plugin/Contents/MacOS/DivX Web Player: mach-o, but wrong architecture
objc[17348]: Class AdobePDFProgressView is implemented in both /Library/Internet Plug-Ins/AdobePDFViewer.plugin/Contents/MacOS/AdobePDFViewer and /Library/Internet Plug-Ins/AdobePDFViewerNPAPI.plugin/Contents/MacOS/AdobePDFViewerNPAPI. One of the two will be used. Which one is undefined.
objc[17348]: Class ObjCTimerObject is implemented in both /Library/Internet Plug-Ins/AdobePDFViewer.plugin/Contents/MacOS/AdobePDFViewer and /Library/Internet Plug-Ins/AdobePDFViewerNPAPI.plugin/Contents/MacOS/AdobePDFViewerNPAPI. One of the two will be used. Which one is undefined.
2016-05-06 19:06:39.653 Python[17348:178927] Cannot find executable for CFBundle 0x7f8bf47249f0 </Library/Internet Plug-Ins/Unused> (not loaded)
2016-05-06 19:06:39.654 Python[17348:178927] Cannot find executable for CFBundle 0x7f8bf1dd8650 </Library/Internet Plug-Ins/Disabled Plug-Ins> (not loaded)
QFont::setPixelSize: Pixel size <= 0 (0)
QFont::setPixelSize: Pixel size <= 0 (0)

0 answers:

No answers yet