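I am trying to extract a metric from a website using Python: http://www.bild.de/regional/hamburg/mord/das-denkt-der-presserat-ueber-den-mord-an-unserer-tochter-lisa-41186944.bild.html
I need the text (a number) under the yellow "LACHEN" button (currently 149). The XPath for that element is //*[@id="jsm_16584"]/ul/li[1]/span
But when I query it, no element is returned: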
import urllib2
import lxml.html
url = "http://www.bild.de/regional/hamburg/mord/das-denkt-der-presserat-ueber-den-mord-an-unserer-tochter-lisa-41186944.bild.html"
req = urllib2.Request(url)
tree = lxml.html.fromstring(urllib2.urlopen(req).read())
metric = tree.xpath('//*[@id="jsm_16584"]/ul/li[1]/span')
print metric
It returns metric as an empty list.
Answer 0 (score: 0)
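urlopen does not execute any scripts; it only gives you the raw HTML. If the data is generated by JavaScript, it will not be in the markup you get this way, which is why the XPath matches nothing. You need something that actually renders the page first. Something like this should work: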
import sys
from PyQt4.QtGui import *
from PyQt4.QtCore import *
from PyQt4.QtWebKit import *
from lxml import html
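class Render(QWebPage):
    """Load a URL in a QtWebKit page so that its JavaScript runs."""

    def __init__(self, url):
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Blocks here until _loadFinished calls quit().
        self.app.exec_()

    def _loadFinished(self, result):
        # Keep the rendered frame for later use, then stop the event loop.
        self.frame = self.mainFrame()
        self.app.quit()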
url = 'http://www.bild.de/regional/hamburg/mord/das-denkt-der-presserat-ueber-den-mord-an-unserer-tochter-lisa-41186944.bild.html'
r = Render(url)
page = r.frame.toHtml()
tree = html.fromstring(page)
# Read the count from the button's data-mood-count attribute.
metric = tree.xpath('//button[@class="btn-mood-1"]/@data-mood-count')
print(metric)
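
To see the difference without installing PyQt4, here is a small sketch that runs the same XPath against two hand-written HTML strings. Both strings are assumptions made up for illustration, not the real bild.de markup: RAW_HTML stands in for what urlopen returns before any script has run, and RENDERED_HTML for the page after JavaScript has inserted the button. The helper name mood_counts is also hypothetical.

```python
from lxml import html

# Hypothetical markup: the page as fetched, before any script has run.
RAW_HTML = '<html><body><div id="moods"></div></body></html>'

# Hypothetical markup: the same page after JavaScript has added the button.
RENDERED_HTML = (
    '<html><body><div id="moods">'
    '<button class="btn-mood-1" data-mood-count="149">LACHEN</button>'
    '</div></body></html>'
)

# Same XPath as in the answer: select the attribute value, not the element.
XPATH = '//button[@class="btn-mood-1"]/@data-mood-count'


def mood_counts(markup):
    """Return all data-mood-count values found in the given HTML string."""
    return html.fromstring(markup).xpath(XPATH)


raw_result = mood_counts(RAW_HTML)
rendered_result = mood_counts(RENDERED_HTML)
print(raw_result)       # empty: the button is not in the unrendered markup
print(rendered_result)  # one string value taken from the attribute
```

Note that an XPath ending in /@attribute returns a list of strings, so you would still convert the value with int(...) if you need a number.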