Question

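I'm using Python to retrieve various metrics from a website (e.g. likes, Twitter shares, etc.). XPath works fine for retrieving ordinary text, but I'm having trouble with these metrics (text inside a span).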

<span class="pluginCountTextDisconnected">78</span>

Now I need to get the "78", but when I run the XPath against the page, Python returns nothing.

Here is the XPath, just in case:

//*[@id="u_0_2"]/span[2]

Python code:

from lxml import html
import urllib2  
from unicsv import CsvUnicodeReader

req=urllib2.Request("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html")
tree = html.fromstring(urllib2.urlopen(req).read())
fb_likes = tree.xpath('//*[@id="u_0_2"]/span[2]')
print fb_likes

Answer 1

Add /text() to the XPath:

//*[@id="u_0_2"]/span[2]/text()
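
To illustrate what /text() changes, here is a minimal offline sketch. It is written for Python 3 with lxml (the thread itself uses Python 2), and the HTML snippet is made up around the span and id from the question, so the real page's markup may differ. Without /text() the XPath returns element objects; with it, the text nodes:

```python
from lxml import html

# Hypothetical snippet built from the span and id in the question;
# the real page's markup may differ.
snippet = (
    '<html><body>'
    '<div id="u_0_2">'
    '<span>ignored</span>'
    '<span class="pluginCountTextDisconnected">78</span>'
    '</div>'
    '</body></html>'
)
tree = html.fromstring(snippet)

# Without /text(): a list of matching elements.
elements = tree.xpath('//*[@id="u_0_2"]/span[2]')
# With /text(): a list of the text nodes inside those elements.
texts = tree.xpath('//*[@id="u_0_2"]/span[2]/text()')

print(elements)          # a list holding one element object
print(elements[0].text)  # the text is also reachable through .text
print(texts)
```

If the XPath matches nothing at all, both calls return an empty list, which is what you get when the span is not in the document being parsed.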

Answer 2

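Your span is inside an iframe, so you need to read the text from inside the iframe (by the way, //span[@class='pluginCountTextDisconnected']/text() is the right way to select it, but you are running it outside the iframe). So you need to fetch the iframe's src, something like: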

from lxml import html
import urllib2
a = html.fromstring(urllib2.urlopen("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html").read())
iframe_src = a.xpath("//iframe/@src")[0]  # assumes the first iframe in the page is the Facebook one
iframe = html.fromstring(urllib2.urlopen(iframe_src).read())
fb_likes = iframe.xpath("//span[@class='pluginCountTextDisconnected']/text()")

Sorry, I haven't tested the code; this is just the general idea.
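
A small offline sketch of the same idea, written for Python 3 with lxml. Both HTML strings and the example.com URL are invented for illustration. The outer page only contains the iframe tag, so the span XPath finds nothing there, while the same XPath succeeds on the document the iframe points to:

```python
from lxml import html

# Hypothetical outer page: it embeds the widget through an iframe and
# does not contain the span itself.
outer_html = (
    '<html><body>'
    '<iframe src="http://example.com/like-widget"></iframe>'
    '</body></html>'
)
# Hypothetical content served at the iframe's src.
inner_html = (
    '<html><body>'
    '<span class="pluginCountTextDisconnected">78</span>'
    '</body></html>'
)

span_xpath = "//span[@class='pluginCountTextDisconnected']/text()"

outer = html.fromstring(outer_html)
in_outer = outer.xpath(span_xpath)          # empty: the span is not in this document
iframe_src = outer.xpath('//iframe/@src')   # where the widget actually lives

inner = html.fromstring(inner_html)         # stands in for downloading iframe_src
in_inner = inner.xpath(span_xpath)

print(in_outer, iframe_src, in_inner)
```

In the real case the inner document would be downloaded from the src value. If the iframe is injected by JavaScript, it will not appear in the downloaded HTML at all, and the src has to be found some other way, for example with the browser's developer tools.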

Update

import urllib2, lxml.html

iframe_asfile = urllib2.urlopen('http://www.facebook.com/plugins/like.php?action=recommend&app_id=&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FZEbdHPQfV3x.js%3Fversion%3D41%23cb%3Df112fd0c7b19666%26domain%3Dwww.nu.nl%26origin%3Dhttp%253A%252F%252Fwww.nu.nl%252Ff62d30922cee5%26relation%3Dparent.parent&href=http%3A%2F%2Fwww.nu.nl%2Fbinnenland%2F3866370%2Freddingsbrigade-redt-369-mensen-zomer-.html&layout=box_count&locale=nl_NL&sdk=joey&send=false&show_faces=true&width=75')
iframe_data = iframe_asfile.read()
iframe_asfile.close()

iframe_html = lxml.html.document_fromstring(iframe_data)

fb_likes = iframe_html.xpath(".//span[@class='pluginCountTextDisconnected']/text()")
print fb_likes[0]

This prints 78.
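
The long like.php URL above is mostly URL-encoded query parameters. As a rough sketch (Python 3, standard library only), a URL of the same kind can be assembled from the article address. The helper name build_like_url is made up, and keeping only the href, layout and locale parameters is an assumption: I don't know which of the original parameters the plugin endpoint actually requires.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_like_url(article_url):
    """Build a like.php URL for article_url (hypothetical helper)."""
    params = {
        "href": article_url,    # page whose count is wanted
        "layout": "box_count",  # values copied from the URL in the update
        "locale": "nl_NL",
    }
    # urlencode percent-escapes ':' and '/' inside the values.
    return "http://www.facebook.com/plugins/like.php?" + urlencode(params)

article = ("http://www.nu.nl/binnenland/3866370/"
           "reddingsbrigade-redt-369-mensen-zomer-.html")
like_url = build_like_url(article)
print(like_url)

# Round trip: decoding the query string gives the article URL back.
decoded = parse_qs(urlsplit(like_url).query)
print(decoded["href"][0])
```

The resulting string can then be passed to urlopen in place of the hard-coded URL, provided the endpoint accepts the reduced parameter set.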

Retrieving the text inside a span class using XPath
