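I am using Python to retrieve various metrics (e.g. likes, Twitter shares, etc.) from a website. XPath works fine for retrieving ordinary text, but I am having trouble with these metrics, which are text inside a span: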
<span class="pluginCountTextDisconnected">78</span>
I need to get the "78", but when I pass the XPath to Python, nothing is returned.
Here is the XPath, just in case:
//*[@id="u_0_2"]/span[2]
Python code:
from lxml import html
import urllib2
from unicsv import CsvUnicodeReader
req=urllib2.Request("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html")
tree = html.fromstring(urllib2.urlopen(req).read())
fb_likes = tree.xpath('//*[@id="u_0_2"]/span[2]')
print fb_likes
Answer 0 (score: 0)
Append /text() to the XPath:
//*[@id="u_0_2"]/span[2]/text()
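To illustrate what /text() changes, here is a minimal offline sketch. The HTML snippet is a made-up stand-in for the real page (an assumption, not the actual markup served by the site). Without /text() the query returns element objects; with it, the query returns the text nodes as strings. If the expression matches no element at all, both queries return an empty list.

```python
from lxml import html

# Hypothetical markup that mimics the structure implied by the XPath in the question.
SNIPPET = (
    '<html><body><div id="u_0_2">'
    '<span>Like</span>'
    '<span class="pluginCountTextDisconnected">78</span>'
    '</div></body></html>'
)

tree = html.fromstring(SNIPPET)

# Without /text(): a list of element objects.
elements = tree.xpath('//*[@id="u_0_2"]/span[2]')

# With /text(): a list of strings.
texts = tree.xpath('//*[@id="u_0_2"]/span[2]/text()')

# The text is also reachable from the element itself.
first_text = elements[0].text if elements else None

print(texts)
```

Checking `if elements` before indexing avoids an IndexError when nothing matches.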
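Answer 1 (score: 0)
Your span is inside an iframe, so you need to get the text from inside the iframe. (By the way, //span[@class='pluginCountTextDisconnected']/text() is the right expression, but you are querying outside the iframe.) So you need to read the iframe's src, something like: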
a = html.fromstring(urllib2.urlopen("http://www.nu.nl/binnenland/3866370/reddingsbrigade-redt-369-mensen-zomer-.html").read())
# Take the src of the first iframe (assumes the iframe is present in the static HTML)
iframe_src = a.xpath("//iframe/@src")[0]
iframe = html.fromstring(urllib2.urlopen(iframe_src).read())
fb_likes = iframe.xpath("//span[@class='pluginCountTextDisconnected']/text()")
Sorry, I have not tested this code; it is just the general idea.
Update
import urllib2, lxml.html
iframe_asfile = urllib2.urlopen('http://www.facebook.com/plugins/like.php?action=recommend&app_id=&channel=http%3A%2F%2Fstatic.ak.facebook.com%2Fconnect%2Fxd_arbiter%2FZEbdHPQfV3x.js%3Fversion%3D41%23cb%3Df112fd0c7b19666%26domain%3Dwww.nu.nl%26origin%3Dhttp%253A%252F%252Fwww.nu.nl%252Ff62d30922cee5%26relation%3Dparent.parent&href=http%3A%2F%2Fwww.nu.nl%2Fbinnenland%2F3866370%2Freddingsbrigade-redt-369-mensen-zomer-.html&layout=box_count&locale=nl_NL&sdk=joey&send=false&show_faces=true&width=75')
iframe_data = iframe_asfile.read()
iframe_asfile.close()
iframe_html = lxml.html.document_fromstring(iframe_data)
fb_likes = iframe_html.xpath(".//span[@class='pluginCountTextDisconnected']/text()")
print fb_likes[0]
This prints 78.
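The two-step idea in this answer (find the iframe's src in the outer page, then query the iframe's own document) can be sketched offline as follows. The helper names `iframe_sources` and `count_texts`, the example.com URLs, and both HTML strings are hypothetical and only for illustration. The sketch also assumes the iframe tag is present in the static HTML; if the page injects it with JavaScript, the first step returns an empty list and the URL has to be found another way, for instance by copying it from the browser as in the update above.

```python
from lxml import html

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def iframe_sources(page_html, base_url):
    """Return absolute URLs for every iframe found in the static HTML."""
    tree = html.fromstring(page_html)
    # urljoin resolves relative and protocol-relative src values against base_url.
    return [urljoin(base_url, src) for src in tree.xpath("//iframe/@src")]


def count_texts(iframe_html):
    """Return the text of the counter spans inside an iframe document."""
    tree = html.fromstring(iframe_html)
    return tree.xpath("//span[@class='pluginCountTextDisconnected']/text()")


# Hypothetical outer page and iframe document, standing in for real downloads.
OUTER = (
    '<html><body><p>Article</p>'
    '<iframe src="//www.example.com/plugins/like.php?href=x"></iframe>'
    '</body></html>'
)
INNER = (
    '<html><body><div>'
    '<span class="pluginCountTextDisconnected">78</span>'
    '</div></body></html>'
)

sources = iframe_sources(OUTER, "http://www.example.com/article.html")
likes = count_texts(INNER)
print(sources)
print(likes)
```

In a real run, each URL in `sources` would be downloaded and its body passed to `count_texts`.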