Question

使用Python 3.4，lxml和请求来刮取谷歌趋势。

在此示例中，我正在尝试检索位于这些span标记之间的文本“Johnny Depp”。我是lxml模块和XPath语法的新手，但我不确定此时我做错了什么。

提前谢谢。

HTML：

<span class="hottrends-single-trend-title ellipsis-maker-inner">Johnny Depp</span>

代码：

from lxml import html
import requests

page = requests.get('https://trends.google.com/trends/hottrends')
tree = html.fromstring(page.content)

#This will create a list of trends:
trends = tree.xpath('//span[@class="hottrends-single-trend-title ellipsis-maker-inner"]/text()')

print('Trends: ', trends)

结果：

Answer 1

使用相应的RSS URL，您可以使用item的XML解析器，甚至可以使用标准库中的title，因为XML结构比HTML版本简单得多。给定RSS XML，您可以遍历>>> from lxml import etree as ET >>> import requests >>> page = requests.get('https://trends.google.com/trends/hottrends/atom/feed?pn=p1') >>> root = ET.fromstring(page.content) >>> for trend in root.xpath('//item'): ... print trend.find('title').text ... spinner Old Navy Flip Flop Sale You Get Me Johnny Depp NHL Draft GLOW Despicable Me 3 Blake Griffin Robert Del Naja DJ Khaled Grateful Bella Thorne Tubelight interstellar Camila Cabello Mexico vs Russia Frank Mason Bam Adebayo TJ Leaf the house Dwyane Wade元素并打印pyplot.show()，例如（虽然最高结果不再是Johnny Depp＆＃39;现在:)）： / p>

pyplot.show(some_image_as_argument)

XPath没有找到任何结果

1 个答案: