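Using Python 3.4, lxml, and requests to scrape Google Trends
In this example, I am trying to retrieve the text "Johnny Depp" located between these span tags. I am new to the lxml module and XPath syntax, and I am not sure what I am doing wrong at this point.
Thanks in advance.
HTML: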
<span class="hottrends-single-trend-title ellipsis-maker-inner">Johnny Depp</span>
Code:
from lxml import html
import requests
page = requests.get('https://trends.google.com/trends/hottrends')
tree = html.fromstring(page.content)
#This will create a list of trends:
trends = tree.xpath('//span[@class="hottrends-single-trend-title ellipsis-maker-inner"]/text()')
print('Trends: ', trends)
Answer 0 (score: 1)
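Using the corresponding RSS URL, you can use lxml's XML parser, or even an XML parser from the standard library such as xml.etree.ElementTree, since the XML structure is much simpler than the HTML version. Given the RSS XML, you can iterate over the item elements and print each title, for example:
>>> from lxml import etree as ET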
>>> import requests
>>> page = requests.get('https://trends.google.com/trends/hottrends/atom/feed?pn=p1')
>>> root = ET.fromstring(page.content)
>>> for trend in root.xpath('//item'):
...     print(trend.find('title').text)
...
spinner
Old Navy Flip Flop Sale
You Get Me
Johnny Depp
NHL Draft
GLOW
Despicable Me 3
Blake Griffin
Robert Del Naja
DJ Khaled Grateful
Bella Thorne
Tubelight
interstellar
Camila Cabello
Mexico vs Russia
Frank Mason
Bam Adebayo
TJ Leaf
the house
Dwyane Wade
(Though the top result is no longer Johnny Depp by now. :))
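The standard-library route mentioned in the answer can be sketched roughly as follows. This is a minimal sketch, not the answerer's code: the helper name extract_titles and the SAMPLE_RSS document are hypothetical, and the sample only assumes that the feed contains un-namespaced item elements with a title child, as the lxml session above suggests.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed in the answer (un-namespaced
# <item> elements, each with a <title> child); this is not real feed output.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Hot Trends</title>
    <item><title>Johnny Depp</title></item>
    <item><title>NHL Draft</title></item>
  </channel>
</rss>"""


def extract_titles(xml_bytes):
    """Return the <title> text of every <item> in an RSS document."""
    root = ET.fromstring(xml_bytes)
    # iter() walks the whole tree, much like the '//item' XPath in the lxml version.
    return [item.findtext('title') for item in root.iter('item')]


print(extract_titles(SAMPLE_RSS))
```

To run this against the live feed, and assuming the URL from the answer is still served, pass requests.get(...).content to extract_titles instead of the sample bytes.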