Scraping HTML with lxml fails even when given the exact XPath

Asked: 2017-06-01 16:19:45

Tags: python python-2.7 web-scraping lxml

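I have been able to scrape HTML from other websites, but I have had no success with this Metacritic page. Even when I supply the exact XPath of an element, I can't seem to scrape anything.

I am trying to pull the scores and the sources of those scores. Here is my attempt for the sources: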

from lxml import html
import requests

url = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"
page = requests.get(url)
tree = html.fromstring(page.content)

#List of all critic sources for a particular movie
xpathselector="//span[@class ='source']/a"
metaElement = tree.xpath(xpathselector)
print metaElement

And for one of the score elements:

from lxml import html
import requests

url = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"
page = requests.get(url)
tree = html.fromstring(page.content)

#List of all critic scores for a particular movie
xpathselector="//div[@class='metascore_w large movie negative indiv']"
metaElement = tree.xpath(xpathselector)
print metaElement

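The output is always an empty list, []. I realize this would only print the element objects and that I need to append /text() to the end of the XPath to get their text, but I can't even get that far.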
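An empty list from every selector can mean that the document being parsed is not the page a browser shows. Below is a minimal sketch for checking that before touching the XPath. It rests on an assumption that is not confirmed anywhere in this post: that the site may answer the default requests client with an error status or an empty body, and that a browser-like User-Agent header might change the response. The names `looks_blocked` and `fetch_tree` and the header string are hypothetical, chosen only for illustration.

```python
import requests
from lxml import html

# Assumption: a browser-like User-Agent; the exact string is illustrative only.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def looks_blocked(status_code, body):
    """Heuristic: a non-200 status or a blank body is probably not the real page."""
    return status_code != 200 or not body.strip()


def fetch_tree(url, headers=None, timeout=10):
    """Fetch url and return an lxml tree, or None if the response looks wrong."""
    page = requests.get(url, headers=headers, timeout=timeout)
    if looks_blocked(page.status_code, page.content):
        # Report what actually came back instead of parsing it silently.
        print("Unexpected response: status %s, %d bytes"
              % (page.status_code, len(page.content)))
        return None
    return html.fromstring(page.content)
```

Calling `fetch_tree(url)` and `fetch_tree(url, headers=HEADERS)` side by side would show whether the header makes any difference for this site. The code is written to run on both Python 2.7 and Python 3.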

Any ideas? Thanks.
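For reference, here is a small sketch of the two selectors with /text() appended, run against a local fragment rather than the live site. The fragment is hypothetical: its class names are copied from the selectors in this question, but the real Metacritic markup may differ. It also shows a token-based class match, since `@class='...'` compares the whole attribute string and so matches only the negative scores.

```python
from lxml import html

# Hypothetical markup for illustration; not taken from the real page.
SAMPLE = """
<html><body>
  <div class="review">
    <div class="metascore_w large movie positive indiv">100</div>
    <span class="source"><a href="/publication/a">Example Times</a></span>
  </div>
  <div class="review">
    <div class="metascore_w large movie negative indiv">40</div>
    <span class="source"><a href="/publication/b">Sample Post</a></span>
  </div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# /text() yields the link text as strings instead of element objects.
sources = tree.xpath("//span[@class='source']/a/text()")

# Match the single class token metascore_w, whatever other classes are present.
scores = tree.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' metascore_w ')]/text()"
)

# Exact comparison of the full class string, as in the question.
negative_only = tree.xpath(
    "//div[@class='metascore_w large movie negative indiv']/text()"
)

print(sources)
print(scores)
print(negative_only)
```

On this fragment the token match returns both scores, while the exact class comparison returns only the negative one.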

0 answers:

No answers