Question

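I've been able to scrape HTML from other websites, but I haven't had any success with this Metacritic site. I can't seem to scrape any elements, even when I give the exact XPath of the element.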

I'm trying to pull the scores and the sources of those scores. Here is what I have for the sources:

from lxml import html
import requests

url = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"
page = requests.get(url)
tree = html.fromstring(page.content)

# List of all critic sources for a particular movie
xpathselector = "//span[@class='source']/a"
metaElement = tree.xpath(xpathselector)
print(metaElement)
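Before changing the XPath, it may help to check what the server actually sent back. `requests.get` does not raise on an error status, so a 403 page or a reduced page parses without complaint and every XPath simply returns []. Likewise, an XPath copied from the browser's developer tools describes the DOM after JavaScript has run, which can differ from the raw HTML that lxml receives. The sketch below assumes a browser-like `User-Agent` header might be needed (some sites, reportedly including Metacritic, reject the default `python-requests` agent; this is an assumption, not verified against the current site). The helper names `fetch_html` and `diagnose` and the header value are hypothetical, not part of lxml or requests.

```python
from lxml import html


def fetch_html(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: download a page and fail loudly on HTTP errors."""
    import requests  # imported here so diagnose() below works even without requests installed

    # Assumption: a browser-like User-Agent avoids a blocked or reduced response.
    page = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    page.raise_for_status()  # raises requests.HTTPError on 4xx/5xx instead of parsing an error page
    return page.content


def diagnose(raw_html, xpath, marker):
    """Hypothetical helper: report whether `marker` occurs in the raw HTML
    and how many nodes `xpath` matches in it."""
    if isinstance(raw_html, bytes):
        text = raw_html.decode("utf-8", errors="replace")
    else:
        text = raw_html
    tree = html.fromstring(raw_html)
    return {
        "marker_in_raw_html": marker in text,
        "xpath_matches": len(tree.xpath(xpath)),
    }
```

A possible call is `diagnose(fetch_html(url), "//span[@class='source']/a", 'class="source"')`. If `marker_in_raw_html` is False, the data is not in the HTML that was downloaded and no XPath will find it; if it is True but `xpath_matches` is 0, the XPath itself is the problem.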

And for one of the score elements:

from lxml import html
import requests

url = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"
page = requests.get(url)
tree = html.fromstring(page.content)

# List of all critic scores for a particular movie
xpathselector = "//div[@class='metascore_w large movie negative indiv']"
metaElement = tree.xpath(xpathselector)
print(metaElement)
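Note that `[@class='…']` compares the whole attribute string, so this query only matches a `div` whose class attribute is exactly `metascore_w large movie negative indiv`, in that order; a positive or mixed review, or a div with any extra class, is skipped. A more tolerant XPath 1.0 idiom tests for a single class token with `contains(concat(' ', normalize-space(@class), ' '), ' metascore_w ')`. The markup in this sketch is made up for illustration (only the class names come from the question), and `has_class` is a hypothetical helper.

```python
from lxml import html

# Illustrative markup only; the class names mirror the question, the structure is assumed.
SAMPLE = """<html><body>
<div class="review">
  <div class="metascore_w large movie positive indiv">90</div>
</div>
<div class="review">
  <div class="metascore_w large movie negative indiv">30</div>
</div>
</body></html>"""


def has_class(token):
    """XPath 1.0 predicate: true when the class attribute contains `token` as a whole word."""
    return "contains(concat(' ', normalize-space(@class), ' '), ' %s ')" % token


tree = html.fromstring(SAMPLE)

# Exact comparison: only the div whose class string is identical matches.
exact = tree.xpath("//div[@class='metascore_w large movie negative indiv']")

# Token test: every div that carries the metascore_w class matches.
tolerant = tree.xpath("//div[%s]" % has_class("metascore_w"))

print(len(exact), len(tolerant))  # 1 2
```

The padding with spaces on both sides keeps `metascore_w` from matching inside a longer class name such as `metascore_wide`.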

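The output is always an empty list: []. I realize this only prints the elements and that I'll need to append /text() to the end of the XPath to get the text, but I can't even get that far.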
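Once a query does match, appending `/text()` makes `xpath()` return the text nodes as strings instead of elements. To keep each score next to its source, one option is to loop over a per-review container and run relative queries (leading `.`) inside it. This is a sketch under assumptions: the `review` container class, the link targets, and the critic names below are hypothetical, and the real page structure may differ.

```python
from lxml import html

# Hypothetical markup: assumes each review's score and source share one parent div.
SAMPLE = """<html><body>
<div class="review">
  <div class="metascore_w large movie positive indiv">90</div>
  <span class="source"><a href="/critic/a">Variety</a></span>
</div>
<div class="review">
  <div class="metascore_w large movie negative indiv">30</div>
  <span class="source"><a href="/critic/b">Slate</a></span>
</div>
</body></html>"""


def class_test(token):
    """XPath 1.0 predicate: true when the class attribute contains `token` as a whole word."""
    return "contains(concat(' ', normalize-space(@class), ' '), ' %s ')" % token


def scrape_reviews(tree):
    """Return (source, score) pairs, one per review container."""
    reviews = []
    for review in tree.xpath("//div[%s]" % class_test("review")):
        # Relative paths (leading ".") search only inside the current review.
        source = review.xpath(".//span[%s]/a/text()" % class_test("source"))
        score = review.xpath(".//div[%s]/text()" % class_test("metascore_w"))
        if source and score:
            reviews.append((source[0].strip(), int(score[0].strip())))
    return reviews


tree = html.fromstring(SAMPLE)
print(tree.xpath("//span[@class='source']/a/text()"))  # link texts as strings
print(scrape_reviews(tree))
```

Querying per container avoids zipping two independent lists, which would silently misalign if one review lacked a score or a source.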

Any ideas? Thanks.

Scraping HTML with lxml doesn't work even when given the exact XPath

0 answers