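I've been able to scrape HTML from other websites, but I haven't had any success with this Metacritic site. Even when I give the exact XPath of an element, I can't seem to scrape anything.
I'm trying to pull the scores and the sources of those scores. Here's what I have for the sources: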
from lxml import html
import requests
url = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"
page = requests.get(url)
tree = html.fromstring(page.content)
#List of all critic sources for a particular movie
xpathselector="//span[@class ='source']/a"
metaElement = tree.xpath(xpathselector)
print(metaElement)
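Before changing the XPath, it may be worth confirming that the HTML `requests` receives is the same page the browser shows. One possible cause, and this is an assumption about the site rather than something the code above proves, is that Metacritic returns an error page (for example HTTP 403) to `requests`' default User-Agent. In that case `tree.xpath` has nothing to match and returns `[]`. A minimal sketch that sends a browser-like User-Agent (the header value is arbitrary) and fails loudly on an error status instead of parsing the error page:

```python
import requests
from lxml import html

URL = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"

# Assumption: a browser-like User-Agent string; the exact value is arbitrary.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def make_session():
    """Create a requests.Session that sends a browser-like User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    return session


def fetch_tree(session, url):
    """Fetch url and parse it, raising on 4xx/5xx instead of parsing an error page."""
    page = session.get(url, timeout=10)
    page.raise_for_status()  # raises requests.HTTPError for e.g. a 403
    return html.fromstring(page.content)
```

Usage: `tree = fetch_tree(make_session(), URL)`, then run the same `tree.xpath(...)` calls as before. If `raise_for_status()` raises, the empty list was caused by the response, not by the XPath.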
And for one of the score elements:
from lxml import html
import requests
url = "http://www.metacritic.com/movie/rogue-one-a-star-wars-story/critic-reviews"
page = requests.get(url)
tree = html.fromstring(page.content)
#List of all critic scores for a particular movie
xpathselector="//div[@class='metascore_w large movie negative indiv']"
metaElement = tree.xpath(xpathselector)
print(metaElement)
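Separately, `@class='...'` in XPath compares the whole attribute string. That means `//div[@class='metascore_w large movie negative indiv']` can only match score boxes whose class attribute is exactly that string, which, judging by the name, would be the negative reviews only. A token-based test matches every element that carries a given class. The sample markup below is hypothetical, modelled on the class names in the question, and is not copied from Metacritic; `has_class` is an illustrative helper:

```python
from lxml import html

# Hypothetical markup modelled on the class names in the question.
SAMPLE = """
<html><body>
  <div class="review">
    <div class="metascore_w large movie positive indiv">88</div>
    <span class="source"><a href="#">Critic A</a></span>
  </div>
  <div class="review">
    <div class="metascore_w large movie mixed indiv">60</div>
    <span class="source"><a href="#">Critic B</a></span>
  </div>
  <div class="review">
    <div class="metascore_w large movie negative indiv">30</div>
    <span class="source"><a href="#">Critic C</a></span>
  </div>
</body></html>
"""


def has_class(tag, token):
    """Build an XPath matching <tag> elements whose class list contains token."""
    return f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {token} ')]"


tree = html.fromstring(SAMPLE)

# Exact string comparison: only the element with exactly this class attribute.
exact = tree.xpath("//div[@class='metascore_w large movie negative indiv']")

# Token comparison: every score box, whatever its positive/mixed/negative class.
scores = tree.xpath(has_class("div", "metascore_w") + "/text()")
sources = tree.xpath(has_class("span", "source") + "/a/text()")
pairs = list(zip(sources, scores))
```

Zipping two flat lists assumes every review has exactly one score and one source. If some do not, iterating over each review container and querying relative paths (`./...`) keeps them aligned.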
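The output is always an empty list, `[]`. I realize this only prints the elements and that I need to append `/text()` to the end of the XPath, but I can't even get that far.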
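On the `/text()` point: appending `/text()` makes `xpath` return strings instead of elements, but only the text nodes that are direct children of the matched element. If the base expression matches nothing, the `/text()` version is also `[]`, so it cannot fix an empty result on its own. A small self-contained illustration with a hypothetical snippet; `text_content()` is lxml's method for collecting all nested text of an element:

```python
from lxml import html

# Hypothetical snippet: the link text is split by a nested <b> element.
SNIPPET = (
    "<html><body><span class='source'>"
    "<a href='#'>Critic <b>A</b></a>"
    "</span></body></html>"
)
tree = html.fromstring(SNIPPET)

elements = tree.xpath("//span[@class='source']/a")            # list of HtmlElement
direct_text = tree.xpath("//span[@class='source']/a/text()")  # direct text nodes only
full_text = [el.text_content() for el in elements]            # all nested text
no_match = tree.xpath("//span[@class='missing']/a/text()")    # base matches nothing
```

Here `direct_text` is `["Critic "]` because the "A" lives inside `<b>`, while `full_text` is `["Critic A"]`.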
Any ideas? Thanks.