I'm scraping http://www.regmovies.com/Theatres/Theatre-Folder/Regal-Meridian-16-1082 for every movie currently showing, and returning each movie's IMDb rating.
In the Scrapy shell I run:
fetch('http://www.regmovies.com/Theatres/Theatre-Folder/Regal-Meridian-16-1082')
response.xpath('//*[@id="content"]/div/div/div[2]/div[1]/div[7]/div[2]/div[1]/div/div[1]/h3/text()').extract()
It returns an empty list: []
This is the last piece I need to finish my spider.
Answer (score: 1)
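This page loads its data with JavaScript, so the movie list is not in the HTML that Scrapy downloads. You can find the URL the data actually comes from in the Network tab of Chrome DevTools.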
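The captured request payload carries the theatre's tmsId and a date in the format produced by JavaScript's Date.toDateString() ("Sun Mar 19 2017"). A spider that runs on other days needs to generate that string rather than hard-code it. A minimal sketch, assuming the endpoint accepts any date in that same format; the function name build_performances_body is hypothetical:

```python
import json
from datetime import date


def build_performances_body(tms_id: str, day: date) -> str:
    """Build the JSON body for the TheatrePerformances POST request.

    Assumption: the endpoint expects the date formatted like JavaScript's
    Date.toDateString(), e.g. "Sun Mar 19 2017", as in the captured request.
    %a and %b are locale-dependent, so this assumes an English (C) locale.
    """
    payload = {"tmsId": tms_id, "date": day.strftime("%a %b %d %Y")}
    # Compact separators reproduce the captured body exactly (no spaces).
    return json.dumps(payload, separators=(",", ":"))


# "AABFY" is the tmsId seen in the captured request for this theatre.
print(build_performances_body("AABFY", date(2017, 3, 19)))
# → {"tmsId":"AABFY","date":"Sun Mar 19 2017"}
```

In a spider you would pass date.today() and use the returned string as the body of the POST request.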
In Scrapy, send a POST request with that JSON payload to the URL:
In [9]: from scrapy.http import Request
In [10]: r = Request(url='http://www.regmovies.com/services/MovieListings.asmx/TheatrePerformances',
...: method='POST',
...: body='{"tmsId":"AABFY","date":"Sun Mar 19 2017"}',
...: headers={'Content-Type':'application/json', 'User-Agent':'Mozilla/5.0'})
In [11]: fetch(r)
2017-03-19 14:10:36 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://www.regmovies.com/services/MovieListings.asmx/TheatrePerformances> (referer: None)
In [12]: import json
In [13]: json.loads(response.text)
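The session stops at decoding the response, and the structure of that JSON isn't shown. Until you know which field holds the movie titles, a generic walk that collects every value stored under a given key name can help you explore it. A minimal sketch; the key name "Title" and the sample payload are hypothetical, not taken from the Regal API:

```python
import json
from typing import Any, Iterator


def find_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth, in decoded JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: matching keys may also be nested inside v.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical payload: inspect json.loads(response.text) in the shell
# first and substitute the key that really holds the movie titles.
sample = json.loads('{"movies": [{"Title": "A"}, {"Title": "B", "extra": {"Title": "C"}}]}')
print(list(find_values(sample, "Title")))
# → ['A', 'B', 'C']
```

In the shell this would be list(find_values(json.loads(response.text), "SomeKey")), with SomeKey replaced by whatever field name the real response uses.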