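I am using Scrapy to get data from this page.
The product list is rendered dynamically. I found the URL that the page calls to fetch the products,
but when I scrape that URL with Scrapy, it returns an empty page: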
<span class="pageSizeInformation" id="page0" data-page="0" data-pagesize="12">Page: 0 / Size: 12</span>
Here is my code:
# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"
    start_urls = [
        'https://www.bricoetloisirs.ch/coop/ajax/nextPage/(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
    ]

    def parse(self, response):
        # Dump the raw body to see what the Ajax endpoint returned.
        print(response.body)
Answer 0 (score: 1)
I solved the problem.
# -*- coding: utf-8 -*-
import scrapy

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"
    start_urls = [
        'https://www.bricoetloisirs.ch/magasins/gardena'
    ]

    def parse(self, response):
        # Load the listing page first, then request each Ajax page relative to it.
        for page in range(1, 50):
            url = response.url + '/.do?page=%s&_=1473841539272' % page
            yield scrapy.Request(url, callback=self.parse_page)

    def parse_page(self, response):
        # Each response holds the HTML fragment for one page of products.
        print(response.body)
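The loop above requests a fixed 49 pages without checking what comes back. As a rough sketch, the `pageSizeInformation` span quoted in the question can be read from each response to see which page the server actually returned. The names `PageInfoParser` and `read_page_info` are hypothetical helpers, not part of Scrapy, and whether `data-page="0"` reliably marks an empty result is an assumption to verify against the site. The sketch uses only the Python 3 standard library.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collect data-page and data-pagesize from the pageSizeInformation span."""

    def __init__(self):
        super().__init__()
        self.page = None
        self.page_size = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span' and attrs.get('class') == 'pageSizeInformation':
            page = attrs.get('data-page')
            size = attrs.get('data-pagesize')
            # Only accept plain non-negative integers; leave None otherwise.
            if page and page.isdigit():
                self.page = int(page)
            if size and size.isdigit():
                self.page_size = int(size)


def read_page_info(body):
    """Return (page, page_size), or (None, None) if the span is absent."""
    if isinstance(body, bytes):
        # response.body is bytes in Scrapy; decode before parsing.
        body = body.decode('utf-8', 'replace')
    parser = PageInfoParser()
    parser.feed(body)
    return parser.page, parser.page_size
```

Inside `parse_page`, calling `read_page_info(response.body)` would give the page number and page size reported by the server, which could then be logged or used to decide whether a response is worth parsing further.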
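Answer 1 (score: 0)
As far as I can tell, the website uses JavaScript to make Ajax calls.
Scrapy does not execute JavaScript, so the content those scripts load never appears in the response.
You will need to look at Selenium to scrape pages like this,
or find out which Ajax calls are being made and send them yourself.
See "Can scrapy be used to scrape dynamic content from websites that are using AJAX?", which may help you.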
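A minimal sketch of the second option (sending the Ajax calls yourself) is to generate the page URLs up front. The URL pattern is the one used in answer 0; `build_page_urls` is a hypothetical helper, and treating the `_` parameter as a cache-busting millisecond timestamp is an assumption based on its value, not something the site documents.

```python
import time

# Listing URL taken from answer 0.
BASE_URL = 'https://www.bricoetloisirs.ch/magasins/gardena'


def build_page_urls(base_url, first_page, last_page, timestamp_ms=None):
    """Return the Ajax URLs for pages first_page..last_page (inclusive)."""
    if timestamp_ms is None:
        # Assumption: `_` is a cache-busting timestamp in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    base = base_url.rstrip('/')
    return [
        '%s/.do?page=%d&_=%d' % (base, page, timestamp_ms)
        for page in range(first_page, last_page + 1)
    ]
```

The resulting list could be yielded as `scrapy.Request` objects from a spider callback, or fetched with any other HTTP client to check what the endpoint returns before writing the spider.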
Answer 2 (score: 0)
I think you need to send an additional request, the way a browser does. Try modifying your code as follows:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request

from v4.items import Product


class GardenaCoopBricoLoisirsSpider(scrapy.Spider):
    name = "Gardena_Coop_Brico_Loisirs_py"
    start_urls = [
        'https://www.bricoetloisirs.ch/coop/ajax/nextPage/'
    ]

    def parse(self, response):
        # Send the Ajax parameters in a follow-up request to the same endpoint.
        request_body = '(cpgnum=1&layout=7.01-14_180_69_164_182&uiarea=2&carea=%24ROOT&fwrd=frwd0&cpgsize=12)/.do?page=2&_=1473841539272'
        yield Request(url=response.url, body=request_body, callback=self.parse_page)

    def parse_page(self, response):
        print(response.body)