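I am trying to scrape this web page: https://www.google.com/maps/d/u/0/viewer?mid=10gfc4vm6VKjxIf6UhKLlMLePqTjTYXYC&ll=50.65039081184933%2C3.040291506005474&z=11 to get information about producers. However, when I send the request through the Scrapy shell, I get an empty response: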
$ scrapy shell "https://www.google.com/maps/d/u/0/viewer?mid=10gfc4vm6VKjxIf6UhKLlMLePqTjTYXYC&ll=50.6503908118493%2C3.040291506005474&z=11"
In [1]: response
This is the code I am using:
# -*- coding: utf-8 -*-
import datetime
import re

import scrapy

from aprobio.items import AprobioItem


class AprospiderSpider(scrapy.Spider):
    name = 'aprospider'
    allowed_domains = ['aprobio.fr']
    start_urls = ['http://aprobio.fr/']
    crawl_datetime = str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    start_time = datetime.datetime.now()

    def parse(self, response):
        self.crawler.stats.set_value("start_time", self.start_time)
        # Pull the embedded `_pageData` JavaScript variable out of the page source.
        data = re.findall(r"var _pageData = (.+?);\r", response.body.decode("utf-8"), re.S)
Answer 0 (score: 0)
Solved: set ROBOTSTXT_OBEY to False in settings.py.
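
A minimal sketch of that change, assuming the default project layout in which settings.py sits inside the aprobio package (the path is inferred from the `from aprobio.items import AprobioItem` import in the question, not confirmed). Scrapy's project template sets ROBOTSTXT_OBEY to True, so requests that the site's robots.txt disallows are filtered out before they are sent, which would explain the empty response:

```python
# aprobio/settings.py (path assumed from the `aprobio` package name in the question)

# The project template generated by `scrapy startproject` sets this to True,
# which makes Scrapy's robots.txt middleware drop disallowed requests.
ROBOTSTXT_OBEY = False
```

When `scrapy shell` is run from inside the project directory it picks up the project settings, so the same command should then return a response. If you only want to override the setting for one session, passing `-s ROBOTSTXT_OBEY=False` on the command line should work as well.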