I'm using requests to fetch and parse some data scraped with Scrapy and ScrapyRT (real-time scraping).
This is my approach:
import requests

# Pass the spider name to the request parameters
params = {
    'spider_name': spider,
    'start_requests': True
}

# Scrape items
response = requests.get('http://scrapyrt:9080/crawl.json', params=params)
print('RESPONSE JSON', response.json())
data = response.json()
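Not part of my original code, but a minimal guard (standard library only) shows why `.json()` raises here: when ScrapyRT answers with an error status, the body is an HTML/plain-text error page rather than JSON, so parsing unconditionally hides the real problem:

```python
import json

def safe_json(status_code, body):
    """Parse a response body as JSON only when it is safe to do so.

    A sketch: on a 500, ScrapyRT returns a non-JSON error page, which is
    exactly what makes response.json() raise JSONDecodeError.
    """
    if status_code != 200:
        # Surface the raw body so the server-side error stays visible.
        return {'error': status_code, 'body': body}
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return {'error': 'invalid JSON', 'body': body}
```

With the request above, `safe_json(response.status_code, response.text)` would have exposed the 500 error page instead of raising.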
According to the ScrapyRT documentation, setting the 'start_requests' parameter to True makes the spider request its URLs automatically and pass the responses to the parse method, which is the default callback for parsing requests.
start_requests
Type: boolean
Optional
Whether the spider should execute the Scrapy.Spider.start_requests method. start_requests is executed by default when you run a Scrapy spider normally without ScrapyRT, but this method is not executed in the API by default. By default we assume that the spider is expected to crawl only the url provided in the parameters, without making any requests to the start_urls defined in the spider class. The start_requests argument overrides this behavior. If this argument is present, the API will execute the start_requests spider method.
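To illustrate the two modes the documentation describes, here is a sketch of the query string each one produces (the host and spider name are taken from the logs below; the example url is hypothetical):

```python
from urllib.parse import urlencode

BASE = 'http://scrapyrt:9080/crawl.json'

# Default mode: ScrapyRT crawls only the url passed explicitly.
url_mode = {'spider_name': 'precious_tracks', 'url': 'http://example.com/album'}

# start_requests mode: run the spider's own start_requests / start_urls.
start_mode = {'spider_name': 'precious_tracks', 'start_requests': True}

print(BASE + '?' + urlencode(url_mode))
print(BASE + '?' + urlencode(start_mode))
```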
But the setup doesn't work. The logs:
[2019-05-19 06:11:14,835: DEBUG/ForkPoolWorker-4] Starting new HTTP connection (1): scrapyrt:9080
[2019-05-19 06:11:15,414: DEBUG/ForkPoolWorker-4] http://scrapyrt:9080 "GET /crawl.json?spider_name=precious_tracks&start_requests=True HTTP/1.1" 500 7784
[2019-05-19 06:11:15,472: ERROR/ForkPoolWorker-4] Task project.api.routes.background.scrape_allmusic[87dbd825-dc1c-4789-8ee0-4151e5821798] raised unexpected: JSONDecodeError('Expecting value: line 1 column 1 (char 0)',)
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/celery/app/trace.py", line 641, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/project/api/routes/background.py", line 908, in scrape_allmusic
print ('RESPONSE JSON',response.json())
File "/usr/lib/python3.6/site-packages/requests/models.py", line 897, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Any help solving this error would be appreciated.
Answer 0 (score: 0)
The error was caused by a bug in Twisted 19.2.0 (a scrapyrt dependency), which assumed the wrong type for the response. Once I installed Twisted==18.9.0, it worked.
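If the project uses pip, the fix can be made reproducible by pinning the version in requirements.txt (a sketch, assuming a pip-based setup):

```
scrapyrt
Twisted==18.9.0
```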