Scrapy File (unknown-error): Error downloading image from <GET http://www.xxx.jpg> referred in <None>: 'splash'

Asked: 2019-04-09 03:22:14

Tags: python scrapy scrapy-splash

I want to use Splash to download the images I scrape. When I run my code, I get the following error:

2019-04-09 11:09:32 [scrapy.pipelines.files] WARNING: File (unknown-error): Error downloading image from <GET https://www.xxxxx.jpg> referred in <None>: 'splash'

I tried using SplashRequest, but it failed. What should I do? Here is my code:

    def get_media_requests(self, item, info):
        try:
            for image_url in item['image']:
                yield SplashRequest(image_url, endpoint='render.html')
        except:
            pass

1 Answer:

Answer 0 (score: 0)

According to the documentation, SplashRequest takes two arguments: the URL and a callback (`self.parse_result` in the documentation's example). The rest are optional:

    yield SplashRequest(url, self.parse_result,
        args={
            # optional; parameters passed to Splash HTTP API
            'wait': 0.5,

            # 'url' is prefilled from request url
            # 'http_method' is set to 'POST' for POST requests
            # 'body' is set to request body for POST requests
        },
        endpoint='render.json',  # optional; default is render.html
        splash_url='<url>',      # optional; overrides SPLASH_URL
        slot_policy=scrapy_splash.SlotPolicy.PER_DOMAIN,  # optional
    )
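To make the optional `args`, `endpoint`, and `splash_url` parameters more concrete, here is a minimal standard-library sketch of how they could map onto a GET call to the Splash HTTP API. The helper name `build_splash_url`, the address `http://localhost:8050`, and the example image URL are assumptions for illustration only. This is not how scrapy-splash builds its requests internally.

```python
from urllib.parse import urlencode, urljoin


def build_splash_url(target_url, endpoint="render.html",
                     splash_url="http://localhost:8050", args=None):
    """Return a GET URL asking a Splash instance to render target_url.

    Hypothetical helper for illustration; not part of scrapy-splash.
    """
    # 'url' is always sent; it plays the role of the prefilled request URL.
    params = {"url": target_url}
    # Extra Splash arguments, e.g. {'wait': 0.5}, are merged in as-is.
    params.update(args or {})
    # Normalise the base so urljoin appends the endpoint instead of replacing it.
    base = splash_url.rstrip("/") + "/"
    return urljoin(base, endpoint) + "?" + urlencode(params)


# Assumed example values: a local Splash instance and a placeholder image URL.
print(build_splash_url("https://www.example.com/a.jpg", args={"wait": 0.5}))
```

Changing `endpoint` to `'render.json'` or passing a different `splash_url` only changes the base of the resulting URL, which mirrors the roles those parameters have in the call above.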

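In your code, you did not provide the callback argument. You need to pass a reference to your parse method. For example, if your parse method is called `parse`, use:

    yield SplashRequest(image_url, self.parse, endpoint='render.html')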