InvalidSchema("No connection adapters were found for '%s'" % url) for a Skype URL

Asked: 2018-11-01 08:08:56

Tags: python python-2.7 scrapy invalidoperationexception

I can collect the links on a web page and their HTTP status codes with this function:

def code(self, response):
    code_loader = ItemLoader(item=SomeTestItem(), response=response)
    urls = response.xpath('//a/@href').extract()
    for url in urls:
        page = response.urljoin(url)
        code_loader.add_value('urls', page)
        code_loader.add_value('codes', requests.get(page).status_code)
    return code_loader

When I try to collect more data from the results of the code above, I get this error:

2018-11-01 16:06:03 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.advich.com/zh/> (referer: None)
Traceback (most recent call last):
File "/Users/googleadwords/ENV/lib/python2.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/googleadwords/ENV/wes/wes/spiders/seo_spider.py", line 21, in parse
return itemloader.parse(response)
File "/Users/googleadwords/ENV/wes/wes/itemloader/itemloader.py", line 36, in parse
loaders.add_value('links', self.code(response))
File "/Users/googleadwords/ENV/wes/wes/itemloader/itemloader.py", line 100, in code
code_loader.add_value('codes', requests.get(page).status_code)
File "/Users/googleadwords/ENV/lib/python2.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/Users/googleadwords/ENV/lib/python2.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/googleadwords/ENV/lib/python2.7/site-packages/requests/sessions.py", line 524, in request
resp = self.send(prep, **send_kwargs)
File "/Users/googleadwords/ENV/lib/python2.7/site-packages/requests/sessions.py", line 631, in send
adapter = self.get_adapter(url=request.url)
File "/Users/googleadwords/ENV/lib/python2.7/site-packages/requests/sessions.py", line 722, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
InvalidSchema: No connection adapters were found for 'skype:+8615050520029?chat'

I think the error is caused by 'skype:+8615050520029?chat' appearing among the URLs. A quick test reproduces it, so I would like to ask how to fix this.

 result = requests.get('skype:+8615050520029?chat')---->

 No connection adapters were found for 'skype:+8615050520029?chat'

start_url = 'https://www.advich.com'. Please help me, thanks.

1 Answer:

Answer 0 (score: 1)

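Try excluding such URLs, either inside your for loop or earlier, when you extract them: urls = response.css('a:not([href*=skype]):not([href*=mailto])::attr(href)').extract()

Passing this kind of "URL" to requests fails because requests only has connection adapters for http:// and https:// URLs.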
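
The "inside your for loop" variant could be sketched as follows. Instead of listing schemes to exclude, it keeps only links whose scheme is http or https, so tel:, javascript: and similar links are skipped as well. The helper name fetchable_urls and the constant FETCHABLE_SCHEMES are hypothetical, not part of Scrapy or requests, and the import assumes Python 3 (on Python 2.7 the same functions live in the urlparse module).

```python
from urllib.parse import urljoin, urlparse

# Schemes that requests can fetch with its default connection adapters.
FETCHABLE_SCHEMES = {"http", "https"}


def fetchable_urls(base_url, hrefs):
    """Resolve each href against base_url and keep only http(s) links."""
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        # Links such as skype:..., mailto:... or tel:... are dropped here.
        if urlparse(absolute).scheme in FETCHABLE_SCHEMES:
            result.append(absolute)
    return result
```

In the spider, the loop could then iterate over fetchable_urls(response.url, urls) and call requests.get(page) only on the links that remain.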