When I try to scrape my page, I get the output, but also some errors:
ValueError: Missing scheme in request url: h
books2.py
from scrapy import Spider


class Books1Spider(Spider):
    name = 'books1'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']
    headers = {
        "Host": "localhost",
        "Connection": "keep-alive",
        "Cache-Control": "max-age=0",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "DNT": "1",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "en-US,en;q=0.8"
    }

    def parse_book(self, response):
        title = response.xpath('//h1/text()').extract_first()
        price = response.xpath('.//*[@class="price_color"]/text()').extract_first()
        # extract_first() returns a single string, not a list
        image_urls = response.xpath('.//img/@src').extract_first()
        image_urls = image_urls.replace('../..', 'http://books.toscrape.com/')
        rating = response.xpath('//*[contains(@class,"star-rating")]/@class').extract_first()
        rating = rating.replace('star-rating', '')
        description = response.xpath('//*[@id="product_description"]/following-sibling::p/text()').extract_first()
        yield {'title': title,
               'price': price,
               'image_urls': image_urls,
               'rating': rating,
               'description': description,
               }
Expected result:
{'rating': u' Five', 'price': u'\xa352.29', 'description': u'Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get turned upside-down? Will he have to face Ramona\'s seven evil ex-boyfriends in battle? The short answer is yes. The long answer is Scott Pilgrim, Volume 1: Scott Pilgrim\'s Precious Little Life ...more', 'image_urls': u'http://books.toscrape.com//media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg', 'title': u"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)"}
Actual result:
2019-02-07 16:06:54 [scrapy.core.scraper] ERROR: Error processing {'rating': u' Five', 'price': u'\xa352.29', 'description': u'Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get turned upside-down? Will he have to face Ramona\'s seven evil ex-boyfriends in battle? The short answer is yes. The long answer is Scott Pilgrim, Volume 1: Scott Pilgrim\'s Precious Little Life ...more', 'image_urls': u'http://books.toscrape.com//media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg', 'title': u"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)"}
Traceback (most recent call last):
  File "/home/divum/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/pipelines/media.py", line 79, in process_item
    requests = arg_to_iter(self.get_media_requests(item, info))
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/pipelines/images.py", line 155, in get_media_requests
    return [Request(x) for x in item.get(self.images_urls_field, [])]
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 62, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
Answer 0 (score: 1)
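You are extracting image_urls as a single string, u'…'. The value of image_urls must be a list: [u'…'].
The images pipeline creates one Request per element of that field (the list comprehension in the traceback), so iterating over a string yields single characters, and the first request is made for the URL h.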
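To see why the error mentions only `h`, the sketch below mimics the list comprehension from the traceback (`[Request(x) for x in item.get(...)]`) in plain Python 3. `fake_request` and `media_requests` are hypothetical stand-ins: `fake_request` only checks that a scheme is present, roughly like the `ValueError` in the traceback, and is not Scrapy's actual `Request` implementation.

```python
from urllib.parse import urlparse


def fake_request(url):
    # Hypothetical stand-in for scrapy.Request: reject URLs without a scheme,
    # roughly like the ValueError raised in the traceback.
    if not urlparse(url).scheme:
        raise ValueError("Missing scheme in request url: %s" % url)
    return url


def media_requests(item, field="image_urls"):
    # Same shape as get_media_requests in the traceback:
    # one "request" per element of item[field].
    return [fake_request(x) for x in item.get(field, [])]


url = "http://books.toscrape.com//media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg"

# A list yields whole URLs, so this works.
print(media_requests({"image_urls": [url]}))

# A plain string yields single characters; the first one is "h".
try:
    media_requests({"image_urls": url})
except ValueError as exc:
    print(exc)  # Missing scheme in request url: h
```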
In your code, change:
image_urls = response.xpath('.//img/@src').extract_first()
image_urls = image_urls.replace('../..','http://books.toscrape.com/')
to:
image_url = response.xpath('.//img/@src').extract_first()
image_urls = [image_url.replace('../..','http://books.toscrape.com/')]
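A variation on the fix above, sketched under assumptions: instead of string replacement, resolve the relative `src` against the page URL with `urljoin`, which also avoids the double slash (`books.toscrape.com//media/...`) visible in the expected output. `absolute_image_urls` is a hypothetical helper and the product page URL is an assumed example; the `src` value is inferred from the `replace('../..', ...)` call in the question. This is Python 3; on the Python 2.7 setup from the traceback the import would be `from urlparse import urljoin`.

```python
from urllib.parse import urljoin  # Python 2.7: from urlparse import urljoin


def absolute_image_urls(page_url, srcs):
    # Hypothetical helper: resolve every relative <img src> against the page
    # URL and always return a list, which is what the images pipeline iterates.
    return [urljoin(page_url, src) for src in srcs if src]


# Assumed example page URL; any product page two directories deep behaves the same.
page = "http://books.toscrape.com/catalogue/some-book_1/index.html"
src = "../../media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg"

print(absolute_image_urls(page, [src]))
# ['http://books.toscrape.com/media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg']
```

Inside `parse_book` you would pass `response.url` as the page URL; Scrapy responses also provide `response.urljoin(src)` for the same purpose in Scrapy 1.0 and later.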
Answer 1 (score: 0)
It looks like you are passing missing or invalid details in one of your Request calls.
Go through all your requests and make sure every URL you pass is well-formed, so that all malformed URL schemes are covered.
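The advice to check that every URL passed to a request is well-formed could look like the following guard, a minimal sketch in Python 3. `valid_request_urls` is a hypothetical helper, not part of Scrapy: it wraps a bare string in a list (the mistake from the question) and keeps only URLs with an `http`/`https` scheme and a host. On Python 2.7 the string check would need `basestring`, because the scraped values there are `unicode`.

```python
from urllib.parse import urlparse


def valid_request_urls(value):
    # Hypothetical guard: accept a single URL string or a list of URLs and
    # return only those with an http(s) scheme and a host.
    urls = [value] if isinstance(value, str) else list(value or [])
    good = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.netloc:
            good.append(url)
    return good


print(valid_request_urls("http://books.toscrape.com/media/a.jpg"))
# ['http://books.toscrape.com/media/a.jpg']
print(valid_request_urls(["h", "../../media/a.jpg", "https://books.toscrape.com/"]))
# ['https://books.toscrape.com/']
```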