My Python code is pasted below. I am using Python v2.7.3 and Scrapy v1.5.1.
This is how I run the spider from the command prompt:
C:\Users\Desktop\spider>scrapy crawl "quotes2"
After that, the crawl starts, but the following error occurs:
ERROR: Spider error processing <GET http://quotes.toscrape.com/> (referer: None)
Is there any fix for this?
# -*- coding: utf-8 -*-
import scrapy


class Quotes2Spider(scrapy.Spider):
    name = "quotes2"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = (
        'http://quotes.toscrape.com/',
    )

    def parse(self, response):
        quotes = response.xpath('//*[@class="quote"]')
        for quote in quotes:
            text = quote.xpath('.//*[@class="text"]/text()').extract_first()
            author = quote.xpath('.//*[@itemprop="author"]/text()').extract_first()
            tags = quote.xpath('.//*[@itemprop="keywords"]/@content').extract_first()
            print '\n'
            print text
            print author
            print tags
            print '\n'
        next_page_url = response.xpath('//*[@class="next"]/a/@href').extract_first()
        abosolute_next_page_url = response.urljoin(next_page_url)
        yield scrapy.Request(abosolute_next_page_url)
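
A side note on the pagination step at the end of parse, as a minimal sketch and not necessarily related to the error above: extract_first() returns None when the XPath matches nothing, so on a page without a "next" link the spider would still build and yield a request. The helper below is hypothetical (absolute_next_page_url is not part of Scrapy) and uses only the standard library so it can be run without a crawl; the page URLs are illustrative.

```python
try:
    from urlparse import urljoin  # Python 2
except ImportError:
    from urllib.parse import urljoin  # Python 3


def absolute_next_page_url(page_url, next_href):
    """Return the absolute URL of the next page, or None if there is no next link.

    Hypothetical helper: it mirrors response.urljoin(next_page_url) from the
    spider, but refuses to build a URL when the XPath matched nothing.
    """
    if not next_href:
        return None
    return urljoin(page_url, next_href)


# A page that has a "next" link.
print(absolute_next_page_url('http://quotes.toscrape.com/', '/page/2/'))
# A page where the XPath found no "next" link (extract_first() gave None).
print(absolute_next_page_url('http://quotes.toscrape.com/page/10/', None))
```

In the spider itself, the same idea would amount to yielding scrapy.Request only when next_page_url is not None.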