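I'm trying to do a horizontal crawl with Scrapy. Using XPath, I extract the link to each listing on a real-estate site, plus the link to the next page.
However, whenever I run the spider, I keep getting this error:
"scrapy.spidermiddlewares.httperror.HttpErrorMiddleware"
Thanks for your help.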
import scrapy
from scrapy import Request
from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose
# PropertiesItem comes from the project's items module (import not shown here)

class BasicSpider(scrapy.Spider):
    name = 'manual'
    allowed_domains = ['web']
    start_urls = ('https://www.nyhabitat.com/new-york-apartment/roommate-share',)

    def parse(self, response):
        # Horizontal crawl: link to the next results page
        next_selector = response.xpath('//*[@id="pagination-container"]/ul/li[8]/a/@href')
        for url in next_selector.extract_first():
            yield Request(response.urljoin(url))
        # Links to the individual listings
        item_selector = response.xpath('//a[contains(@class, "slider-item-link")]//@href')
        for url in item_selector.extract():
            yield Request(response.urljoin(url))

    def parse_item(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('price', '//*[@id="availability"]/div/div[1]/b/text()',
                    MapCompose(lambda i: i.replace(',', ''), float),
                    re='[,.0-9]+')
Result:
'scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
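
The lines above look like part of the list of enabled spider middlewares that Scrapy logs at startup, not an exception, so the cause is probably elsewhere in the spider. Two candidates stand out: `for url in next_selector.extract_first()` loops over a single string, and `allowed_domains = ['web']` does not match the site's host. The sketch below uses only the standard library to illustrate both; `next_page_urls`, `is_onsite`, and the sample `href` are hypothetical, and `is_onsite` only approximates an offsite filter; it is not Scrapy's actual code.

```python
from urllib.parse import urljoin, urlparse

BASE = "https://www.nyhabitat.com/new-york-apartment/roommate-share"


def next_page_urls(href):
    """Wrap the single href (or None) returned by extract_first() in a list."""
    if href is None:
        return []
    return [urljoin(BASE, href)]


def is_onsite(url, allowed_domains):
    """Approximate offsite check: host equals, or is a subdomain of, an allowed domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical href, standing in for what the pagination XPath might return.
href = "/new-york-apartment/roommate-share?page=2"

# Iterating over the string directly yields one character per loop step.
per_char = list(href)

# Treating the href as a single value gives exactly one absolute URL.
urls = next_page_urls(href)

# 'web' does not match www.nyhabitat.com, while 'nyhabitat.com' does.
blocked = not is_onsite(urls[0], ["web"])
allowed = is_onsite(urls[0], ["nyhabitat.com"])
```

If this is indeed the cause, setting `allowed_domains = ['nyhabitat.com']` and yielding one request for the `extract_first()` value (when it is not `None`) would be the matching changes in the spider. Separately, the listing requests above are created without a `callback`, so Scrapy sends their responses to `parse` by default and never reaches `parse_item`.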