My knowledge of Scrapy is limited, but I did try writing a spider.
import scrapy

class ExecSpider(scrapy.Spider):
    name = "indeed_uk"
    allowed_domains = ["indeed.co.uk"]
    start_urls = [
        'https://www.indeed.co.uk/jobs?q=%C2%A335,000&jt=contract',
    ]
A sample job link looks like this:
https://www.indeed.com/viewjob?jk=5de17a782172c5b9&from=serp&vjs=3
The code below sometimes works and sometimes does not:
salarystring = ''.join(map(lambda s: s.strip(), response.xpath('//div/span[@class = "no-wrap"]/text()').extract()))
if salarystring is None:
    salarystring = ''.join(map(lambda s: s.strip(), response.xpath('//div[@class = "jobsearch-JobMetadataHeader-item "]/text()').extract()))
print(salarystring)
I wrote two XPaths here because the site sometimes uses '//div/span[@class = "no-wrap"]/text()' and currently uses '//div[@class = "jobsearch-JobMetadataHeader-item "]/text()'.
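A minimal sketch of the two-XPath fallback described above, written in plain Python so it runs without Scrapy. The helper names `join_stripped` and `pick_salary` are hypothetical, and the two list arguments stand in for what `.extract()` returns for each XPath. It illustrates one point that may matter here: `''.join(...)` over an empty list gives an empty string, not `None`, so a fallback guarded by an `is None` check would not run.

```python
def join_stripped(fragments):
    """Join text fragments after stripping whitespace from each one."""
    return ''.join(s.strip() for s in fragments)


def pick_salary(primary, fallback):
    """Return the joined text of `primary`, or of `fallback` if `primary` is empty.

    Both arguments stand in for the lists returned by `.extract()`.
    """
    salary = join_stripped(primary)
    # ''.join(...) yields '' (never None) for an empty list,
    # so test for emptiness rather than comparing with None.
    if not salary:
        salary = join_stripped(fallback)
    return salary


# First XPath matched nothing, second one did.
print(pick_salary([], [' £35,000 a year ']))
```

If both lists are empty the result is still an empty string, so the caller has to decide what to store in that case.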
The log is as follows:
2018-08-10 14:09:32 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.indeed.com/viewjob?jk=20635031d9ceaad1&from=serp&vjs=3> (referer: https://www.indeed.com/jobs?q=$30,000&l=United+States&jt=contract&explvl=mid_level)
Traceback (most recent call last):
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/imac086/Documents/premal/test_interim_scrapper/interim/spiders/indeed.py", line 180, in parsejob
    print ('job rate ', item['job_rate'])
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/item.py", line 59, in __getitem__
    return self._values[key]
KeyError: 'job_rate'
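The traceback shows the KeyError being raised when 'job_rate' is read from the item, which means the field was never assigned for that response. The sketch below reproduces that situation with a plain dict standing in for the Scrapy item. It assumes, without the surrounding spider code to confirm it, that the field is only set when a salary string was found. The function name `describe_rate` and the default text are hypothetical.

```python
def describe_rate(item):
    """Format the job rate, tolerating items where the field was never set."""
    # item['job_rate'] raises KeyError when the key is absent;
    # .get() returns the supplied default instead.
    return 'job rate ' + item.get('job_rate', 'not listed')


item = {}          # plain dict standing in for the Scrapy item (assumption)
salarystring = ''  # what the extraction yields when no XPath matches

# Assumed pattern: the field is only assigned when a salary was found.
if salarystring:
    item['job_rate'] = salarystring

print(describe_rate(item))
print(describe_rate({'job_rate': '£35,000 a year'}))
```

Under that assumption, the intermittent behaviour would follow from the page markup varying between responses: runs where one of the XPaths matches set the field, and runs where neither matches leave it unset.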
Can anyone tell me why this happens in Scrapy, and why the same code sometimes works when I run it 2-3 times?
Thanks in advance.