Why does Scrapy scrape the data sometimes but not other times?

Posted: 2018-08-10 04:48:58

Tags: python-2.7 xpath scrapy

I have limited knowledge of Scrapy, but I tried to write a spider:

import scrapy


class ExecSpider(scrapy.Spider):
    name = "indeed_uk"
    allowed_domains = ["indeed.co.uk"]
    # Search for "£35,000" (URL-encoded as %C2%A3) contract jobs
    start_urls = [
        'https://www.indeed.co.uk/jobs?q=%C2%A335,000&jt=contract',
    ]

Here is an example link:

https://www.indeed.com/viewjob?jk=5de17a782172c5b9&from=serp&vjs=3

The following code works sometimes and fails at other times:

# First layout; ''.join() returns '' (never None) when the XPath matches nothing
salarystring = ''.join(map(lambda s: s.strip(), response.xpath('//div/span[@class = "no-wrap"]/text()').extract()))
if not salarystring:
    # Fall back to the other layout the site sometimes serves
    salarystring = ''.join(map(lambda s: s.strip(), response.xpath('//div[@class = "jobsearch-JobMetadataHeader-item "]/text()').extract()))
print(salarystring)

I use two XPaths here because the site sometimes puts the salary under '//div/span[@class = "no-wrap"]/text()' and at other times (as it does now) under '//div[@class = "jobsearch-JobMetadataHeader-item "]/text()'.
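Two details are worth checking in a fallback like this. A check such as `if salarystring is None:` never fires, because `''.join(...)` always returns a string (an empty one when nothing matched), so the test has to be on emptiness. And `@class = "..."` only matches when the attribute is exactly that string, trailing space included. A minimal sketch, assuming `response` is a Scrapy `Response` (only `.xpath(query).extract()` is used); the helper name `first_match`, the `SALARY_XPATHS` list and the `contains(...)` variant of the second XPath are illustrative, not part of the original spider:

```python
def first_match(response, queries):
    """Return the first non-empty, whitespace-stripped text found by any XPath.

    `response` is assumed to be a Scrapy Response or Selector; only
    `.xpath(query).extract()` is called. Returns '' when no query matches.
    """
    for query in queries:
        text = ''.join(s.strip() for s in response.xpath(query).extract())
        if text:  # '' means "no match", so move on to the next layout
            return text
    return ''


# Hypothetical XPaths for the two page layouts described above. The second
# uses the contains(concat(...)) idiom so the class token still matches if
# the attribute has extra spaces or additional classes.
SALARY_XPATHS = [
    '//div/span[@class = "no-wrap"]/text()',
    '//div[contains(concat(" ", normalize-space(@class), " "), '
    '" jobsearch-JobMetadataHeader-item ")]/text()',
]
```

Inside the callback this would be used as `salarystring = first_match(response, SALARY_XPATHS)`. Keeping the XPaths in a list makes it easy to add a third layout if the site changes again.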

Here is the log output:

2018-08-10 14:09:32 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.indeed.com/viewjob?jk=20635031d9ceaad1&from=serp&vjs=3> (referer: https://www.indeed.com/jobs?q=$30,000&l=United+States&jt=contract&explvl=mid_level)
Traceback (most recent call last):
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/imac086/Documents/premal/test_interim_scrapper/interim/spiders/indeed.py", line 180, in parsejob
    print ('job rate ', item['job_rate'])
  File "/Users/imac086/anaconda3/lib/python3.6/site-packages/scrapy/item.py", line 59, in __getitem__
    return self._values[key]
KeyError: 'job_rate'
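The traceback points at the immediate cause: line 180 of `indeed.py` reads `item['job_rate']`, and a Scrapy `Item` raises `KeyError` for a field that was never assigned, just like a dict. So on the pages that fail, nothing set `job_rate` before the `print`, most likely because the salary XPath matched nothing on that page. A minimal sketch of the defensive pattern, using a plain dict as a stand-in for the item (`scrapy.Item` is a mutable mapping, so `.get()` works on it too); `fill_job_rate` is a hypothetical helper:

```python
def fill_job_rate(item, salarystring):
    """Always give item['job_rate'] a value, using None when no salary was found.

    `item` may be a dict or a scrapy.Item that declares a `job_rate` field.
    """
    item['job_rate'] = salarystring or None
    return item


item = {}  # stand-in for the spider's Item; job_rate has not been assigned
try:
    item['job_rate']
except KeyError as exc:
    print('KeyError:', exc)  # prints: KeyError: 'job_rate' (same as the log)

# Safe alternatives: read with .get(), or always assign the field first.
print('job rate ', item.get('job_rate'))
print('job rate ', fill_job_rate(item, '')['job_rate'])
print('job rate ', fill_job_rate(item, '35,000 a year')['job_rate'])
```

Assigning the field unconditionally (even to `None`) also means pages without a salary show up in the output as empty values instead of crashing the callback.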

Can anyone tell me why this happens in Scrapy, even though the same code sometimes works fine when I run it 2-3 times?
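To see what the failing pages actually contain, one option is to save their HTML whenever no salary is found and compare it with a page that worked. A minimal sketch; the helper `dump_unmatched` and the `unmatched_pages` directory are hypothetical, and `response` is assumed to be a Scrapy `Response` (only `.url` and `.body` are used):

```python
import hashlib
from pathlib import Path


def dump_unmatched(response, directory='unmatched_pages'):
    """Save the raw HTML of a page whose salary could not be extracted.

    Only `response.url` (str) and `response.body` (bytes) are used. The file
    name is a SHA-1 hash of the URL, so repeated failures on the same URL
    overwrite the same file. Returns the path that was written.
    """
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha1(response.url.encode('utf-8')).hexdigest() + '.html'
    path = out_dir / name
    path.write_bytes(response.body)
    return path
```

In the callback, for example: `if not salarystring: self.logger.warning('No salary on %s, saved to %s', response.url, dump_unmatched(response))`. Opening a saved file then shows which markup, if any, that response carried.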

Thanks in advance.

0 answers:

No answers yet.