I have a new question. My last one (also my first) got a great answer. Now I'm getting an AttributeError, and it happens during the crawl. The output is below. I don't understand how this can happen, since my code comes straight from the official Scrapy tutorial. What's wrong? Thanks again!
2015-08-14 11:36:39 [scrapy] DEBUG: Crawled (200) <GET http://www.adacta.si/images/en/CP-Suite-Brochure.pdf> (referer: http://www.adacta.si/storitve/podpora-procesu-planiranja)
2015-08-14 11:36:39 [scrapy] ERROR: Spider error processing <GET http://www.adacta.si/images/en/CP-Suite-Brochure.pdf> (referer: http://www.adacta.si/storitve/podpora-procesu-planiranja)
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
yield next(it)
File "C:\Python27\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 28, in process_spider_output
for x in result:
File "C:\Python27\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 22, in <genexpr>
return (_set_referer(r) for r in result or ())
File "C:\Python27\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "C:\Python27\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 54, in <genexpr>
return (r for r in result or () if _filter(r))
File "E:\analitika\SURS\tutorial\tutorial\spiders\job_spider.py", line 25, in parse
response.selector.remove_namespaces()
AttributeError: 'Response' object has no attribute 'selector'
Here is my spider code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import scrapy, urlparse, os
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor
from tutorial.items import JobItem
from scrapy.utils.response import get_base_url
from scrapy.http import Request
from urlparse import urlparse, urljoin
from datetime import datetime

class JobSpider(scrapy.Spider):
    name = "jobs"
    #allowed_domains = ["www.aclovse.si"]
    start_urls = ["http://www.adacta.si"]
    # Check list that helps us avoid duplicating results.
    jobs_urls = []

    def parse(self, response):
        response.selector.remove_namespaces()
        # We collect all urls, i.e. everything defined by "href".
        # These are either pages on our website or external websites.
        urls = response.xpath('//@href').extract()
        #... and so on