Scrapy crawls but does not scrape

Time: 2017-04-30 08:52:11

Tags: python html web-scraping scrapy dynamic-websites

I am practicing on a website using Python and Scrapy, but the run produces only this log line and no scraped items:

 DEBUG: Crawled (200) <GET http://careers.kfc.com.au/apply/?postcode=2000> (referer: None)

I can't understand why; it should work. The code is short and I don't see any possible problem. Here is the code:

try:
    import scrapy
except ImportError:
    print("\nERROR IMPORTING THE NECESSARY LIBRARIES\n")

#File with all the links
#hellokitty = open('links.txt', 'r')
#making a list with all the links
#yourResult = [line.rstrip() for line in hellokitty.readlines()]

class SpiderMan(scrapy.Spider):
    name = 'man spider'

    #making start_urls equal to that list
    start_urls = ['http://careers.kfc.com.au/apply/?postcode=2000']

    def parse(self, response):
        SET_SELECTOR = 'div.jobs-in-your-area.fixed-search.fixed ul.accordion li.accordion-item'
        for attr in response.css(SET_SELECTOR):
            suberbname = 'a.accordion-title.location-title ::text'
            #ANOTHER FOR LOOP GOES HERE FOR THE INNER WORKINGS
            for nextattr in attr.css('ul.accordion li.accordion-item'):
                jobdestitle = 'a.accordion-title.job-title ::text'
                jobdes = 'div[class=job-description] div[id=description] p ::text'
                joblink = 'div[class=job-description] div[class=apply-now] a[class=button] ::attr(href)'

                yield { 
                        'SUBERB_NAME': attr.css(suberbname).extract_first(),
                        'JOBTITLE': nextattr.css(jobdestitle).extract_first(),
                        'JOB_DESCRIP': nextattr.css(jobdes).extract(),
                        'JOB_DESCRIP_LINK': nextattr.css(joblink).extract_first(),
                        }

And this is the log file:

2017-04-30 14:15:02 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-04-30 14:15:02 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True, 'LOG_FILE': 'kukur.txt'}
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-30 14:15:02 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-04-30 14:15:02 [scrapy.core.engine] INFO: Spider opened
2017-04-30 14:15:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-30 14:15:02 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-30 14:15:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://careers.kfc.com.au/apply/?postcode=2000> (referer: None)
2017-04-30 14:15:04 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-30 14:15:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 236,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 6478,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 30, 8, 45, 4, 704154),
 'log_count/DEBUG': 2,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 4, 30, 8, 45, 2, 192149)}
2017-04-30 14:15:04 [scrapy.core.engine] INFO: Spider closed (finished)

1 answer:

Answer 0 (score: 3)

There is a problem with the declarations of SET_SELECTOR, jobdes, and joblink.

This is the correct way to initialize them:

SET_SELECTOR = 'div.jobs-in-your-area'
jobdes = 'div.job-description div#description p ::text'
joblink = 'div.job-description div.apply-now a.button ::attr(href)'
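The answer does not say why the rewritten selectors behave differently. One difference that is certain from CSS itself: `div[class=job-description]` matches only when the whole `class` attribute equals that string, while `div.job-description` matches when it is one of the space-separated class names. Whether that is what happened on this page is an assumption. The sketch below uses only the standard library and a made-up HTML snippet (not copied from the real site) to show the two rules side by side; `ClassCollector`, `matches_exact`, and `matches_token` are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; NOT taken from the real page.
SAMPLE_HTML = """
<div class="job-description active"><p>The Role</p></div>
<div class="job-description"><p>Benefits</p></div>
"""


class ClassCollector(HTMLParser):
    """Collect the raw class attribute of every <div> that has one."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            class_value = dict(attrs).get("class")
            if class_value is not None:
                self.div_classes.append(class_value)


def matches_exact(class_value, wanted):
    # Rule behind div[class=wanted]: the whole attribute must be equal.
    return class_value == wanted


def matches_token(class_value, wanted):
    # Rule behind div.wanted: wanted must be one of the class names.
    return wanted in class_value.split()


collector = ClassCollector()
collector.feed(SAMPLE_HTML)

exact_hits = [c for c in collector.div_classes
              if matches_exact(c, "job-description")]
token_hits = [c for c in collector.div_classes
              if matches_token(c, "job-description")]

print("div[class=job-description] matches:", len(exact_hits))
print("div.job-description matches:", len(token_hits))
```

In this made-up snippet the exact-attribute rule skips the element that carries an extra class, which is why the class form is usually the safer choice when a site adds or removes class names.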

Here is the spider in the scrapy shell, along with sample output:

>>> # SET_SELECTOR modified
>>> SET_SELECTOR = 'div.jobs-in-your-area'
>>>
>>> for attr in response.css(SET_SELECTOR):
...     suberbname = 'a.accordion-title.location-title ::text'
...
...     for nextattr in attr.css('ul.accordion li.accordion-item'):
...         jobdestitle = 'a.accordion-title.job-title ::text'
...         # Jobdes and joblink modified
...         jobdes = 'div.job-description div#description p ::text'
...         joblink = 'div.job-description div.apply-now a.button ::attr(href)'
...
...         print('SUBERB_NAME: ',attr.css(suberbname).extract_first())
...         print('JOBTITLE: ', nextattr.css(jobdestitle).extract_first())
...         print('JOB_DESCRIP: ', nextattr.css(jobdes).extract())
...         print('JOB_DESCRIP_LINK: ', nextattr.css(joblink).extract_first())
...
SUBERB_NAME: Artarmon
JOBTITLE: Customer Service Team Member
JOB_DESCRIP: ['Company Information', 'KFC', " is the world's most popular chicken restaurant chain,\xa0specializing in our famous Original Recipe® fried chicken. It all started with one cook who created a finger lickin' good recipe more than ", '75', ' years ago, a list of secret herbs and spices scratched out on the back of the door to his kitchen. That cook was\xa0', 'Colonel Harland Sanders', ", of course, and today we still follow his formula for success, with real cooks breading and freshly preparing our delicious chicken by hand. Our aim is to put a smile on people's faces around the world and give every customer a special experience on each occasion. Our vision is that our jobs will be the best in the world for those committed to serving great food and looking after customers better than anyone else.", 'The Role', 'Customer Service Team Members are responsible for ensuring the provision of fresh, quality products, friendly and efficient service and maintaining clean and well-presented facilities for our valued customers!', 'Requirements/ key selection criteria', 'Experience', 'No experience necessary as full Training will be provided to all employees. Retail Traineeships are also available for employees who meet the required criteria.', 'Benefits:', "Working with KFC will give you financial independence, you'll receive recognition for your efforts and gain skills to set you on your career path. KFC is a place where good things happen as soon as you walk through the door.", 'Company Information', 'KFC', " is the world's most popular chicken restaurant chain,\xa0specializing in our famous Original Recipe® fried chicken. It all started with one cook who created a finger lickin' good recipe more than ", '75', ' years ago, a list of secret herbs and spices scratched out on the back of the door to his kitchen. That cook was\xa0', 'Colonel Harland Sanders', ", of course, and today we still follow his formula for success, with real cooks breading and freshly preparing our delicious chicken by hand. Our aim is to put a smile on people's faces around the world and give every customer a special experience on each occasion. Our vision is that our jobs will be the best in the world for those committed to serving great food and looking after customers better than anyone else.", 'The Role', 'Food Service Team Members consistently prepare high quality food products that create irresistible tastes for our customers whilst maintaining clean and well-presented facilities.', 'Requirements/ key selection criteria', 'Experience', 'No experience necessary as full training will be provided to all employees. Retail Traineeships are also available for employees who meet the required criteria.', 'Benefits:', "Working with KFC will give you financial independence, you'll receive recognition for your efforts and gain skills to set you on your career path. KFC is a place where good things happen as soon as you walk through the door."]
JOB_DESCRIP_LINK: http://applynow.net.au/jobs/KFC553-customer-service-team-member

Note: Using Developer tools and the Scrapy Shell while debugging makes scraping much faster.
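As one possible way to apply that advice, a long descendant selector can be tested piece by piece to see where it stops matching. The helper below is a minimal sketch, not part of Scrapy: `count_matches_by_prefix` is a hypothetical name, and it assumes the selector uses only spaces (descendant combinators) between its parts. Inside the Scrapy shell, `response.css` could be passed as `select`; here a stand-in function with made-up selectors and counts is used so the example runs on its own.

```python
def count_matches_by_prefix(select, selector):
    """Return (prefix, match_count) for each growing prefix of `selector`.

    `select` is any callable that takes a CSS string and returns a sized
    collection, for example `response.css` inside the Scrapy shell.
    Assumes the parts of `selector` are separated by spaces only.
    """
    parts = selector.split()
    results = []
    for end in range(1, len(parts) + 1):
        prefix = " ".join(parts[:end])
        results.append((prefix, len(select(prefix))))
    return results


def fake_select(css):
    # Stand-in for response.css with invented selectors and results.
    known = {
        "div.listing": ["<div>"],
        "div.listing ul.items": ["<ul>", "<ul>"],
    }
    return known.get(css, [])


report = count_matches_by_prefix(
    fake_select, "div.listing ul.items li.item-row"
)
for prefix, count in report:
    print(count, prefix)
```

The first prefix whose count drops to zero points at the part of the selector that does not match the downloaded HTML, which can then be compared with what Developer tools shows in the browser.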