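Scrapy: simple project

Time: 2018-09-22 05:31:16

Tags: python scrapy

I want to start a simple project. It is a Python project in Visual Studio, and VS is running in administrator mode. Unfortunately, parse(...) is never called, although it should be.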

Edit: my code:

import scrapy
from scrapy.crawler import CrawlerProcess
import logging

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').extract_first()}

        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)
        logging.error("this should be printed")

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(BlogSpider)
process.start()
print("ready")

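Note: I am using the Twisted build from https://www.lfd.uci.edu/~gohlke/pythonlibs/.

2 answers:

Answer 0 (score: 0)

This looks like it was purely an indentation problem. Once I fixed the indentation, the spider started working: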

2018-09-22 11:35:47 [root] ERROR: this should be printed

My version of the same snippet:

import scrapy
from scrapy.crawler import CrawlerProcess
import logging

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        logging.error("this should be printed")
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').extract_first()}
        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)
        logging.error("this should be printed")

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(BlogSpider)
process.start()
print("ready")

The Pastebin paste is attached: https://pastebin.com/pDu8kW27
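
To illustrate one way that indentation alone can keep parse from being called, here is a minimal sketch in plain Python. It does not use Scrapy: BaseSpider, BrokenSpider, FixedSpider and run_callback are hypothetical names made up for this example, and the assumption is only that a framework looks up parse as a method on the spider class. If the def is indented at module level instead of inside the class, the class never gets that method.

```python
class BaseSpider:
    """Hypothetical stand-in for a crawler base class (not Scrapy's real code)."""

    def parse(self, response):
        # Default callback: signals that the subclass did not define parse().
        raise NotImplementedError(f"{type(self).__name__}.parse is not defined")


class BrokenSpider(BaseSpider):
    name = "broken"


# Mis-indented: this def sits at module level, so it is NOT a method of
# BrokenSpider and a framework would never find it on the spider.
def parse(self, response):
    yield {"title": response}


class FixedSpider(BaseSpider):
    name = "fixed"

    # Correctly indented: parse() is a method and overrides the default.
    def parse(self, response):
        yield {"title": response}


def run_callback(spider, response):
    """Call spider.parse() and collect whatever it yields."""
    try:
        return list(spider.parse(response))
    except NotImplementedError as exc:
        return f"not called: {exc}"


print(run_callback(BrokenSpider(), "hello"))
print(run_callback(FixedSpider(), "hello"))
```

A quick check such as `"parse" in vars(BlogSpider)` in your own project shows whether the method really ended up inside the class body.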

Answer 1 (score: 0)

I installed Anaconda and then ran conda install -c conda-forge scrapy (there were some errors).

Now everything works.

Installation guide
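
After reinstalling, it can help to confirm which packages the active interpreter actually sees. The following is a small sketch using only the standard library (Python 3.8 or later is assumed); installed_versions is a hypothetical helper name, not part of Scrapy or conda.

```python
import importlib.util
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        # find_spec returns None when the top-level module cannot be imported.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata was found under this name.
            report[name] = "unknown"
    return report


if __name__ == "__main__":
    for name, version in installed_versions(["scrapy", "twisted"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that Visual Studio uses for the project; if scrapy or twisted is reported as not installed there, the conda environment is probably not the one the project is configured to use.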