Can't get rid of blank lines in csv output

Date: 2017-08-27 19:15:25

Tags: python-3.x csv web-scraping scrapy scrapy-spider

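I've written a very small script in Python Scrapy to parse the name, street and phone number shown across several pages of the Yellow Pages website. The script runs without errors. The only problem is how the data ends up in the csv output: there is always an empty row between two rows of data, so the data is printed on every other row. See the picture below and you'll understand what I mean. If this weren't Scrapy, I could open the file with newline='', but here I don't know where to do that. How can I get rid of the blank lines in the csv output? Thanks in advance for taking a look.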
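The blank rows come from newline translation happening twice. Python's csv.writer already ends every row with "\r\n", and a text-mode stream that translates "\n" into the platform line separator (as on Windows) turns that into "\r\r\n", which spreadsheet programs display as an extra empty row. The sketch below reproduces this on any OS by forcing newline="\r\n" on an in-memory io.TextIOWrapper to stand in for Windows' default text mode; the helper write_rows and the sample row are made up for illustration.

```python
import csv
import io


def write_rows(newline):
    r"""Write two CSV rows through a text wrapper and return the raw bytes.

    newline="\r\n" mimics a Windows text-mode file (every "\n" written
    becomes "\r\n"); newline="" disables translation entirely.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(stream)  # csv's default lineterminator is "\r\n"
    writer.writerow(["name", "phone"])
    writer.writerow(["Pizza Place", "555-0100"])  # hypothetical sample row
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # release the wrapper without closing the BytesIO
    return data


print(write_rows("\r\n"))  # rows end in b"\r\r\n": the blank-row symptom
print(write_rows(""))      # rows end in b"\r\n": what newline="" gives you
```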

items.py contains:

import scrapy

class YellowpageItem(scrapy.Item):
    name = scrapy.Field()
    street = scrapy.Field()
    phone = scrapy.Field()

Here is the spider:

import scrapy

class YellowpageSpider(scrapy.Spider):
    name = "YellowpageSp"
    start_urls = ["https://www.yellowpages.com/search?search_terms=Pizza&geo_location_terms=Los%20Angeles%2C%20CA&page={0}".format(page) for page in range(2,6)]

    def parse(self, response):
        for titles in response.css('div.info'):
            name = titles.css('a.business-name span[itemprop=name]::text').extract_first()
            street = titles.css('span.street-address::text').extract_first()
            phone = titles.css('div[itemprop=telephone]::text').extract_first()
            yield {'name': name, 'street': street, 'phone':phone}

Here is the resulting csv output:

[Image: csv output with an empty row between every row of data]

By the way, the command I use to produce the csv output is:

scrapy crawl YellowpageSp -o items.csv -t csv

1 answer:

Answer 0 (score: 7)

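You can fix it by creating a new feed exporter. Change your settings.py to:

FEED_EXPORTERS = {
    'csv': 'project.exporters.FixLineCsvItemExporter',
}

(replace project with the name of your Scrapy project's package).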

Then create exporters.py in your project:

import io
import os
import six
import csv

# scrapy.contrib.exporter is a deprecated alias; import from scrapy.exporters
from scrapy.exporters import CsvItemExporter
from scrapy.extensions.feedexport import IFeedStorage
from w3lib.url import file_uri_to_path
from zope.interface import implementer


@implementer(IFeedStorage)
class FixedFileFeedStorage(object):
    # Minimal local-file storage: opens the target path in binary append mode

    def __init__(self, uri):
        self.path = file_uri_to_path(uri)

    def open(self, spider):
        dirname = os.path.dirname(self.path)
        if dirname and not os.path.exists(dirname):
            os.makedirs(dirname)
        return open(self.path, 'ab')

    def store(self, file):
        file.close()


class FixLineCsvItemExporter(CsvItemExporter):

    def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
        super(FixLineCsvItemExporter, self).__init__(file, include_headers_line, join_multivalued, **kwargs)
        # Remove exporter-only options so they are not passed to csv.writer below
        self._configure(kwargs, dont_fail=True)
        # Close the stream built by the parent class and reopen the same file
        self.stream.close()
        storage = FixedFileFeedStorage(file.name)
        file = storage.open(file.name)
        # newline="" stops "\n" from being translated to os.linesep, so csv's
        # own "\r\n" row terminator is not turned into "\r\r\n" on Windows
        self.stream = io.TextIOWrapper(
            file,
            line_buffering=False,
            write_through=True,
            encoding=self.encoding,
            newline="",
        ) if six.PY3 else file
        self.csv_writer = csv.writer(self.stream, **kwargs)
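Stripped of the Scrapy plumbing, the exporter above does one thing: it wraps a binary file in io.TextIOWrapper with newline="" before handing it to a CSV writer, so the writer's "\r\n" terminator reaches the file unchanged. The following is a minimal stdlib-only sketch of that step, assuming items are plain dicts like the ones the spider yields; export_items and the sample item are hypothetical, and it uses csv.DictWriter rather than Scrapy's CsvItemExporter.

```python
import csv
import io


def export_items(binary_file, items, fields):
    r"""Write dict items as CSV to an already-open binary file.

    Mirrors the key line of FixLineCsvItemExporter: the TextIOWrapper is
    created with newline="", so no "\n" -> os.linesep translation happens
    and csv's own "\r\n" row terminator is the only line ending written.
    """
    stream = io.TextIOWrapper(binary_file, encoding="utf-8", newline="",
                              write_through=True)
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    stream.flush()
    stream.detach()  # leave the binary file open for the caller to close


buf = io.BytesIO()
export_items(buf, [{"name": "A", "street": "1 Main St", "phone": "555-0100"}],
             ["name", "street", "phone"])
print(buf.getvalue())
```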


I'm on a Mac, so I can't test its Windows behavior. But if the above doesn't work, change this part of the code to set newline="\n":

self.stream = io.TextIOWrapper(
    file,
    line_buffering=False,
    write_through=True,
    encoding=self.encoding,
    newline="\n",
) if six.PY3 else file
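According to the io.TextIOWrapper documentation, when writing, newline="" and newline="\n" both mean that no translation takes place, so the newline="\n" fallback should produce the same bytes as newline="". A quick self-contained check of that behavior, using a hypothetical helper csv_bytes and a made-up sample row:

```python
import csv
import io


def csv_bytes(newline):
    """Return the bytes csv.writer produces through a TextIOWrapper."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    csv.writer(stream).writerow(["Pizza Place", "555-0100"])
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # release the wrapper without closing the BytesIO
    return data


# On output, "" and "\n" both disable newline translation,
# so the two settings yield identical bytes on every platform.
print(csv_bytes("") == csv_bytes("\n"))
```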