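Scrapy: generating a CSV file (UTF-8)

Date: 2017-05-01 10:49:16

Tags: python csv utf-8 scrapy

I am trying to generate a CSV file from the results of my scraper. Because the content is German, I need UTF-8 encoding (ä, ö, etc.). This is what I have so far: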

spider.py

import scrapy

from Polizeimeldungen.items import PolizeimeldungenItem


class PoliceSpider(scrapy.Spider):
    name = "pm"
    allowed_domains = ["berlin.de"]
    start_urls = [
        "https://www.berlin.de/polizei/polizeimeldungen/archiv/2014/?page_at_1_0=1"
    ]

    def parse(self, response):
        for sel in response.css('.row-fluid'):
            item = PolizeimeldungenItem()
            item['title'] = sel.css('a ::text').extract_first().encode('utf-8')
            item['link'] = sel.css('a ::text').extract_first().encode('utf-8')  # this is wrong, but it is easy to fix
            yield item

items.py

import scrapy

class PolizeimeldungenItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()

pipelines.py

import csv


class PolizeimeldungenPipeline(object):
    def __init__(self):
        self.myCsv = csv.writer(open('Item.csv', 'wb'))
        self.myCsv.writerow(['title', 'link'])

    def process_item(self, item, spider):
        self.myCsv.writerow([item['title'], item['link']])
        return item

settings.py

BOT_NAME = 'Polizeimeldungen'

SPIDER_MODULES = ['Polizeimeldungen.spiders']
NEWSPIDER_MODULE = 'Polizeimeldungen.spiders'
ITEM_PIPELINES = {'Polizeimeldungen.pipelines.PolizeimeldungenPipeline': 100}

I run it with:

scrapy crawl pm

I get this error message:

TypeError: a bytes-like object is required, not 'str'
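The same TypeError can be reproduced outside Scrapy. In Python 3, csv.writer always passes str to the file's write() method, while a file opened in binary mode ('wb') only accepts bytes. In this minimal sketch, io.BytesIO and io.StringIO stand in for the binary and text files; the helper write_row is made up for illustration:

```python
import csv
import io


def write_row(stream, row):
    """Write one CSV row to stream; return the TypeError message if it fails."""
    try:
        csv.writer(stream).writerow(row)
    except TypeError as exc:
        return str(exc)
    return None


# A binary buffer behaves like open('Item.csv', 'wb').
print(write_row(io.BytesIO(), ['title', 'link']))
# → a bytes-like object is required, not 'str'

# A text buffer behaves like open('Item.csv', 'w', encoding='utf-8', newline='').
print(write_row(io.StringIO(newline=''), ['title', 'link']))
# → None
```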

Thanks for your help!

Update: Python 3.6.0 :: Anaconda 4.3.1

1 Answer

Answer 0 (score: 0)

I assume you are using Python 3 (this solution does not work with Python 2).
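Why the cells must stay str in Python 3: csv.writer converts any cell that is not a string with str(), so an encoded bytes value ends up in the file as its literal representation b'...' instead of as readable umlauts. A small sketch; the helper to_csv and the sample value 'Schöneberg' are made up for illustration:

```python
import csv
import io


def to_csv(row):
    """Return the text that csv.writer produces for a single row."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


title = 'Schöneberg'  # made-up sample cell containing an umlaut

print(repr(to_csv([title])))
# → 'Schöneberg\r\n'
print(repr(to_csv([title.encode('utf-8')])))
# → "b'Sch\\xc3\\xb6neberg'\r\n"
```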

You need to change two things:

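  • Open the output file in text mode with the desired output encoding. In the constructor of PolizeimeldungenPipeline, write:

    self.myCsv = csv.writer(open('Item.csv', 'w', encoding='utf-8', newline=''))

    (newline='' is what the csv module documentation recommends for files passed to csv.writer; without it, an empty line appears after every row on Windows.)
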
  • Do not encode the cells yourself (as PoliceSpider.parse currently does); keep them as str:

    item['title'] = sel.css('a ::text').extract_first()
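Putting both changes together, the pipeline might look like the minimal sketch below. It is one possible arrangement, not the only one: it opens the file in Scrapy's open_spider hook and closes it in close_spider, so the CSV is flushed when the crawl ends, and it passes newline='' as the csv documentation advises. The path constructor argument is an addition of this sketch; Scrapy instantiates the pipeline without arguments, so the default Item.csv from the question is used.

```python
import csv


class PolizeimeldungenPipeline(object):
    """Write each scraped item as one row of a UTF-8 encoded CSV file."""

    def __init__(self, path='Item.csv'):
        # `path` is an addition of this sketch; Scrapy calls the constructor
        # with no arguments, so the default file name from the question is used.
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called when the spider starts: open in text mode with an explicit
        # encoding, and newline='' as the csv documentation advises.
        self.file = open(self.path, 'w', encoding='utf-8', newline='')
        self.writer = csv.writer(self.file)
        self.writer.writerow(['title', 'link'])

    def process_item(self, item, spider):
        # Cells are expected to be str, not bytes produced by .encode('utf-8').
        self.writer.writerow([item['title'], item['link']])
        return item

    def close_spider(self, spider):
        # Called when the spider finishes: flush and close the file.
        self.file.close()
```

With ITEM_PIPELINES left as in the question's settings, `scrapy crawl pm` should then write Item.csv with the umlauts intact, provided the spider stores the extract_first() strings without calling .encode('utf-8').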