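I am trying to generate a CSV file from the results of my scraper. Because the content is German, I need UTF-8 encoding (ä, ö, etc.). This is what I have so far: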
spider.py
import scrapy
from scrapy.spiders import BaseSpider
from scrapy.selector import Selector
from Polizeimeldungen.items import PolizeimeldungenItem

class PoliceSpider(scrapy.Spider):
    name = "pm"
    allowed_domains = ["berlin.de"]
    start_urls = ["https://www.berlin.de/polizei/polizeimeldungen/archiv/2014/?page_at_1_0=1"]

    def parse(self, response):
        for sel in response.css('.row-fluid'):
            item = PolizeimeldungenItem()
            item['title'] = sel.css('a ::text').extract_first().encode('utf-8')
            item['link'] = sel.css('a ::text').extract_first().encode('utf-8')  # this is wrong, but it is easy to fix
            yield item
items.py
import scrapy

class PolizeimeldungenItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
pipelines.py
import csv

class PolizeimeldungenPipeline(object):
    def __init__(self):
        self.myCsv = csv.writer(open('Item.csv', 'wb'))
        self.myCsv.writerow(['title', 'link'])

    def process_item(self, item, spider):
        self.myCsv.writerow([item['title'], item['link']])
        return item
settings.py
BOT_NAME = 'Polizeimeldungen'
SPIDER_MODULES = ['Polizeimeldungen.spiders']
NEWSPIDER_MODULE = 'Polizeimeldungen.spiders'
ITEM_PIPELINES = {'Polizeimeldungen.pipelines.PolizeimeldungenPipeline': 100}
I run the spider with:
scrapy crawl pm
and I get this error message:
TypeError: a bytes-like object is required, not 'str'
Thanks for your help!
Update: Python 3.6.0 :: Anaconda 4.3.1
Answer 0 (score: 0)
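I assume you are using Python 3 (this solution does not work with Python 2). In Python 3, csv.writer writes str values, so a file opened in binary mode ('wb') raises the TypeError you are seeing.
You need to change two things:
1. Open the output file in text mode with the desired output encoding. In the constructor of PolizeimeldungenPipeline, write:
    self.myCsv = csv.writer(open('Item.csv', 'w', encoding='utf-8'))
2. Do not encode the cells yourself (as PoliceSpider.parse currently does):
    item['title'] = sel.css('a ::text').extract_first()
and likewise for the other fields.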
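Putting both changes together, the pipeline could look roughly like the sketch below. It makes a few assumptions that go beyond the code in the question: the file is opened in Scrapy's open_spider hook and closed in close_spider rather than being opened in __init__ and never closed, and newline='' is passed to open() because the csv module documentation recommends it for files handed to csv.writer. The spider is assumed to yield plain str values, without .encode('utf-8').

```python
import csv


class PolizeimeldungenPipeline(object):
    """Write each scraped item as one row of a UTF-8 encoded CSV file."""

    def open_spider(self, spider):
        # Text mode with an explicit encoding: the file object encodes the
        # str values that csv.writer produces. newline='' lets the csv
        # module control line endings itself.
        self.file = open('Item.csv', 'w', encoding='utf-8', newline='')
        self.myCsv = csv.writer(self.file)
        self.myCsv.writerow(['title', 'link'])

    def close_spider(self, spider):
        # Closing the file flushes any buffered rows to disk.
        self.file.close()

    def process_item(self, item, spider):
        # The item fields are expected to be str here, not bytes.
        self.myCsv.writerow([item['title'], item['link']])
        return item
```

With this in place, the entry in ITEM_PIPELINES stays the same, and umlauts such as ä and ö should end up in Item.csv as UTF-8.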