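I am trying to use Scrapy's CSVFeedSpider on a CSV link. Here is an example row:
number,"may contain a comma","may contain a comma","may contain a comma",text,text,text,text,text,"may contain a comma"
If a value contains a comma, it is wrapped in quotes. How can I handle this, given that the spider accepts only a single delimiter?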
http://doc.scrapy.org/en/latest/topics/spiders.html#csvfeedspider
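Answer 0 (score: 0)
If a column is enclosed in double quotes, commas inside it work fine. If it is enclosed in single quotes, the spider complains about a length mismatch. Here is the spider code: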
# -*- coding: utf-8 -*-
# In old Scrapy releases this class lived in scrapy.contrib.spiders.
from scrapy.spiders import CSVFeedSpider

from stackoverflow23429315.items import DemoItem


class DmozSpider(CSVFeedSpider):
    name = 'csvFeedTest'
    start_urls = ['file:////home/vagrant/labs/stackoverflow23429315/test.csv']
    delimiter = ','
    headers = ['id', 'name', 'address1', 'address2', 'email']

    def parse_row(self, response, row):
        # row is a dict keyed by the column names listed in headers.
        self.logger.info('Hi, this is a row!: %r', row)
        item = DemoItem()
        item['id'] = row['id']
        item['name'] = row['name']
        item['address1'] = row['address1']
        item['address2'] = row['address2']
        item['email'] = row['email']
        return item
The item class:
from scrapy.item import Item, Field


class DemoItem(Item):
    id = Field()
    name = Field()
    address1 = Field()
    address2 = Field()
    email = Field()
The test CSV file:
1,"John, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
2,"John2, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com
3,'John3, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
4,'John4, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com
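To see why rows 1 and 2 parse cleanly while rows 3 and 4 trigger the length mismatch, here is a minimal sketch using only Python's standard csv module on two of the test rows. It is an illustration rather than Scrapy's own code path: the helper parse_line and the constant names are made up for this example. With the default quote character `"`, the single-quoted row splits on every comma and yields 7 fields instead of the 5 listed in headers. Passing `'` as the quote character fixes that row.

```python
import csv
import io

# Column names copied from the spider's headers attribute.
HEADERS = ["id", "name", "address1", "address2", "email"]

# Two rows taken from the test CSV file above.
DOUBLE_QUOTED = '1,"John, Doe","1234 Main Street, APT A","2nd Floor",John.Doe@test.com\n'
SINGLE_QUOTED = "3,'John3, Doe','1234 Main Street, APT A','2nd Floor',John.Doe@test.com\n"


def parse_line(line, quotechar='"'):
    """Split one CSV line into fields (hypothetical helper for this sketch)."""
    reader = csv.reader(io.StringIO(line), delimiter=",", quotechar=quotechar)
    return next(reader)


# Default quote character: commas inside double quotes stay in the field.
double_fields = parse_line(DOUBLE_QUOTED)

# Default quote character on a single-quoted row: every comma splits.
single_default = parse_line(SINGLE_QUOTED)

# Telling the parser that the quote character is ' restores 5 fields.
single_fixed = parse_line(SINGLE_QUOTED, quotechar="'")

for label, fields in [
    ("double quotes, default quotechar", double_fields),
    ("single quotes, default quotechar", single_default),
    ("single quotes, quotechar=\"'\"", single_fixed),
]:
    matches = len(fields) == len(HEADERS)
    print(f"{label}: {len(fields)} fields, matches headers: {matches}")
```

CSVFeedSpider documents a quotechar attribute next to delimiter, defaulting to `"`, so a file that consistently uses single quotes could presumably be handled by setting quotechar = "'" on the spider. A file that mixes both styles, like the test file here, cannot be covered by one quote character and would need to be normalized first.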