Question

I am using CSVFeedSpider to scrape a local CSV file (foods.csv).

Here is the file:

calories    name                         price
650         Belgian Waffles              $5.95
900         Strawberry Belgian Waffles   $7.95
900         Berry-Berry Belgian Waffles  $8.95
600         French Toast                 $4.50
950         Homestyle Breakfast          $6.95

Here is the code in foods.py:

from scrapy.spiders import CSVFeedSpider
from foods_csv.items import FoodsCsvItem

class FoodsSpider(CSVFeedSpider):
    name = 'foods'
    start_urls = ['file:///users/Mina/Desktop/foods.csv']
    delimiter = ';'
    quotechar = "'"
    headers = ['name', 'price', 'calories']

    def parse_row(self, response, row):
        self.logger.info('Hi, this is a row!: %r', row)
        item = FoodsCsvItem()
        item['name'] = row['name']
        item['price'] = row['price']
        item['calories'] = row['calories']
        return item

items.py:

import scrapy

class FoodsCsvItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    calories = scrapy.Field()

But it gives me this error:

2017-11-18 13:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET file:///users/Mina/Desktop/foods.csv> (referer: None)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 1 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 2 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 3 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 4 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 5 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 6 (length: 1, should be: 3)

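At first I was only scraping 'name' and 'price', but that gave me the same error, so I tried adding 'calories' as suggested in this solution: "Scrapy: Scraping CSV File - not getting any output". Nothing changed!

I only need to scrape 'name' and 'price'. How can I do that?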

Answer 1

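The exact format of the CSV file may have been lost when it was posted. If the file is formatted exactly as shown here, it actually looks like a TSV (tab-separated values) file, so you could try changing delimiter = ';' to delimiter = '\t'.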
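To illustrate why the log says "length: 1, should be: 3", here is a minimal sketch using only Python's standard csv module rather than Scrapy itself. The sample text is an assumption: it supposes the file really is tab-separated, which the post does not confirm, and the helper name field_counts is made up for this example.

```python
import csv
import io

# Assumption: the columns in foods.csv are separated by tabs.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t$5.95\n"
    "900\tStrawberry Belgian Waffles\t$7.95\n"
)


def field_counts(text, delimiter):
    """Return how many fields the csv module finds on each line."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


# With ';' no line contains the delimiter, so each line is one field.
print(field_counts(SAMPLE, ";"))   # [1, 1, 1]
# With '\t' each line splits into the three expected fields.
print(field_counts(SAMPLE, "\t"))  # [3, 3, 3]
```

If the real file is separated by runs of spaces rather than tabs, neither delimiter would split it cleanly, so it is worth checking the raw file in an editor that shows whitespace.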

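However, you have specified ' as the quote character; I assume that is correct? I would try running a search/replace on the CSV file to replace ' with " and see if that helps. I have run into some odd problems with single quotes before.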
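The quote character only matters when fields are actually quoted, and the sample in the question shows no quotes at all. As a rough illustration with the standard csv module (not Scrapy), the sketch below parses one hypothetical line, invented for this example, whose middle field is wrapped in double quotes and contains a tab. The helper name parse is also an assumption.

```python
import csv
import io

# Hypothetical line: the middle field is double-quoted and contains a tab.
LINE = '650\t"Belgian\tWaffles"\t$5.95\n'


def parse(line, quotechar):
    """Parse a single tab-separated line with the given quote character."""
    reader = csv.reader(io.StringIO(line), delimiter="\t", quotechar=quotechar)
    return next(reader)


# Matching quote character: the quoted field stays in one piece (3 fields).
print(parse(LINE, '"'))
# Mismatched quote character: the double quotes are treated as ordinary
# text and the embedded tab splits the field (4 fields).
print(parse(LINE, "'"))
```

A mismatch like this changes the field count per row, which is the kind of thing that would trigger the "ignoring row" warnings.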

Answer 2

Try this:

   def parse_row(self, response, row):
       self.logger.info('Hi, this is a row!: %r', row)
       item = FoodsCsvItem()
       item['name'] = row['name']
       item['price'] = row['price']
       item['calories'] = row['calories']
       return item
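Since the question only needs 'name' and 'price', one option is to copy just those keys out of each row. The sketch below uses csv.DictReader from the standard library as a stand-in for the row dicts that parse_row receives. It assumes a tab-separated file whose first line is the header, and keep_fields is a hypothetical helper, not part of Scrapy.

```python
import csv
import io

# Assumption: tab-separated data with a header line, as in the question.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t$5.95\n"
    "600\tFrench Toast\t$4.50\n"
)


def keep_fields(row, wanted=("name", "price")):
    """Return a new dict holding only the wanted columns of a row."""
    return {key: row[key] for key in wanted}


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
items = [keep_fields(row) for row in reader]
print(items)
# [{'name': 'Belgian Waffles', 'price': '$5.95'},
#  {'name': 'French Toast', 'price': '$4.50'}]
```

In the spider, the same idea would mean dropping the item['calories'] line from parse_row once the rows are being split correctly. The 'calories' column still has to be accounted for in the column names, because it is present in the file.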
