Question

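I want to write several lists into the fields of a CSV file. One of these lists contains multiple items, and I want to write the items of that list into the CSV output, but I can't get it to work. My code is: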

import csv

def __init__(self):
    self.myCSV = csv.writer(open('office-ves_04112014.csv', 'wb'), dialect="excel", quotechar='"', quoting=csv.QUOTE_ALL)
    self.myCSV.writerow(['location', 'h1', 'count', 'urllist'])

def process_item(self, item, spider):
    self.myCSV.writerow([item['location'][0].encode('utf-8'), item['h1'][0].encode('utf-8'), item['count'], item['url']])
    return item

I use this code to generate a CSV file in Scrapy. urllist is the list that holds multiple items. The current code writes the whole list into a single field:

[u'urllistitem1', u'urllistitem2', u'urllistitem3']
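This happens because, per the Python csv documentation, csv.writer converts every non-string value with str() before writing it, so a list nested inside the row becomes its printed representation in one field. A minimal Python 3 sketch of that behavior (the question is Python 2, hence the u'' prefixes above); io.StringIO stands in for the file and the "loc"/"title" values are placeholders:

```python
import csv
import io

# An in-memory buffer instead of a real file, so the output is easy to inspect.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)

urls = ["urllistitem1", "urllistitem2", "urllistitem3"]

# The list is ONE element of the row, so it is str()-ed into a single field.
writer.writerow(["loc", "title", 3, urls])

# One row with four fields; the last field is the list's str() form.
print(buf.getvalue())
```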

That is not what I want. The expected output is:

urllistitem1,urllistitem2,urllistitem3

My spider code is:

from scrapy.item import Item, Field
from scrapy.spider import BaseSpider
from scrapy.selector import Selector


class MyItem(Item):
    url = Field()
    location = Field()
    h1 = Field()
    count = Field()


class MySpider(BaseSpider):
    name = "officevesdetail"
    allowed_domains = ["xyz.nl"]
    # Read the start URLs from a text file, one URL per line.
    f = open("officelist-ves.txt")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

    def parse(self, response):
        item = MyItem()
        sel = Selector(response)
        # extract() returns a list, so item['url'] holds every matching href.
        item['url'] = sel.xpath('//div[@class="text"]/h3/a/@href').extract()
        item['h1'] = sel.xpath("//h1[@class='no-bd']/text()").extract()
        item['count'] = len(item['url'])
        item['location'] = sel.xpath('//input[@name="Location"]/@value').extract()
        yield item

If I try

item['url'][0].encode('utf-8')

I only get the first URL, i.e. urllistitem1.
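Indexing with [0] selects a single element, which is why only the first URL appears. If the goal is literally one CSV field holding all URLs (as the title phrases it), one option, not the approach taken in Answer 1, is to join the list into a single string first. A minimal Python 3 sketch, assuming a comma as the separator and placeholder values:

```python
import csv
import io

urls = ["urllistitem1", "urllistitem2", "urllistitem3"]

# Indexing picks one element: only the first URL.
first_only = urls[0]

# Joining builds one string from every element (the separator is an assumption).
all_in_one = ",".join(urls)

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)

# The joined string is a single field; quoting keeps its commas inside that field.
writer.writerow(["loc", all_in_one])
print(buf.getvalue())
```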

Answer 1

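The last element of the row you pass to the CSV writer is a list of items rather than a string. I guess this is because you don't know how long the list will be. No problem: you are already sending a list, so just add the two lists together, preferably after encoding every element of the second list: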

def process_item(self, item, spider):
    # Fixed columns first, then one extra column per URL (each encoded to UTF-8 for Python 2).
    self.myCSV.writerow([item['location'][0].encode('utf-8'),
                         item['h1'][0].encode('utf-8'),
                         item['count']] +
                        [i.encode('utf-8') for i in item['url']])
    return item
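The same list-concatenation idea in Python 3 can be sketched as below. Assumptions: the item is modeled as a plain dict with the question's keys, write_item is a hypothetical helper standing in for process_item, the values are placeholders, and .encode() is dropped because Python 3's csv module writes str (for a real file, open it with "w", newline="" and an explicit encoding).

```python
import csv
import io


def write_item(writer, item):
    # Hypothetical helper: fixed columns, then one column per URL.
    row = [item["location"][0], item["h1"][0], item["count"]] + list(item["url"])
    writer.writerow(row)


buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)

# Placeholder item mirroring the fields of MyItem.
item = {
    "location": ["loc1"],
    "h1": ["title1"],
    "count": 3,
    "url": ["urllistitem1", "urllistitem2", "urllistitem3"],
}
write_item(writer, item)
print(buf.getvalue())
```

Note that the header row written in __init__ names only one urllist column, so rows with several URLs will be wider than the header.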

CSV: writing multiple items of a single list into a single CSV field
