Trying to write to CSV, but some fields are left out in Scrapy for Python

Asked: 2015-07-21 06:41:05

Tags: python xpath web-scraping scrapy

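I am trying to write to a CSV file, but when I check the output, some "reviews" fields are left blank, even though they print correctly when I look at the spider's output. I suspect this is a limitation of zip(), since I use it to write the values as columns rather than as 10 items on a single row. The XPath output I print in the spider is also correct. Is this a limitation of zip, or a problem with my syntax? Another guess is that delimiter=',' is to blame.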

Pipeline.py

import csv


class CSVPipeline(object):

   def __init__(self):
      # Python 2: open in binary mode so the csv module controls line endings.
      self.csvwriter = csv.writer(open('Output.csv', 'wb'), delimiter=',')
      self.csvwriter.writerow(['names','date','location','starts','subjects','reviews'])

   def process_item(self, item, spider):
      # Each field holds a list; zip() turns the parallel lists into rows
      # and stops at the shortest one.
      rows = zip(item['names'],item['date'],item['location'],item['stars'],item['subjects'],item['reviews'])

      for row in rows:
         self.csvwriter.writerow(row)

      return item

Sample output, with some reviews missing:

names,date,location,starts,subjects,reviews
Aastha2015,20 July 2015,"
Bengaluru (Bangalore), India
",5,Amazing Time in Ooty,"
Hi All, i visited Ooty on July 10th, choose to stay in Elk Hills hotel, i read reviews of almost all good hotels and decided to try Elk Hills. I must say the property is huge, very well maintained. Rooms are clean spacious & views are great. Food in the Cafe Blue was awesome. They forgot to give us the...
"
pushp2015,11 July 2015,"
Gurgaon, India
",3,Nice Hotel ...under going maintainance,"
"
REDDY84,25 June 2015,"
Chennai, India
",4,Good old property,"
Its an old property with a very good view. We booked a suite at a very reasonable price but they charged for an extra bed 1500 + txs which i feel was not required because the bed was already their in the suite room.Other then that everything was good. Breakfast was nice . The room they had given was neat...
"
arun606,20 June 2015,"
Mumbai, India
",5,Amazing Hospitality,"
"

2 Answers:

Answer 0 (score: 1)

I'm not sure, but I think what you call a limitation is really just how zip works.

Take a look at izip_longest, which does not stop at the shortest list.

Example:

>>> zip('abc', '12345')
[('a', '1'), ('b', '2'), ('c', '3')]
>>> list(itertools.izip_longest('abc', '12345', fillvalue=0))
[('a', '1'), ('b', '2'), ('c', '3'), (0, '4'), (0, '5')]
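
The session above is Python 2. In Python 3 the function is named itertools.zip_longest, and zip returns an iterator rather than a list. As a rough sketch, comparing the length of each field before zipping can show whether one list has extra entries. The item below is made-up data shaped like the one in the question, not the asker's actual output:

```python
from itertools import zip_longest

# Hypothetical item shaped like the one in the question; values are made up.
item = {
    "names": ["user_a", "user_b"],
    "reviews": ["\nFirst review\n", "\n", "\nSecond review\n"],
}

# Report the length of every field so a mismatch is visible before zipping.
lengths = {key: len(values) for key, values in item.items()}
print(lengths)

# zip() stops at the shortest list; zip_longest() pads the shorter ones.
truncated = list(zip(item["names"], item["reviews"]))
padded = list(zip_longest(item["names"], item["reviews"], fillvalue=""))
```

Here the stray "\n" entry pairs with the second name, so the real second review is either dropped (zip) or pushed onto a padded row (zip_longest). Padding makes the mismatch visible but does not realign the rows.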

Answer 1 (score: 0)

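Figured it out. As @Martin Evans suggested, I checked the lengths and found that the list contained many entries that were just a carriage return, each of which only put a blank in the cell. I don't know why, but that is what happens. To fix it, just add this code.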

while "\n" in yourlist['key']: yourlist['key'].remove("\n")
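
One possible variation on the same idea, assuming the unwanted entries may contain other whitespace besides a single "\n": build a new list that drops whitespace-only entries and strips the rest. The helper name clean_field and the sample data are hypothetical, and the sketch is written for Python 3:

```python
def clean_field(values):
    """Strip each entry and drop those that are empty or whitespace only."""
    return [value.strip() for value in values if value.strip()]


# Made-up sample resembling the reviews list described in the answer.
reviews = ["\nFirst review\n", "\n", "\nSecond review\n", "\n"]
cleaned = clean_field(reviews)
```

Applied to each field before the zip call, this also removes the leading and trailing newlines that show up around the location and reviews values in the sample output.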