Writing to a CSV file in Scrapy

Date: 2013-12-21 12:59:41

Tags: python csv scrapy

I want to write to a CSV file in Scrapy:

 for rss in rsslinks:
  item = AppleItem()
  item['reference_link'] = response.url
  base_url = get_base_url(response)
  item['rss_link'] = urljoin_rfc(base_url,rss)
  #item['rss_link'] = rss
  items.append(item)
  #items.append("\n")
 f = open(filename,'a+')    #filename is apple.com.csv
 for item in items:
    f.write("%s\n" % item)

My output is:

{'reference_link': 'http://www.apple.com/'
 'rss_link': 'http://www.apple.com/rss '
{'reference_link': 'http://www.apple.com/rss/'
 'rss_link':   'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=10/rss.xml'}
{'reference_link': 'http://www.apple.com/rss/'
 'rss_link':  'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=25/rss.xml'}

What I want is this format:

reference_link               rss_link  
http://www.apple.com/     http://www.apple.com/rss/

6 answers:

Answer 0: (score: 66)

Just run the crawl with the -o option and csv as the output format, for example:

scrapy crawl <spider name> -o file.csv -t csv

Answer 1: (score: 3)

This is what worked for me with Python 3:

scrapy runspider spidername.py -o file.csv -t csv

Answer 2: (score: 2)

You need to:

  1. Write the header row; then
  2. Write an entry row for each object.

You can do it like this:

    fields = ["reference_link", "rss_link"] # define fields to use
    with open(filename,'a+') as f: # handle the source file
        f.write("{}\n".format('\t'.join(str(field) 
                                  for field in fields))) # write header 
        for item in items:
            f.write("{}\n".format('\t'.join(str(item[field]) 
                                  for field in fields))) # write items
    

Note that "{}\n".format(s) gives the same result as "%s\n" % s.
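
The approach in this answer can be tried without a spider. The following is a minimal sketch, assuming made-up sample data in `items` and using `io.StringIO` as an in-memory stand-in for the real file; the helper name `write_rows` is hypothetical and not part of the original answer.

```python
import io

# Hypothetical sample data standing in for the scraped items.
items = [
    {"reference_link": "http://www.apple.com/",
     "rss_link": "http://www.apple.com/rss/"},
]
fields = ["reference_link", "rss_link"]


def write_rows(out, items, fields):
    """Write a tab-separated header row, then one row per item."""
    out.write("{}\n".format("\t".join(fields)))
    for item in items:
        out.write("{}\n".format("\t".join(str(item[field]) for field in fields)))


buffer = io.StringIO()  # in-memory stand-in for open(filename, 'a+')
write_rows(buffer, items, fields)
output = buffer.getvalue()

# Both formatting styles build the same string.
same = "{}\n".format("a\tb") == "%s\n" % "a\tb"
```

Since the answer opens the file in append mode, the header row is written again on every run; opening with 'w' avoids that if a fresh file per run is acceptable.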

Answer 3: (score: 2)

The best way to solve this is to use Python's built-in csv package.

import csv

fieldnames = ['reference_link', 'rss_link']  # column names, written as the header row

# In Python 3 the csv module expects the file to be opened with newline=''
with open('Output_file.csv', 'w', newline='') as output_file:  # Output_file.csv is the name of the output file
    writer = csv.DictWriter(output_file, fieldnames=fieldnames)
    writer.writeheader()
    base_url = get_base_url(response)
    for rss in rsslinks:
        # write one data row per RSS link
        writer.writerow({'reference_link': response.url, 'rss_link': urljoin_rfc(base_url, rss)})
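
One reason to prefer the csv module over joining strings by hand is that it quotes values containing the delimiter. The sketch below illustrates this under stated assumptions: the `items` list is invented sample data (the second URL is hypothetical and only there to contain a comma), and `io.StringIO` replaces the real output file.

```python
import csv
import io

# Hypothetical sample rows; the second reference_link contains a comma on purpose.
items = [
    {"reference_link": "http://www.apple.com/",
     "rss_link": "http://www.apple.com/rss/"},
    {"reference_link": "http://example.com/a,b",
     "rss_link": "http://example.com/feed"},
]

buffer = io.StringIO()  # in-memory stand-in for the output file
writer = csv.DictWriter(buffer, fieldnames=["reference_link", "rss_link"])
writer.writeheader()     # header row built from fieldnames
writer.writerows(items)  # one row per dict, columns in fieldnames order

lines = buffer.getvalue().splitlines()
```

The value with the embedded comma comes out wrapped in double quotes, so a CSV reader still sees two columns for that row.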

Answer 4: (score: 0)

Try tablib:

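import tablib

dataset = tablib.Dataset()
dataset.headers = ["reference_link", "rss_link"]

def add_item(item):
    # build one row per item, with values in header order
    dataset.append([item.get(field) for field in dataset.headers])

for item in items:
    add_item(item)

f.write(dataset.csv)  # f is the file object opened in the question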

Answer 5: (score: 0)

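# class attribute on the spider
custom_settings = {
    'FEED_FORMAT': 'csv',  # export as CSV; set explicitly rather than relying on the default
    'FEED_URI': 'Quotes.csv',
}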
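
To my knowledge, newer Scrapy releases configure feed exports through the `FEEDS` setting and treat `FEED_URI` as deprecated. The fragment below is a sketch of what the equivalent configuration might look like, not something taken from this answer; it is written as a plain dict so it can be read on its own, and in a real project it would be the `custom_settings` class attribute of the spider. The `fields` entry, which fixes the column order, reuses the field names from the question.

```python
# Sketch of a settings fragment; assumes a Scrapy version that supports FEEDS.
custom_settings = {
    "FEEDS": {
        "Quotes.csv": {
            "format": "csv",  # export format for this output file
            "fields": ["reference_link", "rss_link"],  # column order
        },
    },
}
```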