CSV append overwrites everything except the column headers

Time: 2015-10-07 14:23:16

Tags: python python-2.7

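I am trying to fetch online reviews (multiple pages), extract the parts of each review (title, user, text, ...) and write that information to a CSV file. Yes, questions like this have been asked many times, but I could not find one that solves my problem:

First I create the CSV file and write the column headers at the start: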

import csv

with open('review-raw-data.csv', 'wb') as output:
    fieldnames = ['title', 'text', 'starRating', 'helpfulScore', 'date', 'user', 'id', 'url']
    writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')
    writer.writeheader()

This works fine. Later I try to append the extracted information to that CSV file:

def extract(data):
    with open('review-raw-data.csv', 'ab') as output:
        fieldnames = ['title', 'text', 'starRating', 'helpfulScore', 'date', 'user', 'id', 'url']
        writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, lineterminator='\n', quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')

        for review in data:
            # extraction happening...
            reviewobj = Review(title, text, helpfulscore, rating, date, user, reviewid, url)

            writer.writerow({'title': reviewobj.title, 'text': reviewobj.text, 'starRating': reviewobj.rating,
                         'helpfulScore': reviewobj.helpfulscore, 'date': reviewobj.date, 'user': reviewobj.user,
                         'id': reviewobj.reviewid, 'url': reviewobj.url})

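This function is called after each review page has been received. It may not be the smartest or simplest approach, but it works. The problem is that when this code is called a 2nd, 3rd, ... time, the append does not work as expected: all rows appended in earlier iterations are overwritten. The column headers are still there.

Example of what I want (columns separated by ','):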

title, user, id
title1, user1, id1
title2, user2, id2
title3, user3, id3

Example of what I get after the 2nd iteration:

title, user, id
title2, user2, id2  # row 1 is missing...

Example of what I get after the 3rd iteration:

title, user, id
title3, user3, id3  # rows 1 & 2 are missing...

What am I doing wrong?

1 Answer:

Answer 0: (score: 1)

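Without the whole code, and without knowing how you call it, it is impossible to know exactly what goes wrong. But you are evidently running the "create the file and write the column headers" part of the code more than once, because the following works as expected: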

bruno@bigb:~/Work/playground$ cat appcsv.py
import csv

with open('review-raw-data.csv', 'wb') as output:
    fieldnames = ['a', 'b', 'c']
    writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')
    writer.writeheader()


def extract(data):
    with open('review-raw-data.csv', 'ab') as output:
        fieldnames = ['a', 'b', 'c']
        writer = csv.DictWriter(output, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')
        for row in data:
            writer.writerow(dict(zip(fieldnames, row)))


dataset = [
    [(1, 2, 3), (4, 5, 6)],
    [(5, 6, 7),]
    ]

for data in dataset:
    extract(data)


bruno@bigb:~/Work/playground$ python appcsv.py
bruno@bigb:~/Work/playground$ cat review-raw-data.csv 
"a","b","c"
"1","2","3"
"4","5","6"
"5","6","7"
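
The symptom from the question can be reproduced by re-running the header step before each append. The following is a minimal sketch, assuming Python 3 (where CSV files are opened in text mode with newline='' instead of 'wb'/'ab'). The helper names write_header, append_rows and read_lines and the temporary file are hypothetical and exist only for this illustration.

```python
import csv
import os
import tempfile

FIELDNAMES = ['a', 'b', 'c']


def write_header(path):
    # Mode 'w' truncates the file, so any rows already in it are lost.
    with open(path, 'w', newline='') as output:
        csv.DictWriter(output, fieldnames=FIELDNAMES).writeheader()


def append_rows(path, rows):
    # Mode 'a' keeps the existing content and writes at the end.
    with open(path, 'a', newline='') as output:
        writer = csv.DictWriter(output, fieldnames=FIELDNAMES)
        for row in rows:
            writer.writerow(dict(zip(FIELDNAMES, row)))


def read_lines(path):
    with open(path, newline='') as source:
        return source.read().splitlines()


path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
pages = [[(1, 2, 3)], [(4, 5, 6)]]

# Buggy flow: the header step runs again before every append.
for page in pages:
    write_header(path)
    append_rows(path, page)
buggy = read_lines(path)

# Intended flow: the header step runs once, then only appends follow.
write_header(path)
for page in pages:
    append_rows(path, page)
fixed = read_lines(path)

print(buggy)  # header plus only the last page
print(fixed)  # header plus every page
```

In the buggy flow only the header and the rows of the last page survive, which matches the output described in the question.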

Avoiding the overwrite of an existing file is then easy: just check whether the file exists before opening it:

import os

filename = 'review-raw-data.csv'
flag = "ab" if os.path.exists(filename) else "wb"
with open(filename, flag) as output:
   # etc
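
The elided part could be filled in along the following lines. This is only a sketch, assuming Python 3 (text mode with newline=''); append_reviews is a hypothetical helper name and the three-column fieldnames list is shortened for the example. It writes the header only when the file is newly created, so later calls add rows without repeating it.

```python
import csv
import os
import tempfile

FIELDNAMES = ['title', 'user', 'id']  # shortened list, for illustration only


def append_reviews(filename, rows):
    """Append dict rows to filename, creating it with a header if needed."""
    exists = os.path.exists(filename)
    mode = 'a' if exists else 'w'
    with open(filename, mode, newline='') as output:
        writer = csv.DictWriter(output, fieldnames=FIELDNAMES,
                                quoting=csv.QUOTE_ALL, restval='unknown',
                                extrasaction='ignore')
        if not exists:
            writer.writeheader()
        writer.writerows(rows)


filename = os.path.join(tempfile.mkdtemp(), 'review-raw-data.csv')
append_reviews(filename, [{'title': 'title1', 'user': 'user1', 'id': 'id1'}])
# The second row has no 'id', so restval fills in 'unknown'.
append_reviews(filename, [{'title': 'title2', 'user': 'user2'}])

with open(filename, newline='') as source:
    rows = list(csv.reader(source))
print(rows)
```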

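As a side note: you have a lot of duplicated code (the fieldnames definition, opening the file and creating the DictWriter). You should factor this out into a function, and/or do it only once and pass the writer to extract: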

def get_writer(outfile):
    fieldnames = [
        # etc
    ]
    writer = csv.DictWriter(outfile, delimiter=',', fieldnames=fieldnames, quoting=csv.QUOTE_ALL, restval='unknown', extrasaction='ignore')
    return writer

def extract(data, writer):
    for review in data:
        # extraction happening...
        reviewobj = Review(title, text, helpfulscore, rating, date, user, reviewid, url)
        writer.writerow({
           'title': reviewobj.title, 'text': reviewobj.text, 
           'starRating': reviewobj.rating, 
           'helpfulScore': reviewobj.helpfulscore, 
           'date': reviewobj.date, 'user': reviewobj.user,
           'id': reviewobj.reviewid, 'url': reviewobj.url
            })

def main():
    filename = 'review-raw-data.csv'
    exists = os.path.exists(filename)
    flag = "ab" if exists else "wb"
    with open(filename, flag) as outfile:
        writer = get_writer(outfile)
        if not exists:
            writer.writeheader()
        for data in whereever_you_get_your_data_from():
            extract(data, writer)
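
For reference, a Python 3 port of this structure could look like the sketch below. The Review namedtuple, fetch_pages and the sample data are stand-ins invented for the example, since the original does not show them. Checking the file size rather than mere existence is an extra assumption, made so that an empty file still gets a header.

```python
import csv
import os
import tempfile
from collections import namedtuple

FIELDNAMES = ['title', 'user', 'id']  # shortened list, for illustration only

# Hypothetical stand-in for the Review class used in the question.
Review = namedtuple('Review', ['title', 'user', 'reviewid'])


def get_writer(outfile):
    return csv.DictWriter(outfile, fieldnames=FIELDNAMES,
                          quoting=csv.QUOTE_ALL, restval='unknown',
                          extrasaction='ignore')


def extract(data, writer):
    for review in data:
        writer.writerow({'title': review.title, 'user': review.user,
                         'id': review.reviewid})


def fetch_pages():
    # Hypothetical data source: yields one list of reviews per page.
    yield [Review('title1', 'user1', 'id1'), Review('title2', 'user2', 'id2')]
    yield [Review('title3', 'user3', 'id3')]


def main(filename):
    # Write the header only if the file is missing or empty.
    needs_header = (not os.path.exists(filename)
                    or os.path.getsize(filename) == 0)
    # The file is opened once; every page goes through the same writer.
    with open(filename, 'a', newline='') as outfile:
        writer = get_writer(outfile)
        if needs_header:
            writer.writeheader()
        for data in fetch_pages():
            extract(data, writer)


filename = os.path.join(tempfile.mkdtemp(), 'review-raw-data.csv')
main(filename)
main(filename)  # a second run appends without repeating the header

with open(filename, newline='') as source:
    result = list(csv.reader(source))
print(len(result))
```

Opening the file once and passing the writer around means no call can truncate rows written earlier in the same run, and mode 'a' keeps rows from previous runs.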