Question

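So my CSV_Output file is empty, even though I'm not getting any errors. I'm trying to add one more column to the rows from the CSV_to_Read file. Printing article.cleaned_text works, so I think I'm just doing something stupid. Thanks!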

from csv import reader, writer
import unicodecsv as csv
from goose import Goose

with open('CSV_to_Read.csv','r') as csvfile:
    readCSV = csv.reader(csvfile, encoding='utf-8')
    out = writer(open("CSV_Output.csv", "a"))
    for row in readCSV:
        g = Goose({'browser_user_agent': 'Mozilla', 'parser_class':'soup'})
        try:
            article = g.extract(url=row[0])
            print article.cleaned_text
            out.writerow([row[0], row[1], row[2], row[3], row[4], row[5], row[6], article.cleaned_text, row[7], row[8], row[9]])
        except Exception:
            pass

Answer 1

Here you open a file object for the output file, but you never close it:

out = writer(open("CSV_Output.csv", "a"))

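The written data may still be sitting in a buffer that has not been flushed to disk. One way to avoid this bug is to make sure the file object gets closed, since closing a file flushes its buffer. The file object's context manager (i.e. the with open(path) as file: syntax) does that for you when the block exits.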
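A minimal sketch of the buffering effect, assuming Python 3 and throwaway temp files (the function and file names are illustrative, not from the question). It measures the file size on disk before and after close(); in CPython the "before" size of a small write is often 0 because the row is still in Python's write buffer, which is how an output file can look empty with no error. The second function shows the with-block version, which closes the file automatically.

```python
import csv
import os
import tempfile


def unclosed_vs_closed(path):
    """Return (size before close, size after close) for one small CSV write."""
    outfile = open(path, "w", newline="", encoding="utf-8")
    csv.writer(outfile).writerow(["http://example.com", "some text"])
    # The row is usually still in Python's write buffer at this point,
    # so the size on disk is often 0 (CPython behaviour, not guaranteed).
    before = os.path.getsize(path)
    outfile.close()  # close() flushes the buffer to disk
    after = os.path.getsize(path)
    return before, after


def write_with_context_manager(path, rows):
    """Append rows inside a with block; the file is closed when the block exits."""
    with open(path, "a", newline="", encoding="utf-8") as outfile:
        out = csv.writer(outfile)
        for row in rows:
            out.writerow(row)
    return outfile.closed  # the with statement has already closed the file


def read_rows(path):
    """Read a CSV file back into a list of rows."""
    with open(path, newline="", encoding="utf-8") as infile:
        return list(csv.reader(infile))


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    print(unclosed_vs_closed(os.path.join(tmpdir, "unclosed.csv")))
    path = os.path.join(tmpdir, "CSV_Output.csv")
    print(write_with_context_manager(path, [["url", "text"], ["http://example.com", "héllo"]]))  # True
    print(read_rows(path))  # [['url', 'text'], ['http://example.com', 'héllo']]
```

If you cannot use a with block, calling outfile.flush() or outfile.close() explicitly after the loop has the same effect.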

So I suggest changing your code to:

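with open('CSV_to_Read.csv','r') as csvfile:
    readCSV = csv.reader(csvfile, encoding='utf-8')
    # The inner with block closes the output file (flushing its buffer) when it exits
    with open("CSV_Output.csv", "a") as outfile:
        # unicodecsv's writer encodes unicode text such as cleaned_text as UTF-8;
        # the stdlib csv writer in Python 2 raises UnicodeEncodeError on non-ASCII text
        out = csv.writer(outfile, encoding='utf-8')
        # One Goose instance can be reused for every row
        g = Goose({'browser_user_agent': 'Mozilla', 'parser_class':'soup'})
        for row in readCSV:
            try:
                article = g.extract(url=row[0])
                print article.cleaned_text
                # Insert the extracted text as a new column between row[6] and row[7]
                out.writerow([row[0], row[1], row[2], row[3], row[4], row[5], row[6], article.cleaned_text, row[7], row[8], row[9]])
            except Exception as e:
                # Report the failure instead of silently skipping the row
                print 'Skipping row:', e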
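The question's except Exception: pass also hides every failure, so a row that raises (a failed download, a row with fewer than 10 columns, an encoding error while writing) is dropped without a trace, which can also leave the output empty. Below is a minimal Python 3 sketch of the same "insert one column" task that reports skipped rows instead. It assumes a caller-supplied extract function standing in for Goose (the name add_extracted_column and the fake extractor are hypothetical), and uses slicing, row[:7] + [text] + row[7:], which matches the original column order for 10-column rows.

```python
import csv
import io
import sys


def add_extracted_column(in_file, out_file, extract, insert_at=7):
    """Copy CSV rows from in_file to out_file, inserting extract(row[0])
    as a new column at index insert_at. Failed rows are reported on stderr
    instead of being silently dropped. Returns (written, skipped)."""
    reader = csv.reader(in_file)
    out = csv.writer(out_file)
    written = skipped = 0
    for row in reader:
        try:
            text = extract(row[0])
        except Exception as exc:  # broad on purpose: one bad URL should not stop the run
            print("skipping %r: %s" % (row[:1], exc), file=sys.stderr)
            skipped += 1
            continue
        out.writerow(row[:insert_at] + [text] + row[insert_at:])
        written += 1
    return written, skipped


if __name__ == "__main__":
    def fake_extract(url):
        # Hypothetical stand-in for Goose().extract(url=url).cleaned_text
        if not url.startswith("http"):
            raise ValueError("not a URL")
        return "text for " + url

    src = io.StringIO("http://a,1,2,3,4,5,6,7,8,9\nbad,1,2,3,4,5,6,7,8,9\n")
    dst = io.StringIO()
    print(add_extracted_column(src, dst, fake_extract))  # (1, 1)
    print(dst.getvalue())
```

With real files in Python 3, open both with newline="" (as the csv module documentation recommends) and wrap them in with blocks so the output is flushed and closed.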
