Trying to merge three columns in a CSV, updating the original CSV

Time: 2013-10-19 01:26:55

Tags: python python-2.7 csv

Some sample data:

title1|title2|title3|title4|merge
test|data|here|and
test|data|343|AND
",3|data|343|and

My attempt at the code:

import csv
import StringIO

storedoutput = StringIO.StringIO()
fields = ('title1', 'title2', 'title3', 'title4', 'merge')
with open('file.csv', 'rb') as input_csv:
    reader = csv.DictReader(input_csv, fields, delimiter='|')
    for counter, row in enumerate(reader):
        counter += 1
        #print row
        if counter != 1:
            for field in fields:
                if field == "merge":
                    row['merge'] = ("%s%s%s" % (row["title1"], row["title3"], row["title4"]))
                    print row
                    storedoutput.writelines(','.join(map(str, row)) + '\n')

contents = storedoutput.getvalue()
storedoutput.close()

print "".join(contents)

with open('file.csv', 'rb') as input_csv:
    input_csv = input_csv.read().strip()

output_csv = []
output_csv.append(contents.strip())

if "".join(output_csv) != input_csv:
    with open('file.csv', 'wb') as new_csv:
        new_csv.write("".join(output_csv))

The output should be:

title1|title2|title3|title4|merge
test|data|here|and|testhereand
test|data|343|AND|test343AND
",3|data|343|and|",3343and

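When I run this code, the first print shows (for your reference) each row the way I would like it to appear in the output CSV. The second print, however, prints the title row x times, where x is the number of rows.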
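The symptom can be reproduced in isolation. This is a minimal sketch for Python 3 (the code above targets Python 2.7), and the `row` literal is a made-up stand-in for what `csv.DictReader` yields for the first data line. Iterating over a dict yields its keys, so joining `row` directly writes the field names rather than the values, which would explain the repeated title row:

```python
# Hypothetical row, shaped like what csv.DictReader yields for the sample data.
fields = ('title1', 'title2', 'title3', 'title4', 'merge')
row = {'title1': 'test', 'title2': 'data', 'title3': 'here',
       'title4': 'and', 'merge': 'testhereand'}

# Iterating a dict yields its keys, so this joins the field names.
joined_keys = ','.join(map(str, row))

# Looking each field up explicitly joins the values instead.
joined_values = '|'.join(row[field] for field in fields)

print(joined_keys)
print(joined_values)
```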

Any input, corrections, or working code would be appreciated.

3 answers:

Answer 0 (score: 2)

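I think we can make this a lot simpler. Dealing with the rogue " is a bit of a nuisance, I admit, because you have to work hard to tell Python that you don't want to worry about it.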

import csv

with open('file.csv', 'rb') as input_csv, open("new_file.csv", "wb") as output_csv:
    reader = csv.DictReader(input_csv, delimiter='|', quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(output_csv, reader.fieldnames, delimiter="|",quoting=csv.QUOTE_NONE, quotechar=None)

    merge_cols = "title1", "title3", "title4"

    writer.writeheader()

    for row in reader:
        row["merge"] = ''.join(row[col] for col in merge_cols)
        writer.writerow(row)

This produces

$ cat new_file.csv 
title1|title2|title3|title4|merge
test|data|here|and|testhereand
test|data|343|AND|test343AND
",3|data|343|and|",3343and

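Note that I declined to update the original file, even though that is what you asked for. Why? It's a bad idea, because that way you can destroy your data while you are working on it.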

How can I be so sure? Because that's exactly what I did the first time I ran the code, and I know better. ;^)
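If the original file really does have to end up updated, one common way to reduce the risk is to write the result to a temporary file and swap it into place only after the write has finished. The following is a sketch under stated assumptions, not part of the answer above: it is written for Python 3 rather than 2.7, the function name `merge_columns_in_place` is hypothetical, and `lineterminator="\n"` is my choice (the csv module defaults to `"\r\n"`).

```python
import csv
import os
import tempfile


def merge_columns_in_place(path, merge_cols=("title1", "title3", "title4")):
    """Rewrite `path` with the merge column filled in, via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, \
                open(path, newline="") as src:
            reader = csv.DictReader(src, delimiter="|", quoting=csv.QUOTE_NONE)
            writer = csv.DictWriter(dst, reader.fieldnames, delimiter="|",
                                    quoting=csv.QUOTE_NONE, quotechar=None,
                                    lineterminator="\n")  # assumption: Unix newlines
            writer.writeheader()
            for row in reader:
                row["merge"] = "".join(row[col] for col in merge_cols)
                writer.writerow(row)
        # Only now swap the finished file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap, so the original is still intact.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `merge_columns_in_place('file.csv')` leaves the original untouched if anything goes wrong while reading or writing, because the replacement happens as the last step.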

Answer 1 (score: 2)

The double quote on the last line is definitely messing up csv.DictReader(). This works:

new_lines = []
with open('file.csv', 'rb') as f:
    # keep the header line unchanged
    new_lines.append(f.next().strip())
    for line in f:
        # strip the newline and split the fields
        line = line.strip().split('|')
        # extract the field data you want
        title1, title3, title4 = line[0], line[2], line[3]
        # join the field data into one string and append it to the rest
        line.append(''.join([title1, title3, title4]))
        # save the new line for later
        new_lines.append('|'.join(line))

with open('file.csv', 'w') as f:
    # make one long string and write it back to the file
    f.write('\n'.join(new_lines))
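The effect of the stray double quote on the csv module can be checked in isolation by feeding the last data line to `csv.reader` on its own. This is a small sketch written for Python 3 (the answers here target Python 2.7), and the variable names are illustrative. With the default dialect the leading " opens a quoted field, whereas with `quoting=csv.QUOTE_NONE` it is treated as an ordinary character:

```python
import csv

# The troublesome line from the sample data, with its stray double quote.
line = '",3|data|343|and'

# Default dialect: the leading " opens a quoted field, so the delimiters
# that follow are treated as part of that field.
default_row = next(csv.reader([line], delimiter='|'))

# QUOTE_NONE: the " is an ordinary character and the line splits on every |.
plain_row = next(csv.reader([line], delimiter='|', quoting=csv.QUOTE_NONE))

print(default_row)
print(plain_row)
```

Only the `QUOTE_NONE` reader splits the line into the four expected fields, which is why plain string splitting and the `QUOTE_NONE` option both get around the problem.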

Answer 2 (score: 0)

import csv
import StringIO

stored_output = StringIO.StringIO()

with open('file.csv', 'rb') as input_csv:
    reader = csv.DictReader(input_csv, delimiter='|', quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(stored_output, reader.fieldnames, delimiter="|",quoting=csv.QUOTE_NONE, quotechar=None)

    merge_cols = "title1", "title3", "title4"

    writer.writeheader()

    for row in reader:
        row["merge"] = ''.join(row[col] for col in merge_cols)
        writer.writerow(row)

    contents = stored_output.getvalue()
    stored_output.close()
    print contents

with open('file.csv', 'rb') as input_csv:
    input_csv = input_csv.read().strip()

if input_csv != contents.strip():
    with open('file.csv', 'wb') as new_csv:
        new_csv.write("".join(contents))