Split files by column value

Time: 2016-04-02 23:45:36

Tags: python csv

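I have more than 200 files that I want to split by the value of the clName column, keeping the header in every output file. I also want to save the resulting files as OriginalFileName-clName.txt.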

ID  Plate   Well      ctr        clID     clName
21    5      C03        1       50012       COL
21    5      C03        1       50012       COL
21    5      C03        1       50012       COL 
21    5      C04        1       50012       IA 
21    5      C04        1       50012       IA 
21    5      C05        1       50012       ABC 


import csv
from itertools import groupby

for key, rows in groupby(csv.reader(open("file.csv")),
                         lambda row: row[7]):
    with open("%s.txt" % key, "w") as output:
        for row in rows:
            output.write(",".join(row) + "\n")

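The problem I have is that the column is not always called clName; it can be called clName, cll_n, or c_Name. Sometimes it is the 7th column, other times the 5th or the 9th.

All I know is how to split a file by a column's values, but that does not keep the header, and I have to check every file to find out which column it is (7, 9, and so on).

Is there a way to check the column names against a list of names and, when one of those names is found, split the file by that column's values?

Sample data: https://drive.google.com/file/d/0Bzv1SNKM1p4uell3UVlQb0U3ZGM/view?usp=sharing

Thanks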

1 answer:

Answer 0 (score: 2)

Use csv.DictReader and csv.DictWriter instead. Here is an outline that should point you in the right direction.

import csv
from itertools import groupby

special_col = ['cll_n', 'clName', 'c_Name']


def call_existing_code(rdr, c):
    # set up an output file using csv.DictWriter; you can
    # replace the original column with the new column, and
    # control the order of fields (here the input order is reused)
    with open('output.csv', 'w', newline='') as fh:
        wtr = csv.DictWriter(fh, fieldnames=rdr.fieldnames)
        wtr.writeheader()

        # groupby yields (key, rows) pairs, not single rows, and it
        # only groups rows that are adjacent in the input
        for key, rows in groupby(rdr, lambda r: r[c]):
            for row in rows:
                # [process the row as needed here]
                wtr.writerow(row)


with open('myfile.csv', 'r', newline='') as fh:
    rdr = csv.DictReader(fh)

    # now we need to figure out which column is used
    for c in special_col:
        if c in rdr.fieldnames:
            break  # found the column name
    else:
        raise IOError('No special column in file')

    # now execute your existing code, but group by the
    # column using lambda row: row[c] instead of row 7
    call_existing_code(rdr, c)
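
To connect the outline back to the original goal (one OriginalFileName-clName.txt per value, each with the header), here is a minimal sketch. The names find_split_column, split_file and CANDIDATE_COLUMNS are hypothetical helpers, not part of the csv module. It assumes comma-delimited input (pass a different delimiter, such as "\t", if the files are tab-separated), rows with the same number of fields as the header, and column values that are safe to use in a file name. It keeps one writer per value instead of using groupby, so rows with the same value do not need to be adjacent.

```python
import csv
import os

# Candidate header names taken from the question; extend as needed.
CANDIDATE_COLUMNS = ["clName", "cll_n", "c_Name"]


def find_split_column(fieldnames, candidates=CANDIDATE_COLUMNS):
    """Return the first candidate name that appears in the header."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise ValueError("none of %r found in header %r" % (candidates, fieldnames))


def split_file(path, candidates=CANDIDATE_COLUMNS, delimiter=","):
    """Split one file into <OriginalFileName>-<value>.txt files.

    Every output file starts with the original header. Returns the
    output paths in the order they were created.
    """
    base = os.path.splitext(path)[0]
    handles = {}  # column value -> open output file
    writers = {}  # column value -> csv.DictWriter for that file
    outputs = []
    with open(path, newline="") as src:
        reader = csv.DictReader(src, delimiter=delimiter)
        column = find_split_column(reader.fieldnames or [], candidates)
        try:
            for row in reader:
                # Strip stray whitespace so "COL " and "COL" share a file.
                value = (row[column] or "").strip()
                if value not in writers:
                    out_path = "%s-%s.txt" % (base, value)
                    handles[value] = open(out_path, "w", newline="")
                    writers[value] = csv.DictWriter(
                        handles[value],
                        fieldnames=reader.fieldnames,
                        delimiter=delimiter,
                    )
                    writers[value].writeheader()
                    outputs.append(out_path)
                writers[value].writerow(row)
        finally:
            # Close every output file, even if a row failed to process.
            for fh in handles.values():
                fh.close()
    return outputs
```

Calling split_file(path) for each path returned by glob.glob("*.csv") would then cover the whole set of files, and a file with none of the candidate column names raises ValueError so it can be inspected by hand.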