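I have more than 200 files that I want to split by the value of the clName column, keeping the header in every output file. I also want to save the resulting files as OriginalFileName-clName.txt.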
ID Plate Well ctr clID clName
21 5 C03 1 50012 COL
21 5 C03 1 50012 COL
21 5 C03 1 50012 COL
21 5 C04 1 50012 IA
21 5 C04 1 50012 IA
21 5 C05 1 50012 ABC
import csv
from itertools import groupby

for key, rows in groupby(csv.reader(open("file.csv")),
                         lambda row: row[7]):
    with open("%s.txt" % key, "w") as output:
        for row in rows:
            output.write(",".join(row) + "\n")
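The problem I am having is that the column is not always called clName: it may be called clName, cll_n, or c_Name. Sometimes it is the 7th column, other times the 5th or the 9th.
All I know how to do is split a file by a column's values, but that does not keep the header, and I have to check each file by hand to find which column it is (7, 9, and so on).
Is there a way to check the column names against a list of names and, when one of them is found, split the file by that column's values?
Sample data: https://drive.google.com/file/d/0Bzv1SNKM1p4uell3UVlQb0U3ZGM/view?usp=sharing
Thanks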
Answer 0 (score: 2)
Use csv.DictReader and csv.DictWriter instead. Here is an outline that should point you in the right direction.
import csv
from itertools import groupby

special_col = ['cll_n', 'clName']

def call_existing_code(rdr, c):
    # set up an output file using csv.DictWriter; you can
    # replace the original column with the new column, and
    # control the order of fields (here: same order as the input)
    with open('output.csv', 'w', newline='') as fh:
        wtr = csv.DictWriter(fh, fieldnames=rdr.fieldnames)
        wtr.writeheader()
        # groupby yields (key, rows) pairs, one pair per run of
        # consecutive rows that share the same value in column c
        for key, rows in groupby(rdr, lambda r: r[c]):
            for row in rows:
                # [process the row as needed here]
                wtr.writerow(row)

with open('myfile.csv', 'r', newline='') as fh:
    rdr = csv.DictReader(fh)
    # now we need to figure out which column is used
    for c in special_col:
        if c in rdr.fieldnames:
            break  # found the column name
    else:
        # the for loop finished without hitting break
        raise IOError('No special column in file')
    # now execute your existing code, but group by the
    # column using lambda row: row[c] instead of row 7
    call_existing_code(rdr, c)
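To make the outline more concrete, here is a minimal sketch of how it could be filled in so that each input file produces one OriginalFileName-clName.txt per value, each with the header. The names find_split_column and split_file, the delimiter parameter, and the candidate list are my own assumptions for illustration. It also swaps groupby for a plain dict of lists, because groupby only groups consecutive rows and would need the input sorted by that column first. It assumes the column values are safe to use in file names.

```python
import csv
import os
from collections import defaultdict

# Candidate header names, taken from the question (extend as needed).
CANDIDATE_COLUMNS = ['clName', 'cll_n', 'c_Name']


def find_split_column(fieldnames, candidates=CANDIDATE_COLUMNS):
    """Return the first candidate name that appears in the header row."""
    for name in candidates:
        if name in fieldnames:
            return name
    raise ValueError('none of %r found in header %r' % (candidates, fieldnames))


def split_file(path, delimiter=','):
    """Split one file by its clName-like column; return the paths written.

    The delimiter is an assumption: pass '\t' or ' ' if the files are
    not comma-separated.
    """
    with open(path, newline='') as fh:
        rdr = csv.DictReader(fh, delimiter=delimiter)
        fieldnames = rdr.fieldnames or []
        column = find_split_column(fieldnames)
        # Collect rows per value so the input does not need to be sorted.
        groups = defaultdict(list)
        for row in rdr:
            groups[row[column]].append(row)

    base = os.path.splitext(path)[0]
    written = []
    for value, rows in groups.items():
        out_path = '%s-%s.txt' % (base, value)
        with open(out_path, 'w', newline='') as out:
            wtr = csv.DictWriter(out, fieldnames=fieldnames,
                                 delimiter=delimiter)
            wtr.writeheader()  # every output file keeps the header
            wtr.writerows(rows)
        written.append(out_path)
    return written
```

To cover all 200 files you could loop over something like glob.glob('*.csv') and call split_file(path) on each one; a file with none of the candidate names raises ValueError, so you find out which files need a new name added to the list.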