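I want to combine columns from several CSV files into a single CSV file with a new header, concatenated horizontally. I only want to keep certain columns, selected by their headings. Each of the files to be combined has different columns.
Example input: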
freestream.csv:
static pressure,static temperature,relative Mach number
1.01e5,288,5.00e-02
fan.csv:
static pressure,static temperature,mass flow
0.9e5,301,72.9
exhaust.csv:
static pressure,static temperature,mass flow
1.7e5,432,73.1
Desired output:
combined.csv:
P_amb,M0,Ps_fan,W_fan,W_exh
1.01e5,5.00e-02,0.9e5,72.9,73.1
A possible call to the function:
reorder_multiple_CSVs(["freestream.csv","fan.csv","exhaust.csv"],
                      "combined.csv",
                      ["static pressure,relative Mach number",
                       "static pressure,mass flow",
                       "mass flow"],
                      "P_amb,M0,Ps_fan,W_fan,W_exh")
Here is the previous version of the code, which only handles a single input file. I wrote it with help from "write CSV columns out in a different order in Python":
import csv
import operator

def reorder_CSV(infilename, outfilename, oldheadings, newheadings):
    with open(infilename) as infile:
        with open(outfilename, 'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            # next(reader) reads the header row (works on Python 2.6+ and 3)
            readnames = next(reader)
            # map each column name to its position in the input file
            name2index = dict((name, index) for index, name in enumerate(readnames))
            writeindices = [name2index[name] for name in oldheadings.split(",")]
            reorderfunc = operator.itemgetter(*writeindices)
            writer.writerow(newheadings.split(","))
            for row in reader:
                towrite = reorderfunc(row)
                # itemgetter returns a bare string when only one column is selected
                if isinstance(towrite, str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
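So, to adapt this to multiple files, this is what I have worked out so far:
- I now need lists for infilename, oldheadings and newheadings (all of the same length)
- I need to iterate over the list of input files to build a list of readers
- readnames can also be a list, built by iterating over the readers
- which means I can make name2index a list of dictionaries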
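The steps in the list above can be sketched as follows. This is only an illustration under stated assumptions: the `sources` list uses in-memory `io.StringIO` objects as hypothetical stand-ins for the three input files, so the sketch says nothing yet about how the real files are opened.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the three input files in the question.
sources = [
    io.StringIO("static pressure,static temperature,relative Mach number\n"
                "1.01e5,288,5.00e-02\n"),
    io.StringIO("static pressure,static temperature,mass flow\n"
                "0.9e5,301,72.9\n"),
    io.StringIO("static pressure,static temperature,mass flow\n"
                "1.7e5,432,73.1\n"),
]
oldheadings = ["static pressure,relative Mach number",
               "static pressure,mass flow",
               "mass flow"]

# One reader per input.
readers = [csv.reader(src) for src in sources]
# One header row per reader; next() consumes it so only data rows remain.
readnames = [next(reader) for reader in readers]
# One {column name: column index} dictionary per file.
name2index = [{name: i for i, name in enumerate(names)} for names in readnames]
# For each file, the indices of the columns to keep, in output order.
writeindices = [[lookup[name] for name in wanted.split(",")]
                for lookup, wanted in zip(name2index, oldheadings)]
print(writeindices)
```

Keeping plain lists of indices per file, rather than one `operator.itemgetter` per file, is a design choice of this sketch: a list behaves the same whether one column or several are selected.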
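One thing I don't know how to do is use the with keyword nested n levels deep, when n is only known at run time. I read this: How can I open multiple files using "with open" in Python? but that only seems to work when you know in advance how many files you need to open.
Or is there a better way to do this?
I am quite new to Python, so I would appreciate any tips you can give me.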
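Answer 0: (score: 2)
I am only replying to the part about using with to open multiple files when the number of files is not known beforehand. It shouldn't be too hard to write your own contextmanager, something like this (completely untested):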
from contextlib import contextmanager

@contextmanager
def open_many_files(filenames):
    files = [open(filename) for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()
You can use it like this:
innames = ['file1.csv', 'file2.csv', 'file3.csv']
outname = 'out.csv'
with open_many_files(innames) as infiles, open(outname, 'w') as outfile:
    for infile in infiles:
        do_stuff(infile)
There is also a function that does something similar, but it is deprecated.
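For reference, Python 3.3 and later ship `contextlib.ExitStack`, which manages a number of context managers that is only known at run time. The following is a minimal sketch and not part of the original answer; `read_first_lines` and the temporary demo files are hypothetical names used only for illustration.

```python
import os
import tempfile
from contextlib import ExitStack

def read_first_lines(filenames):
    """Open every file in `filenames` at once and return each file's first line."""
    with ExitStack() as stack:
        # enter_context registers each file so that it is closed when the
        # with block exits, including when a later open() raises.
        files = [stack.enter_context(open(name)) for name in filenames]
        return [f.readline().rstrip("\n") for f in files]

# Hypothetical demo files created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    names = []
    for i in range(3):
        path = os.path.join(tmp, "file%d.csv" % i)
        with open(path, "w") as f:
            f.write("header%d\nvalue%d\n" % (i, i))
        names.append(path)
    first_lines = read_first_lines(names)

print(first_lines)
```

Compared with a hand-written context manager that opens all files in a list comprehension, this form also closes the files that were already opened if one of the later `open()` calls fails.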
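Answer 1: (score: 0)
I am not sure this is the right way to do it, but I would like to expand on Bas Swinckels' answer. His very helpful answer had a few small inconsistencies, and I want to give the relevant code.
This is what I did, and it works.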
from contextlib import contextmanager
import csv
import operator
import itertools as IT

@contextmanager
def open_many_files(filenames):
    files = [open(filename, 'r') for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()

def reorder_multiple_CSV(infilenames, outfilename, oldheadings, newheadings):
    # infilenames and newheadings are comma-separated strings;
    # oldheadings is a list with one comma-separated string per input file
    with open_many_files(filter(None, infilenames.split(','))) as handles:
        with open(outfilename, 'w') as outfile:
            readers = [csv.reader(f) for f in handles]
            writer = csv.writer(outfile)
            reorderfunc = []
            for i, reader in enumerate(readers):
                readnames = next(reader)
                name2index = dict((name, index) for index, name in enumerate(readnames))
                writeindices = [name2index[name] for name in filter(None, oldheadings[i].split(","))]
                reorderfunc.append(operator.itemgetter(*writeindices))
            writer.writerow(list(filter(None, newheadings.split(","))))
            # zip_longest is the Python 3 name (izip_longest in Python 2); an exhausted
            # file is padded with two empty fields, which only covers column indices 0 and 1
            for rows in IT.zip_longest(*readers, fillvalue=[''] * 2):
                towrite = []
                for i, row in enumerate(rows):
                    picked = reorderfunc[i](row)
                    # itemgetter returns a bare string when only one column is selected
                    towrite.extend([picked] if isinstance(picked, str) else picked)
                writer.writerow(towrite)
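The row-merging loop above pads exhausted files with a fixed two-field row, which only helps when every selected column sits at index 0 or 1. One possible alternative, sketched here under the assumption that the per-file column indices are already available as plain lists, is to pad each file according to the number of columns selected from it. `merge_rows` and the sample rows are hypothetical and only illustrate the idea.

```python
from itertools import zip_longest

def merge_rows(row_sources, writeindices):
    """Yield one combined row per line, taking the selected columns from each source."""
    # A unique sentinel marks sources that have run out of rows.
    missing = object()
    for rows in zip_longest(*row_sources, fillvalue=missing):
        combined = []
        for row, indices in zip(rows, writeindices):
            if row is missing:
                # Pad with one empty field per selected column of this source.
                combined.extend([""] * len(indices))
            else:
                combined.extend(row[i] for i in indices)
        yield combined

# Hypothetical data rows (headers already consumed); the fan data is one row short.
freestream = [["1.01e5", "288", "5.00e-02"], ["1.02e5", "289", "6.00e-02"]]
fan = [["0.9e5", "301", "72.9"]]
exhaust = [["1.7e5", "432", "73.1"], ["1.8e5", "440", "73.4"]]

merged = list(merge_rows([freestream, fan, exhaust], [[0, 2], [0, 2], [2]]))
for row in merged:
    print(",".join(row))
```

Any iterable of rows works as a source, so `csv.reader` objects can be passed in directly once their header rows have been read.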