Question

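I have 6 files with a similar format but different names (for example, file_AA.dat file_AB.dat file_AC.dat file_BA.dat file_BB.dat file_BC.dat).

Can I write a for-loop script that reads, analyzes, and prints these files in one go, instead of running the script 6 times? For example: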

for i in {AA AB AC BA BB BC} 
 filename = 'file_$i.dat'
 file = open (filename, 'r')
 Do a lot, lot of analysis for lots of rows and columns :P 
 file open('output_file_$i.dat','w')
 Do some for loop for writing and calculation 
file.close

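So I would like to automate the read/analyze/write process for several files that have different names but a similar format. I am mainly curious how to handle the naming of the input and output files. That way, I hope to be able to analyze a large number of files more quickly and easily.

Alternatively, is there a way to do the same thing with a mix of Python and C shell or shell scripts?

Thanks.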

Answer 1

The idea is to iterate over the file names, open each file inside the loop, do the analysis, and then write the output file:

filenames = ['file_AA.dat', 'file_AB.dat', 'file_AC.dat', 'file_BA.dat', 'file_BB.dat', 'file_BC.dat']

for filename in filenames:
    with open(filename, 'r') as input_file:
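        # Do a lot, lot of analysis for lots of rows and columns :P
        data = input_file.read()  # placeholder so the block is valid Python

    with open('output_%s' % filename, 'w') as output_file:
        # Do some for loop for writing and calculation
        output_file.write(data)  # placeholder: write the real results instead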

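Note that the with statement is recommended when working with files, since it closes the file for you when the block ends.

Also note that you can combine the two statements into one; see: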

Multiple variables in Python 'with' statement
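
As a rough sketch of what the combined form could look like in practice, the function below opens the input and the output in a single with statement and streams through the input line by line. The name sum_rows and the "analysis" itself (summing whitespace-separated numeric columns in each row) are made up for illustration; the question does not say what the .dat files actually contain.

```python
def sum_rows(in_name, out_name):
    """Write the sum of each non-empty row of in_name to out_name.

    Returns the number of rows written. Assumes (hypothetically) that every
    row holds whitespace-separated numbers.
    """
    count = 0
    # One `with` statement opens both files; both are closed on exit.
    with open(in_name, 'r') as input_file, open(out_name, 'w') as output_file:
        for line in input_file:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            total = sum(float(x) for x in fields)
            output_file.write('%s\n' % total)
            count += 1
    return count
```

Inside the loop over filenames you would then call something like sum_rows(filename, 'output_%s' % filename).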

UPD: you can build the list of file names using string formatting:

>>> patterns = ['AA', 'AB', 'AC', 'BA', 'BB', 'BC']
>>> filenames = ['file_{}.dat'.format(pattern) for pattern in patterns]
>>> filenames
['file_AA.dat', 'file_AB.dat', 'file_AC.dat', 'file_BA.dat', 'file_BB.dat', 'file_BC.dat']
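
If the suffixes follow a regular pattern, the list could also be generated instead of typed out. This is only a sketch, assuming the suffixes are every combination of a first letter from 'AB' and a second letter from 'ABC', which happens to match the six names in the question. itertools.product comes from the standard library.

```python
from itertools import product

# Every two-letter suffix: first letter A or B, second letter A, B or C.
suffixes = [''.join(pair) for pair in product('AB', 'ABC')]
filenames = ['file_{}.dat'.format(suffix) for suffix in suffixes]
print(filenames)
```

If your real suffixes do not form a full grid like this, the explicit list above is the simpler choice.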

Hope that helps.

Answer 2

files = [
    "file_AA.dat",
    "file_AB.dat",
    "file_AC.dat",
    "file_BA.dat",
    "file_BB.dat",
    "file_BC.dat",
]
for filename in files:
    with open(filename) as f:
        data = f.read()  # reads all data from the file into a string
    # parse data here and do other stuff
    junk = data  # junk is a string that you shove the results into (placeholder)
    with open("output_" + filename, 'w') as output:
        output.write(junk)

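If you have a large number of files and you are doing heavy computation on the data in them, you can use the multiprocessing module. As for bash vs. Python, I basically use the Python interpreter the way many people use the bash shell, and I rarely have a reason to leave it. Also, if these files are the only files in the directory, you can use the os module to iterate over the directory. If you have to run a program in the bash shell, you can use the subprocess module.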
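
To make the multiprocessing and directory-listing suggestions a bit more concrete, here is a minimal sketch. It uses glob (a standard-library alternative to looping over os.listdir) to find the inputs and a multiprocessing.Pool to process one file per worker. The function names analyze and find_inputs, the file_*.dat pattern, and the line-counting "analysis" are all assumptions for illustration, not part of the original answer.

```python
import glob
import os
from multiprocessing import Pool


def analyze(filename):
    """Read one input file and write its result to output_<name> beside it."""
    with open(filename) as f:
        data = f.read()
    result = str(len(data.splitlines()))  # placeholder analysis: count lines
    directory, name = os.path.split(filename)
    out_name = os.path.join(directory, 'output_' + name)
    with open(out_name, 'w') as out:
        out.write(result + '\n')
    return out_name


def find_inputs(directory='.'):
    """Return the sorted file_*.dat paths found in directory."""
    return sorted(glob.glob(os.path.join(directory, 'file_*.dat')))


if __name__ == '__main__':
    # Each worker process handles one file at a time.
    with Pool() as pool:
        for out_name in pool.map(analyze, find_inputs()):
            print('wrote', out_name)
```

For only six small files the plain loop is probably just as fast; a pool tends to pay off when the analysis of each file is CPU-heavy.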

Answer 3

You can do this cleanly with a list comprehension:

for filein, fileout in [('file_%s.dat' % x, 'out_%s.dat' % x) for x in ('AA', 'AB', 'AC', 'BA', 'BB', 'BC')]:
    with open(filein, 'r') as fp, open(fileout, 'w') as fpout:
        # Read from fp, write to fpout as needed
        pass

This list comprehension creates a list of input/output file name pairs:

[('file_%s.dat' % x, 'out_%s.dat' %x) for x in ('AA','AB','AC', 'BA', 'BB', 'BC')]

It produces a list that looks like this:

[('file_AA.dat', 'out_AA.dat'), ('file_AB.dat', 'out_AB.dat') ...]

You can test how it works like this:

lst = [('file_%s.dat' % x, 'out_%s.dat' % x) for x in ('AA', 'AB', 'AC', 'BA', 'BB', 'BC')]
print(lst)

for filein, fileout in lst:
    with open(filein, 'r') as fp, open(fileout, 'w') as fpout:
        # Read from fp, write to fpout as needed
        pass

python read -> analyze -> print multiple files
