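Merging single-column text files into one multi-column file, separated by tabs

Date: 2016-05-04 18:01:38

Tags: python text-files glob

I have several single-column text files that I would like to put together side by side (i.e. each file becomes one column). The only documentation I could find covers appending files to the bottom of other files, and that is where I am stuck.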

import glob
MA_files = glob.glob("MA*continuous*2")

with open("MA_continuous_results.csv", "wb") as outfile:
    for eachFile in MA_files:
        with open(eachFile, "rb") as infile:
            eachFile=eachFile.split("maskave_")
            # eachFile[-1] = 15_2 etc
            outfile.write('%s\n' % eachFile[-1])
            for line in infile:
                #split by [ because text after that, files give unnecessary info
                line=line.split('[')
                #only write info before [
                outfile.writelines('%s\n' % line[0])

The output looks like this:

15_2
-0.0383935 
-0.0559652 
-0.0168811 
-0.0996374 
-0.151921 
-0.131327 
-0.0509602 
-0.109181 
-0.0238667 
-0.00646939 
-0.106631 
-0.0380114 
-0.0219288 
-0.0135917 
-0.0627647 
-0.0605226 
-0.0534139 
-0.0134063 
21_2
-0.097086 
-0.210296 
0.0639971 
0.00949209 
-0.227474 
0.0180759 
-0.135376 
-0.212909 
-0.00786295 
-0.00922367 
-0.0749584 
-0.0584701 
-0.019548 
-0.0984993 
-0.00848889 
-0.164244 
-0.0121499 
0.0100612 

But I would like it to look like this (columns separated by tabs or commas):

 15_2       21_2
-0.0383935  -0.097086
-0.0559652  -0.210296
-0.0168811  0.0639971
-0.0996374  0.00949209
-0.151921   -0.227474
-0.131327   0.0180759
-0.0509602  -0.135376
-0.109181   -0.212909
-0.0238667  -0.00786295
-0.00646939 -0.00922367
-0.106631   -0.0749584
-0.0380114  -0.0584701
-0.0219288  -0.019548
-0.0135917  -0.0984993
-0.0627647  -0.00848889
-0.0605226  -0.164244
-0.0534139  -0.0121499
-0.0134063  0.0100612

How can I do this?

Example file names:

MA_continuous_hybrid_maskave_19_2 MA_continuous_hybrid_maskave_18_2

Example contents:

-0.182682 [344 voxels]
-0.0631301 [344 voxels]
-0.0101798 [344 voxels]
-0.121342 [344 voxels]
-0.547331 [344 voxels]
-0.0582418 [344 voxels]
-0.284454 [344 voxels]
-0.262656 [344 voxels]
-0.123836 [344 voxels]
-0.0371469 [344 voxels]
-0.265201 [344 voxels]
-0.147427 [344 voxels]
-0.34516 [344 voxels]
-0.0431832 [344 voxels]
-0.0171557 [344 voxels]
-0.14525 [344 voxels]
-0.0864529 [344 voxels]
0.0881003 [344 voxels]

2 answers:

Answer 0 (score: 2)

I would rewrite your code as follows:

import glob
from itertools import izip

def extract_meaningful_info(line):
    # Keep only the value before '[' and drop the trailing space and newline.
    return line.split('[')[0].strip()

MA_files = glob.glob("MA*continuous*2")

with open("MA_continuous_results.csv", "wb") as outfile:
    outfile.write("\t".join(MA_files) + '\n')
    for fields in izip(*(open(f) for f in MA_files)):
        fields = [extract_meaningful_info(f) for f in fields]
        outfile.write('\t'.join(fields) + '\n')

(This code is for Python 2.)
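
For readers on Python 3, here is a minimal sketch of the same idea, not a definitive port. itertools.izip no longer exists there, and the built-in zip is lazy in the same way. The helper name merge_columns and its parameters are made up for this illustration. Unlike the code above, it writes the part of each file name after "maskave_" as the column header (as in the desired output) and closes the input files explicitly.

```python
def extract_value(line):
    # Keep only the text before '[' and drop surrounding whitespace.
    return line.split('[')[0].strip()


def merge_columns(paths, out_path):
    # Hypothetical helper: each input file becomes one tab-separated column.
    handles = [open(p) for p in paths]
    try:
        with open(out_path, "w") as outfile:
            # Header is the suffix after "maskave_", e.g. "15_2".
            headers = [p.split("maskave_")[-1] for p in paths]
            outfile.write("\t".join(headers) + "\n")
            # zip stops at the shortest file, as izip did in Python 2.
            for lines in zip(*handles):
                row = [extract_value(line) for line in lines]
                outfile.write("\t".join(row) + "\n")
    finally:
        # Close every input file, even if writing failed part-way.
        for handle in handles:
            handle.close()
```

With the files from the question, a call could look like merge_columns(sorted(glob.glob("MA*continuous*2")), "MA_continuous_results.csv") after importing glob. Sorting is an addition of this sketch: glob returns names in arbitrary order, so sorting keeps the column order stable between runs.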


Answer 1 (score: 1)

Here is a nice way to do this incrementally while making sure that all of the files involved are closed at the end:

from contextlib import contextmanager
import glob
from itertools import izip

@contextmanager
def multi_file_manager(files, mode='rb'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        # Runs even if the with-body raises, so no handle is leaked.
        for file in files:
            file.close()

def read_info(file):
    """ Generator function to read and extract info from each line of a file. """
    for line in file:
        # strip() removes the space before '[' (and the newline if '[' is absent).
        yield line.split('[')[0].strip()

with open("MA_continuous_results.csv", "wb") as outfile:
    MA_files = glob.glob("MA*continuous*2")

    col_headers = (filename.split("maskave_")[-1] for filename in MA_files)
    outfile.write('\t'.join(col_headers) + '\n')

    with multi_file_manager(MA_files) as infiles:
        generators = [read_info(file) for file in infiles]

        for fields in izip(*generators):
            outfile.write('\t'.join(fields) + '\n')
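
On Python 3 the standard library already covers the "open many files and close them all" part, so the hand-written context manager may not be needed. The following is only a sketch under that assumption: it uses contextlib.ExitStack for the file handling and itertools.zip_longest so that a shorter file is padded instead of truncating the output. The function name merge_columns_padded and the fill parameter are hypothetical and do not come from the answer above.

```python
from contextlib import ExitStack
from itertools import zip_longest


def read_info(handle):
    # Yield the value before '[' from each line, without trailing whitespace.
    for line in handle:
        yield line.split('[')[0].strip()


def merge_columns_padded(paths, out_path, fill=""):
    # Hypothetical helper; `fill` pads columns whose file ran out of lines.
    with ExitStack() as stack:
        # Every file registered on the stack is closed when the block exits.
        outfile = stack.enter_context(open(out_path, "w"))
        infiles = [stack.enter_context(open(p)) for p in paths]

        headers = [p.split("maskave_")[-1] for p in paths]
        outfile.write("\t".join(headers) + "\n")

        generators = [read_info(f) for f in infiles]
        for fields in zip_longest(*generators, fillvalue=fill):
            outfile.write("\t".join(fields) + "\n")
```

Whether to pad or to stop at the shortest file is a design choice. If all input files are guaranteed to have the same number of lines, as in the question's examples, the plain built-in zip gives the same result.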