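Merge files but output the header row only once

Time: 2016-12-30 13:46:03

Tags: python merge

I have looked at a number of earlier posts with solutions that worked for others, but for some reason none of them has worked for me.

I am trying to write a Python script that 1) merges three files that share the same format, 2) removes only the duplicate headers, 3) sorts the rows by Specimen_ID, and 4) adds 2 new blank lines between each unique Specimen_ID (i.e., after every three lines, except for the first block, which takes up the first 4 lines because of the header).

I have a partial script that handles the first two steps and the last step: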

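import glob

# Every .txt file in the current directory is treated as an input file.
read_files = glob.glob("*.txt")

header_saved = False
linecnt = 0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            # The first line of each file is its header; write it only once.
            header = next(infile)
            if not header_saved:
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt += 1
                # After every third data row, add two blank lines.
                if linecnt % 3 == 0:
                    # Bytes literal, because the file is opened in binary mode.
                    outfile.write(b"\n\n")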

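Any suggestions for sorting the rows? Also, when the data are tab-delimited txt files exported from Excel, I find that this script produces output containing only the contents of the first infile and nothing from the others. If I simply copy and paste the data into new txt files and use those as the infiles, I have no problem. Does anyone know why I am running into this?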
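One possible explanation for the Excel-exported files, offered as an assumption because the exported files themselves are not shown: some versions of Excel (Excel for Mac in particular) save tab-delimited text with bare carriage returns (\r) as line endings. A file opened in binary mode is split only on \n, so next(infile) would return the entire file as the "header". The first file would then be written out whole and every later file would be skipped whole. The sketch below counts the line-ending styles in a file and reads it with universal newlines. The helper names count_line_endings and read_lines are illustrative and not part of the original script.

```python
def count_line_endings(path):
    """Count each line-ending style in a file, reading it as raw bytes."""
    with open(path, "rb") as infile:
        data = infile.read()
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                      # Windows style
        "CR": data.count(b"\r") - crlf,    # bare carriage return
        "LF": data.count(b"\n") - crlf,    # Unix style
    }


def read_lines(path):
    """Return the lines of a text file without their line endings.

    Text mode with newline=None (the default) enables universal newlines,
    so \\r, \\n and \\r\\n are all treated as line breaks.
    """
    with open(path, "r", newline=None) as infile:
        return infile.read().splitlines()
```

If count_line_endings reports a nonzero "CR" count for an exported file, reading the inputs in text mode rather than "rb" may be enough to make the merge see every row.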

Example input file text (infile 1):

Specimen_ID Measured_by_initals Measure_date    Sex Beak_length Pronotal_width  Right_fore_femur_length Right_fore_femur_width  Left_fore_femur_length  Left_fore_femur_width   Right_hind_femur_length Right_hind_femur_width  Left_hind_femur_length  Left_hind_femur_width   Right_hind_femur_area   Left_hind_femur_area    Right_hind_tibia_width  Left_hind_tibia_width   Notes
a   1   30-Dec-16   M   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
b   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   beak bent
c   1   30-Dec-16   M   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
d   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   
e   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   pronotum deformed
f   1   30-Dec-16   F   4   4   4   4   4   4   4   4   4   4   4   4   4   4   

Example input file text (infile 2):

Specimen_ID Measured_by_initals Measure_date    Sex Beak_length Pronotal_width  Right_fore_femur_length Right_fore_femur_width  Left_fore_femur_length  Left_fore_femur_width   Right_hind_femur_length Right_hind_femur_width  Left_hind_femur_length  Left_hind_femur_width   Right_hind_femur_area   Left_hind_femur_area    Right_hind_tibia_width  Left_hind_tibia_width   Notes
a   2   30-Dec-16   M   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
b   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
c   2   30-Dec-16   M   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
d   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
e   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 
f   2   30-Dec-16   F   4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 4.1 

1 answer:

Answer 0 (score: 0)

Unless there is some unexpected data in the files, your solution should work perfectly. I have just added the code for item 3.

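import glob

# Every .txt file in the current directory is treated as an input file.
read_files = glob.glob("*.txt")

header_saved = False
linecnt = 0
with open("merged_data.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            # The first line of each file is its header; write it only once.
            header = next(infile)
            if not header_saved:
                outfile.write(header)
                header_saved = True
            for line in infile:
                outfile.write(line)
                linecnt += 1
                # After every third data row, add two blank lines.
                if linecnt % 3 == 0:
                    # Bytes literal, because the file is opened in binary mode.
                    outfile.write(b"\n\n")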

inputfile1.txt

Employee,Account,Currency,Amount,Location
Test 1,  Basic,USD,3000,Airport
Test 2,  Net, USD,2000,Airport
Test 3,  Basic,USD,4000,Town
Test 4,  Net, USD,3000,Town
Test 5,  Basic,GBP,5000,Town
Test 6,  Net, GBP,4000,Town

inputfile2.txt

Employee,Account,Currency,Amount,Location
Test 8,  Basic,USD,3000,Airport
Test 9,  Net, USD,2000,Airport
Test 10,  Basic,USD,4000,Town
Test 11,  Net, USD,3000,Town
Test 12,  Basic,GBP,5000,Town
Test 13,  Net, GBP,4000,Town

Output

Employee,Account,Currency,Amount,Location
Test 1,  Basic,USD,3000,Airport
Test 2,  Net, USD,2000,Airport
Test 3,  Basic,USD,4000,Town


Test 4,  Net, USD,3000,Town
Test 5,  Basic,GBP,5000,Town
Test 6,  Net, GBP,4000,Town

Test 8,  Basic,USD,3000,Airport
Test 9,  Net, USD,2000,Airport
Test 10,  Basic,USD,4000,Town


Test 11,  Net, USD,3000,Town
Test 12,  Basic,GBP,5000,Town
Test 13,  Net, GBP,4000,Town
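
The output above keeps the rows in file order, so the sorting step from the question is still open. A minimal sketch of one way to cover all four steps follows. It assumes the inputs are tab-delimited, that Specimen_ID is the first column, and that comparing IDs as plain strings is acceptable (so "10" sorts before "2"). The function name merge_sorted and its parameters are hypothetical. Instead of counting every third line, it groups rows by ID with itertools.groupby, so the spacing stays correct even if a specimen has more or fewer than three rows.

```python
from itertools import groupby


def merge_sorted(paths, out_path, key_column=0, delimiter="\t"):
    """Merge files with identical headers, sort the rows by one column,
    and write two blank lines between groups that share a key."""
    header = None
    rows = []
    for path in paths:
        # Text mode with universal newlines accepts \n, \r\n and bare \r.
        with open(path, "r") as infile:
            lines = [ln for ln in infile.read().splitlines() if ln.strip()]
        if not lines:
            continue
        if header is None:
            header = lines[0]       # keep the header of the first file only
        rows.extend(lines[1:])      # drop the header of every file

    def key(line):
        return line.split(delimiter)[key_column]

    # list.sort is stable, so rows with the same key keep their file order.
    rows.sort(key=key)

    with open(out_path, "w") as outfile:
        if header is not None:
            outfile.write(header + "\n")
        for i, (_, group) in enumerate(groupby(rows, key=key)):
            if i > 0:
                outfile.write("\n\n")   # two blank lines between groups
            for line in group:
                outfile.write(line + "\n")
```

A call such as merge_sorted(sorted(glob.glob("*.txt")), "merged_data.out") would merge every .txt file in the current directory. Giving the output file a name that does not match "*.txt" (or filtering it out of the list) keeps a second run from merging the previous output back in.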