How to add new columns with matching strings from another file

Time: 2017-04-27 19:11:44

Tags: python

I have two files.

File 1:

chrom start     end       strand somecol1...somecol10
11  98566330    98566433    -
11  98566295    98566433    -
11  98566581    98566836    -

File 2:

chrom   start   end      strand  gene_id            gene_name
11  98566330    98566433    -   ENSMUSG00000017210  Med24
11  98566295    98566433    -   ENSMUSG00000017210  Med24
11  98566581    98566836    -   ENSMUSG00000017210  Med24

Desired output:

chrom start     end       strand gene_id gene_name somecol1...somecol10
11  98566330    98566433    -   ENSMUSG00000017210 Med24
11  98566295    98566433    -   ENSMUSG00000017210 Med24
11  98566581    98566836    -   ENSMUSG00000017210 Med24

How can I insert new columns with the matching string values into my file1 without changing the structure or contents of the other columns (somecol1 ... somecol10)?

1 answer:

Answer 0 (score: 1)

If writing the output to a different file is not a problem, you can do this:

# Assumes file1.txt and file2.txt list the same rows in the same order.
with open('file1.txt') as f1, open('file2.txt') as f2, open('file3.txt', 'w') as f3:
    # zip pairs each line of file1 with the line at the same position in file2
    # and stops at the end of the shorter file instead of raising IndexError
    for line1, line2 in zip(f1, f2):
        cols1 = line1.split()
        cols2 = line2.split()
        # only combine the lines when the first 4 columns are equal
        if cols1[:4] == cols2[:4]:
            # write the file2 columns followed by the remaining file1 columns.
            # The tab-separated format used for file3 is just one possible choice.
            f3.write('\t'.join(cols2 + cols1[4:]) + '\n')
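
The approach above only works when both files contain the same rows in the same order. If that cannot be guaranteed, one alternative is to look up each file1 row by its first four columns. The following is a minimal sketch, not part of the original answer: the function name `annotate` and the `NA` placeholder for unmatched rows are assumptions made for illustration, and it assumes columns are whitespace-separated with no spaces inside a value.

```python
def annotate(file1_lines, file2_lines, missing="NA"):
    """Insert the gene_id and gene_name columns from file2 after the
    first four columns of each file1 row (hypothetical helper)."""
    # Map (chrom, start, end, strand) to the two extra columns of file2.
    lookup = {}
    for line in file2_lines:
        fields = line.split()
        if len(fields) >= 6:
            lookup[tuple(fields[:4])] = fields[4:6]

    result = []
    for line in file1_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Rows without a match in file2 get placeholder values (an assumption).
        extra = lookup.get(tuple(fields[:4]), [missing, missing])
        # Keep the key columns, insert the new ones, then the untouched rest.
        result.append("\t".join(fields[:4] + extra + fields[4:]))
    return result

# Possible usage with the file names from the answer:
# with open('file1.txt') as f1, open('file2.txt') as f2, open('file3.txt', 'w') as f3:
#     f3.write("\n".join(annotate(f1, f2)) + "\n")
```

Because the header rows of both files share the same first four column names, the header is merged by the same lookup and needs no special handling. If a key occurs more than once in file2, the last occurrence wins in this sketch.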