I have two files.
File 1:
chrom start end strand somecol1...somecol10
11 98566330 98566433 -
11 98566295 98566433 -
11 98566581 98566836 -
File 2:
chrom start end strand gene_id gene_name
11 98566330 98566433 - ENSMUSG00000017210 Med24
11 98566295 98566433 - ENSMUSG00000017210 Med24
11 98566581 98566836 - ENSMUSG00000017210 Med24
Expected output:
chrom start end strand gene_id gene_name somecol1...somecol10
11 98566330 98566433 - ENSMUSG00000017210 Med24
11 98566295 98566433 - ENSMUSG00000017210 Med24
11 98566581 98566836 - ENSMUSG00000017210 Med24
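How can I insert new columns into my file1 for the rows with matching string values, without changing the structure or contents of the other columns (somecol1 ... somecol10)?
Answer 0 (score: 1)
If putting the output in a different file is not a problem, you can do this: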
with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2, open('file3.txt', 'w') as f3:
    # walk both files in parallel: line N of file1 is paired with line N of file2
    for line1, line2 in zip(f1, f2):
        # check that the first 4 columns (chrom, start, end, strand) are equal
        if line1.split()[:4] == line2.split()[:4]:
            # if so, write file2's columns followed by the remaining columns of file1 to file3.
            # The tab-separated format used for file3 here is arbitrary.
            f3.write(line2.strip() + '\t' + '\t'.join(line1.split()[4:]) + '\n')
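The code above pairs line N of file1 with line N of file2, so it only works when both files list their rows in the same order. If that cannot be guaranteed, one alternative is to look rows up by their first four columns instead. The following is a minimal sketch of that idea, not part of the original answer: the function name `merge_by_key`, its `key_cols` parameter, and the sample values `a`, `b`, `c`, `d` are hypothetical, and it assumes whitespace-separated columns with no spaces inside a field.

```python
def merge_by_key(file1_lines, file2_lines, key_cols=4):
    """Yield merged rows: key columns, file2's extra columns, then file1's extra columns."""
    # Build a lookup from (chrom, start, end, strand) to file2's remaining columns.
    annotations = {}
    for line in file2_lines:
        fields = line.split()
        if len(fields) >= key_cols:
            annotations[tuple(fields[:key_cols])] = fields[key_cols:]

    for line in file1_lines:
        fields = line.split()
        key = tuple(fields[:key_cols])
        # Rows of file1 without a match in file2 are skipped (an assumption;
        # they could instead be written out unchanged).
        if key in annotations:
            yield '\t'.join(fields[:key_cols] + annotations[key] + fields[key_cols:])


# Small in-memory demo; the trailing values a, b, c, d are placeholders for somecol1...somecol10.
file1 = [
    "11 98566581 98566836 - a b",
    "11 98566330 98566433 - c d",
]
file2 = [
    "11 98566330 98566433 - ENSMUSG00000017210 Med24",
    "11 98566581 98566836 - ENSMUSG00000017210 Med24",
]
for row in merge_by_key(file1, file2):
    print(row)
```

Because the function accepts any iterable of lines, open file objects can be passed in directly, for example `merge_by_key(f1, f2)` inside the same `with open(...)` block, writing each yielded row plus `'\n'` to the output file.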