Compare two text files and save the matching lines

Time: 2014-03-01 19:29:59

Tags: python file file-io

I have two text files. I want to compare them and save the matching rows to a new text file.

file1:

114.74721

114.85107

2.96667

306.61756

file2:

115.06603 0.00294 5.90000

114.74721 0.00674 5.40000

114.85107 0.00453 6.20000

111.17744 0.00421 5.50000

192.77787 0.03080 3.20000

189.70226 0.01120 5.00000

0.46762 0.00883 3.70000

2.21539 0.01290 3.50000

2.96667 0.01000 3.60000

5.43310 0.00393 5.50000

0.28537 0.00497 5.10000

308.82348 0.00183 6.60000

306.61756 0.00359 5.20000

I want the output to be:

114.74721 0.00674 5.40000

114.85107 0.00453 6.20000

2.96667 0.01000 3.60000

306.61756 0.00359 5.20000

I used the script below, but something is wrong: the output file has more lines than file1, when it should have the same number. Can you help me?

file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
file3 = open("output.txt", "w")
for line1 in file1.readlines():
    file2.seek(0)  # rewind file2 for every line of file1
    for line2 in file2.readlines():
        if line1.strip() in line2:  # substring test against the whole line
            file3.write(line2)

Edit

From file1.txt

114.74721

114.85107

2.96667

306.61756

152.70581

150.04497

91.41869

91.41869

91.73398

92.35076

117.68963

117.69291

115.97827

168.14476

169.94404

73.00571

156.02833

156.02833

From file3.txt

114.74721 0.00674 5.40000

114.85107 0.00453 6.20000

2.96667 0.01000 3.60000

306.61756 0.00359 5.20000

152.70581 0.02780 2.70000

150.04497 0.00211 6.00000

91.41869 0.00500 3.70000

91.73398 0.00393 4.30000

92.35076 0.00176 5.80000

117.68963 0.15500 2.20000

117.69291 0.15100 2.50000

115.97827 0.00722 7.80000

168.14476 0.00383 5.50000

169.94404 0.00539 4.80000

73.00571 0.00876 3.80000

156.02833 0.00284 6.30000

156.64645 0.01290 3.50000

156.65070 0.02110 4.40000

Lines 7 and 8 of file1.txt have the same value, 91.41869, but file3.txt contains it only once (for line 7, not line 8). The same is true of lines 17 and 18.

1 answer:

Answer 0: (score: 0)

FILE1 = "file1.txt"
FILE2 = "file2.txt"
OUTPUT = "file3.txt"

with open(FILE1) as inf:
    match = set(line.strip() for line in inf)

with open(FILE2) as inf, open(OUTPUT, "w") as outf:
    for line in inf:
        if line.split(' ', 1)[0] in match:
            outf.write(line)

Or, if they must be in the same order,

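with open(FILE1) as inf:
    items = [line.strip() for line in inf]
    # map each value to its position in FILE1 (a repeated value keeps its last position)
    match = {val:i for i,val in enumerate(items)}
    # one output slot per FILE1 line; unmatched slots stay as blank lines
    outp  = ['\n'] * len(items)

with open(FILE2) as inf, open(OUTPUT, "w") as outf:
    for line in inf:
        val = line.split(' ', 1)[0]
        try:
            # a later FILE2 line with the same value overwrites the earlier one
            outp[match[val]] = line
        except KeyError:
            pass  # value not in FILE1
    outf.write(''.join(outp))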

Note that the first version writes out every match: if two lines in FILE2 start with "114.74721", you get both of them. The second version keeps only the last match.
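
Neither version repeats a row when the same value appears twice in FILE1, which is the case raised in the question's edit (91.41869 and 156.02833). One possible way to handle that, offered as a sketch rather than as part of the original answer, is to index FILE2 by its first field and then walk FILE1 in order, looking each value up. The names match_rows and write_matches are made up for illustration, the 191.41869 row in the sample data is invented to show that an exact comparison of the first field does not pick it up the way a substring test would, and keeping the first FILE2 row when a value repeats there is an assumption.

```python
def match_rows(keys, rows):
    """Return the rows whose first field equals a key, in key order.

    A key that occurs twice in `keys` yields its row twice. Keys with no
    matching row are skipped. If several rows share a first field, the
    first one is used (an assumption; the question does not say).
    """
    first_row_for = {}
    for row in rows:
        fields = row.split()
        if not fields:  # skip blank lines
            continue
        # setdefault keeps the first row seen for each first-field value
        first_row_for.setdefault(fields[0], row)

    matched = []
    for key in keys:
        key = key.strip()
        if key in first_row_for:  # exact comparison, not a substring test
            matched.append(first_row_for[key])
    return matched


def write_matches(keys_path, rows_path, output_path):
    """Hypothetical file wrapper: read both files, write the matched rows."""
    with open(keys_path) as keys_file, open(rows_path) as rows_file:
        matched = match_rows(keys_file, rows_file)
    with open(output_path, "w") as out_file:
        out_file.writelines(matched)


# Small in-memory example; the 191.41869 row is invented sample data.
keys = ["91.41869\n", "91.41869\n", "91.73398\n"]
rows = [
    "91.73398 0.00393 4.30000\n",
    "191.41869 0.00100 1.00000\n",
    "91.41869 0.00500 3.70000\n",
]
result = match_rows(keys, rows)
```

With the files from the question, the call would be write_matches("file1.txt", "file2.txt", "file3.txt"). The output then has one row per FILE1 line that has a match, so it can never be longer than FILE1.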