Concatenating files that contain multiple occurrences of the same words

Date: 2017-10-27 14:28:30

Tags: python python-2.7

I am new to Python and I want to do the following:

In text1.txt I have:

1
2
7
9

In text2.txt I have:

1
2
2
2
3
4

I am looking for a solution whose output looks like this, in exactly this order:

1
2
7
9
2
2
3
4

First I want to compare the two files and produce a concatenated file. If a word is common to both files but appears multiple times, the output should contain it as many times as it appears in the file with the higher count, with the surplus occurrences placed at the end. In this case file2 has three 2s and file1 has one 2, so I want three 2s in the output, with the two extra 2s at the end. Lines that are not common to both files should also be appended at the end of the output. So far I have managed to compare the two files and concatenate them, but I don't know how to add the common words that occur multiple times:

with open('3.txt', 'r') as file3, open('1.txt', 'r') as file1:
    # lines that appear in 3.txt but not in 1.txt
    same1 = set(file3).difference(file1)

same1.discard('\n')

with open('output1.txt', 'w') as file_out:
    for line in same1:
        file_out.write(line)

filename = ['output1.txt', '1.txt']
with open('output_final.txt', 'w') as outfile:
    for fname in filename:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Is it possible to pick one file from the filename list, something like filename(1), so that only the first or the second file is processed? It is quite tricky to explain, but I think the example above serves as a better reference than my explanation.
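On the side question: Python lists are indexed with square brackets starting at 0, not with parentheses, so a single file can be selected from a list of names. A minimal sketch, using a hypothetical list like the one in the question:

```python
# Hypothetical list of file names, as in the question
filenames = ['output1.txt', '1.txt']

first = filenames[0]    # 'output1.txt'
second = filenames[1]   # '1.txt'
print(first, second)
```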

2 answers:

Answer 0 (score: 0)

I think you can load the two files as lists (I assume they are not huge), then iterate over the first one and remove those elements from the second, and finally concatenate the lists. Something like:

with open('text1.txt', 'r') as f:
    li1 = f.readlines()
with open('text2.txt', 'r') as f:
    li2 = f.readlines()

# strip newlines and convert each line to an integer
li1 = [int(t.strip()) for t in li1]
li2 = [int(t.strip()) for t in li2]

for i in li1:
    try:
        li2.remove(i)   # remove one matching occurrence from li2
    except ValueError:  # i does not occur in li2
        pass

li1.extend(li2)         # append the leftover elements of li2

li1 should now contain the desired output:

In [27]: print(li1)
[1, 2, 7, 9, 2, 2, 3, 4]
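As an aside, the same multiset logic can be written more compactly with collections.Counter, which supports multiset subtraction; a sketch assuming the same sample data (Counter.elements() yields the surplus items in the order they were first seen in li2):

```python
from collections import Counter

li1 = [1, 2, 7, 9]
li2 = [1, 2, 2, 2, 3, 4]

# Counter subtraction keeps only the occurrences of li2 not matched in li1
extra = list((Counter(li2) - Counter(li1)).elements())
print(li1 + extra)  # [1, 2, 7, 9, 2, 2, 3, 4]
```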

Answer 1 (score: 0)

If you need this kind of code, here it is:

import collections

# lines that appear in 3.txt but not in 1.txt
with open('3.txt', 'r') as file3, open('1.txt', 'r') as file1:
    diff = set(file3).difference(file1)

diff.discard('\n')

with open('difference.txt', 'w') as file_out:
    for line in diff:
        file_out.write(line)

with open('1.txt', 'r') as f:
    lines1 = f.read().splitlines()
with open('3.txt', 'r') as f:
    lines2 = f.read().splitlines()

with open('duplicate.txt', 'w') as gout:
    count1 = collections.Counter(lines1)
    count2 = collections.Counter(lines2)

    final_lines = list(lines1)  # copy so lines1 is not mutated
    seen = set()

    for line in lines2:
        if line not in seen:
            seen.add(line)
            if count1[line] < count2[line]:
                # append the surplus occurrences from the second file
                final_lines += [line] * (count2[line] - count1[line])

    for line in final_lines:
        gout.write("%s\n" % line)
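To illustrate, the count-comparison loop above can be run on the question's sample data (inlined here instead of being read from 1.txt and 3.txt):

```python
import collections

lines1 = ['1', '2', '7', '9']            # contents of 1.txt
lines2 = ['1', '2', '2', '2', '3', '4']  # contents of 3.txt

count1 = collections.Counter(lines1)
count2 = collections.Counter(lines2)

final_lines = list(lines1)
seen = set()
for line in lines2:
    if line not in seen:
        seen.add(line)
        if count1[line] < count2[line]:
            # append the surplus occurrences from lines2
            final_lines += [line] * (count2[line] - count1[line])

print(final_lines)  # ['1', '2', '7', '9', '2', '2', '3', '4']
```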