I'm new to Python and would like to do the following.
In text1.txt I have:
1
2
7
9
In text2.txt I have:
1
2
2
2
3
4
I'm looking for a solution whose output comes out in exactly this order:
1
2
7
9
2
2
3
4
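First I want to compare the two files and then produce a concatenated file. If a word is common to both files and one of the files contains it several times, the output (which should also be a file) should contain that word as many times as the file with the most repetitions does. In this case file2 has 2 three times and file1 has 2 once, so I would like the output to contain 2 three times, with the extra occurrences of 2 placed at the end. Anything that is not common to the two files should also be appended at the end of the output.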
This is what I have so far. It compares the two files and concatenates them, but I don't know how to add the repeated common words:
    # Lines that are in 3.txt but not in 1.txt (a set, so order and duplicates are lost)
    with open('3.txt', 'r') as file3:
        with open('1.txt', 'r') as file1:
            same1 = set(file3).difference(file1)
    same1.discard('\n')

    with open('output1.txt', 'w') as file_out:
        for line in same1:
            file_out.write(line)

    # Concatenate the difference file and 1.txt into the final output
    filename = ['output1.txt', '1.txt']
    with open('output_final.txt', 'w') as outfile:
        for fname in filename:
            with open(fname) as infile:
                for line in infile:
                    outfile.write(line)
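Is it possible to pick just one file from the filename list? Something like filename(1), to process only the first file or only the second? It is rather tricky to explain, but I think the example above works better as a reference than my explanation.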
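Answer 0 (score: 0)
I think you can load both files as lists (I assume they are not very large), then iterate over the first one, remove each of its elements once from the second, and finally concatenate the two lists. Something like: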
    with open('text1.txt', 'r') as f:
        li1 = f.readlines()
    with open('text2.txt', 'r') as f:
        li2 = f.readlines()

    # Convert each line to an integer, dropping the trailing newline
    li1 = [int(t.strip()) for t in li1]
    li2 = [int(t.strip()) for t in li2]

    # Cancel one occurrence in li2 for every element of li1
    for i in li1:
        try:
            li2.remove(i)
        except ValueError:
            pass  # i does not occur in li2

    li1.extend(li2)
    li1
li1 should now hold the desired output:
In [27]: print(li1)
[1, 2, 7, 9, 2, 2, 3, 4]
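Since the question asks for the result as a file, the same idea can be wrapped in a pair of small functions. This is only a sketch: the names merge_keep_extras and merge_files are hypothetical, lines are kept as strings rather than converted to int so that words work too, and blank lines are assumed to be ignorable.

```python
def merge_keep_extras(first, second):
    """Return `first` followed by the items of `second` that remain after
    cancelling one occurrence for each item of `first`."""
    leftover = list(second)          # copy so the caller's list is untouched
    for item in first:
        try:
            leftover.remove(item)    # removes only the first matching item
        except ValueError:
            pass                     # item does not occur in `second`
    return list(first) + leftover


def merge_files(path1, path2, out_path):
    """Hypothetical helper: read two files, merge them, write one item per line."""
    with open(path1) as f1, open(path2) as f2:
        first = [line.strip() for line in f1 if line.strip()]
        second = [line.strip() for line in f2 if line.strip()]
    with open(out_path, "w") as out:
        for item in merge_keep_extras(first, second):
            out.write(item + "\n")
```

Calling merge_files('text1.txt', 'text2.txt', 'output_final.txt') would then write the merged lines to output_final.txt (that output name is taken from the question's own attempt).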
Answer 1 (score: 0)
If you need this kind of code, here it is:
    import collections

    # Lines that are in 3.txt but not in 1.txt
    with open('3.txt', 'r') as file3:
        with open('1.txt', 'r') as file1:
            diff = set(file3).difference(file1)
    diff.discard('\n')
    with open('difference.txt', 'w') as file_out:
        for line in diff:
            file_out.write(line)

    with open('1.txt', 'r') as f:
        lines1 = f.read().splitlines()
    with open('3.txt', 'r') as f:
        lines2 = f.read().splitlines()

    count1 = collections.Counter(lines1)
    count2 = collections.Counter(lines2)
    final_lines = list(lines1)  # copy, so lines1 itself is not modified
    seen = set()
    for line in lines2:
        if line not in seen:
            seen.add(line)
            # Append only the occurrences that 3.txt has beyond 1.txt
            if count1[line] < count2[line]:
                final_lines += [line] * (count2[line] - count1[line])

    with open('duplicate.txt', 'w') as gout:
        for line in final_lines:
            gout.write("%s\n" % line)
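The counting step above can also be written more compactly with Counter subtraction, which keeps only the positive differences. The following is a sketch rather than a drop-in replacement: merge_with_counter is a hypothetical name, and it works on lists of lines that have already been read from the files.

```python
from collections import Counter


def merge_with_counter(lines1, lines2):
    """Return lines1 plus the occurrences that lines2 has beyond lines1."""
    # Counter subtraction discards zero and negative counts
    surplus = Counter(lines2) - Counter(lines1)
    merged = list(lines1)
    # dict.fromkeys yields each distinct line once, in first-seen order
    for line in dict.fromkeys(lines2):
        merged.extend([line] * surplus[line])
    return merged


print(merge_with_counter(["1", "2", "7", "9"], ["1", "2", "2", "2", "3", "4"]))
# prints ['1', '2', '7', '9', '2', '2', '3', '4']
```

Like the loop above, this groups the extra copies of each line together in the order the line first appears in the second file.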