I have n text files with different names in a folder, and I want to compare the text in these files with one another. If two files are identical, I want to save them in a separate folder and delete them from the main folder. Can anyone help me?
My code so far:
file1 = open("F1.txt", "r")
file2 = open("F2.txt", "r")
file3 = open("F3.txt", "r")
file4 = open("F4.txt", "r")
file5 = open("F5.txt", "r")
list1 = file1.readlines()
list2 = file2.readlines()
list3 = file3.readlines()
list4 = file4.readlines()
list5 = file5.readlines()
for line1 in list1:
    for line2 in list2:
        for line3 in list3:
            for line3 in list4:
                for line4 in list5:
                    if line1.strip() in line2.strip() in line3.strip() in line4.strip() in line5.strip():
                        print line1
                        file3.write(line1)
Answer 0 (score: 0)
See the related question: see if two files have the same content in python
For the comparison itself you can use the filecmp module (http://docs.python.org/library/filecmp.html):
>>> import filecmp
>>> filecmp.cmp('F1.txt', 'F2.txt')
True
>>> filecmp.cmp('F1.txt', 'F3.txt')
False
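Note that by default filecmp.cmp does a shallow comparison based on the files' os.stat() signatures (size and modification time). If you want to be certain the byte contents match, you can pass shallow=False; a minimal sketch, reusing the example file names above:

import filecmp

# shallow=False forces a byte-by-byte comparison of the file contents
# instead of relying only on the os.stat() signature.
same = filecmp.cmp('F1.txt', 'F2.txt', shallow=False)
print(same)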
So one way to solve this is the following (not very elegant, but it does work):
import filecmp

files = ['F1.txt', 'F2.txt', 'F3.txt', 'F4.txt', 'F5.txt']
comparisons = {}
for itm in range(len(files)):
    # Compare each file with the next four in the list; the bare
    # except simply swallows the IndexError once we run past the end.
    try:
        res = filecmp.cmp(files[itm], files[itm+1])
        comparisons[str(files[itm]) + ' vs ' + str(files[itm+1])] = res
    except:
        pass
    try:
        res = filecmp.cmp(files[itm], files[itm+2])
        comparisons[str(files[itm]) + ' vs ' + str(files[itm+2])] = res
    except:
        pass
    try:
        res = filecmp.cmp(files[itm], files[itm+3])
        comparisons[str(files[itm]) + ' vs ' + str(files[itm+3])] = res
    except:
        pass
    try:
        res = filecmp.cmp(files[itm], files[itm+4])
        comparisons[str(files[itm]) + ' vs ' + str(files[itm+4])] = res
    except:
        pass
print(comparisons)
This gives:
{'F1.txt vs F2.txt': True, 'F1.txt vs F5.txt': False, 'F2.txt vs F4.txt': True,
'F3.txt vs F4.txt': False, 'F1.txt vs F4.txt': True, 'F2.txt vs F3.txt': False,
'F2.txt vs F5.txt': False, 'F1.txt vs F3.txt': False, 'F3.txt vs F5.txt': False,
'F4.txt vs F5.txt': False}
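A slightly more compact way to build the same kind of dictionary, as a sketch assuming the same files list as above, is to let itertools.combinations generate the pairs instead of hard-coding the offsets:

import filecmp
import itertools

files = ['F1.txt', 'F2.txt', 'F3.txt', 'F4.txt', 'F5.txt']

# combinations() yields every unordered pair of files exactly once,
# so no try/except around out-of-range indices is needed.
comparisons = {f1 + ' vs ' + f2: filecmp.cmp(f1, f2)
               for f1, f2 in itertools.combinations(files, 2)}
print(comparisons)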
As for the other part of the question, you can use the built-in shutil and os modules, like so:
import shutil
import os

if filecmp.cmp('F1.txt', 'F2.txt') is True:
    shutil.move(os.path.abspath('F1.txt'), 'C:\\example\\path')
    shutil.move(os.path.abspath('F2.txt'), 'C:\\example\\path')
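Putting the two pieces together, a sketch that moves every file found to duplicate an earlier one could look like this (the source and destination paths are placeholders):

import filecmp
import itertools
import os
import shutil

source = 'C:\\example\\path\\sourcefolder'          # placeholder paths
destination = 'C:\\example\\path\\destinationfolder'

files = ['F1.txt', 'F2.txt', 'F3.txt', 'F4.txt', 'F5.txt']
to_move = set()
for f1, f2 in itertools.combinations(files, 2):
    # shallow=False compares the actual bytes of the two files.
    if filecmp.cmp(os.path.join(source, f1), os.path.join(source, f2), shallow=False):
        to_move.add(f2)          # keep the first copy, move the later one

for name in to_move:
    shutil.move(os.path.join(source, name), os.path.join(destination, name))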
Update: a better answer, adapted from @zalew's answer: https://stackoverflow.com/a/748879/5247482
import shutil
import os
import hashlib

def remove_duplicates(dir):
    unique = []
    for filename in os.listdir(dir):
        path = dir + '\\' + filename
        if os.path.isfile(path):
            print('--Checking ' + path)
            # Hash the file's contents (not its name), so that identical
            # files produce identical digests.
            with open(path, 'rb') as f:
                filehash = hashlib.md5(f.read()).hexdigest()
            print(filename, ' has hash: ', filehash)
            if filehash not in unique:
                unique.append(filehash)
            else:
                # The contents match a file seen earlier: move it out.
                shutil.move(path, 'C:\\example\\path\\destinationfolder')
    return

remove_duplicates('C:\\example\\path\\sourcefolder')
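For large files, reading the whole file into memory before hashing can be avoided by hashing in chunks. A minimal helper sketch (the function name and chunk size are my own choices, not part of the original answer):

import hashlib

def file_md5(path, chunk_size=65536):
    # Return the MD5 hex digest of a file, reading it in fixed-size chunks.
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()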