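From what I can see, filecmp.dircmp is recursive, but inadequate for my needs, at least in py2. I want to compare two directories and all the files they contain. Does this exist, or do I need to build it (using os.walk, for example)? I prefer something pre-built, where someone else has already done the unit testing :)
The actual "comparison" can be sloppy (ignoring permissions, for example), if that helps.
I would like something boolean, and report_full_closure prints a report. It also only descends into common subdirectories. As far as I'm concerned, if there is anything only in the left or only in the right directory, they are different directories. I built this using os.walk instead.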
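Answer 0 (score: 21)
Here is an alternative implementation of the comparison function using the filecmp module. It uses recursion instead of os.walk, so it is a little simpler. However, it does not recurse simply by using the common_dirs and subdirs attributes, because in that case we would implicitly be using the default "shallow" implementation of file comparison, which is probably not what you want. In the implementation below, files with the same name are always compared by their contents.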
import filecmp
import os.path

def are_dir_trees_equal(dir1, dir2):
    """
    Compare two directories recursively. Files in each directory are
    assumed to be equal if their names and contents are equal.

    @param dir1: First directory path
    @param dir2: Second directory path

    @return: True if the directory trees are the same and
        there were no errors while accessing the directories or files,
        False otherwise.
    """
    dirs_cmp = filecmp.dircmp(dir1, dir2)
    if len(dirs_cmp.left_only) > 0 or len(dirs_cmp.right_only) > 0 or \
            len(dirs_cmp.funny_files) > 0:
        return False
    (_, mismatch, errors) = filecmp.cmpfiles(
        dir1, dir2, dirs_cmp.common_files, shallow=False)
    if len(mismatch) > 0 or len(errors) > 0:
        return False
    for common_dir in dirs_cmp.common_dirs:
        new_dir1 = os.path.join(dir1, common_dir)
        new_dir2 = os.path.join(dir2, common_dir)
        if not are_dir_trees_equal(new_dir1, new_dir2):
            return False
    return True
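To see why the shallow default matters, here is a small sketch (the helper name is hypothetical and it uses throwaway temporary files) in which two files have the same size and modification time but different bytes. filecmp.cmp with its default shallow=True trusts the matching os.stat signature (file type, size, mtime) and reports them equal, while shallow=False actually reads the contents.

```python
import filecmp
import os
import tempfile

def shallow_vs_deep():
    """Compare two same-size, same-mtime files whose bytes differ."""
    with tempfile.TemporaryDirectory() as tmp:
        f1 = os.path.join(tmp, "a.txt")
        f2 = os.path.join(tmp, "b.txt")
        with open(f1, "wb") as fh:
            fh.write(b"abc")
        with open(f2, "wb") as fh:
            fh.write(b"xyz")
        # give both files identical (atime, mtime) so their stat signatures match
        os.utime(f1, (1_000_000_000, 1_000_000_000))
        os.utime(f2, (1_000_000_000, 1_000_000_000))
        shallow = filecmp.cmp(f1, f2)              # stat signature only
        deep = filecmp.cmp(f1, f2, shallow=False)  # byte-by-byte contents
        return shallow, deep

print(shallow_vs_deep())  # (True, False)
```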
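Answer 1 (score: 14)
filecmp.dircmp is the way to go. But it does not reliably compare the contents of files found at the same path in the two compared directories: it trusts the os.stat signature (file type, size, modification time), so files whose signatures match are reported as identical without their contents being read. Since dircmp is a class, you can fix that with a dircmp subclass that overrides its phase3 function, which compares the common files, so that contents are compared instead of only os.stat attributes.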
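import filecmp

class dircmp(filecmp.dircmp):
    """
    Compare the content of dir1 and dir2. In contrast with filecmp.dircmp, this
    subclass compares the content of files with the same path.
    """
    def phase3(self):
        """
        Find out differences between common files.
        Ensure we are using content comparison with shallow=False.
        """
        fcomp = filecmp.cmpfiles(self.left, self.right, self.common_files,
                                 shallow=False)
        self.same_files, self.diff_files, self.funny_files = fcomp

    # filecmp.dircmp computes these attributes lazily by looking them up in the
    # class-level methodmap dict, which holds the base-class phase3; point the
    # entries at the override above so it is actually used.
    methodmap = dict(filecmp.dircmp.methodmap,
                     same_files=phase3, diff_files=phase3, funny_files=phase3)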
You can then use it to return a boolean:
import os.path

def is_same(dir1, dir2):
    """
    Compare two directory trees content.
    Return False if they differ, True if they are the same.
    """
    compared = dircmp(dir1, dir2)
    if (compared.left_only or compared.right_only or compared.diff_files
            or compared.funny_files):
        return False
    for subdir in compared.common_dirs:
        if not is_same(os.path.join(dir1, subdir), os.path.join(dir2, subdir)):
            return False
    return True
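If you want to reuse this code snippet, it is hereby dedicated to the Public Domain or Creative Commons CC0, at your choice (in addition to the default CC-BY-SA license provided by SO).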
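It is worth checking that a phase3 override really takes effect. In the CPython filecmp sources I am familiar with, dircmp resolves lazy attributes such as diff_files through a class-level methodmap dictionary that stores the base class's own phase functions, so defining phase3 in a subclass is not enough on its own. The sketch below (class and function names are hypothetical) runs both variants on two empty temporary directories and records whether the override was called.

```python
import filecmp
import tempfile

class NaiveDircmp(filecmp.dircmp):
    """Overrides phase3 but keeps the inherited methodmap."""
    def phase3(self):
        self.override_used = True  # marker so we can tell the override ran
        super().phase3()

class FixedDircmp(NaiveDircmp):
    """Also re-points the phase3 entries of methodmap at the override."""
    methodmap = dict(filecmp.dircmp.methodmap,
                     same_files=NaiveDircmp.phase3,
                     diff_files=NaiveDircmp.phase3,
                     funny_files=NaiveDircmp.phase3)

def override_is_used():
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        naive = NaiveDircmp(a, b)
        naive.diff_files  # triggers the lazy phase3 computation
        fixed = FixedDircmp(a, b)
        fixed.diff_files
        return hasattr(naive, "override_used"), hasattr(fixed, "override_used")

print(override_is_used())  # (False, True)
```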
Answer 2 (score: 5)
The report_full_closure() method is recursive:
comparison = filecmp.dircmp('/directory1', '/directory2')
comparison.report_full_closure()
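Edit: After the OP's edit, I would say it is best to just use the other functions in filecmp. I think os.walk is unnecessary; it is better to simply recurse through the lists produced by common_dirs etc., although in some cases (large directory trees) this may risk a maximum recursion depth error if implemented poorly.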
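If recursion depth is a concern, the same traversal can be driven by an explicit stack instead of the call stack. A minimal sketch, assuming the question's semantics (anything present on only one side means the trees differ) and content comparison via cmpfiles(shallow=False); the function name is hypothetical. Passing ignore=[] turns off dircmp's default ignore list (filecmp.DEFAULT_IGNORES, which skips names such as .git and __pycache__), so those entries are compared too.

```python
import filecmp
import os

def trees_equal_iterative(dir1, dir2):
    """Hypothetical helper: compare two trees without recursion."""
    stack = [(dir1, dir2)]  # pairs of directories still to check
    while stack:
        left, right = stack.pop()
        dcmp = filecmp.dircmp(left, right, ignore=[])  # compare every name
        # names on one side only, or a file on one side and a dir on the other
        if dcmp.left_only or dcmp.right_only or dcmp.common_funny:
            return False
        _, mismatch, errors = filecmp.cmpfiles(
            left, right, dcmp.common_files, shallow=False)
        if mismatch or errors:
            return False
        stack.extend((os.path.join(left, d), os.path.join(right, d))
                     for d in dcmp.common_dirs)
    return True
```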
Answer 3 (score: 3)
Here is a simple solution with a recursive function:
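import filecmp

def same_folders(dcmp):
    # note: diff_files comes from dircmp's default shallow (os.stat-based) comparison
    if dcmp.left_only or dcmp.right_only or dcmp.diff_files:
        return False
    for sub_dcmp in dcmp.subdirs.values():
        if not same_folders(sub_dcmp):
            return False
    return True

same_folders(filecmp.dircmp('/tmp/archive1', '/tmp/archive2'))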
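Answer 4 (score: 2)
dircmp can be recursive: see report_full_closure.
As far as I know, dircmp does not provide a directory comparison function. It would be very easy to write your own, though: use left_only and right_only on the dircmp object to check that the files in the directories are the same, and then recurse on the subdirs attribute.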
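The recipe above can be sketched in a few lines. This version (hypothetical name same_names) only checks that both trees contain the same names; it does not look at file contents at all.

```python
import filecmp

def same_names(dcmp):
    """Hypothetical helper: True if both trees hold the same names (contents ignored)."""
    # left_only/right_only: names on one side only; common_funny: type mismatches
    if dcmp.left_only or dcmp.right_only or dcmp.common_funny:
        return False
    # subdirs maps each common subdirectory name to its own dircmp instance
    return all(same_names(sub) for sub in dcmp.subdirs.values())
```

Usage: same_names(filecmp.dircmp("dir1", "dir2")).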
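Answer 5 (score: 2)
Another solution, which compares the layout of dir1 and dir2 and ignores file contents.
See the gist here: https://gist.github.com/4164344
Edit: here is the code, in case the gist gets lost for some reason: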
import os

def compare_dir_layout(dir1, dir2):
    def _compare_dir_layout(dir1, dir2):
        for (dirpath, dirnames, filenames) in os.walk(dir1):
            for filename in filenames:
                relative_path = os.path.relpath(dirpath, dir1)
                if not os.path.exists(os.path.join(dir2, relative_path, filename)):
                    print(relative_path, filename)
    print('files in "' + dir1 + '" but not in "' + dir2 + '"')
    _compare_dir_layout(dir1, dir2)
    print('files in "' + dir2 + '" but not in "' + dir1 + '"')
    _compare_dir_layout(dir2, dir1)

compare_dir_layout('xxx', 'yyy')
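A variant of the same idea that returns the differences instead of printing them, and that also accounts for empty directories, is to collect each tree's relative paths into a set and take set differences. A minimal sketch (function names are hypothetical); like the code above, it ignores file contents.

```python
import os

def relative_paths(root):
    """Hypothetical helper: every file and directory under root, relative to root."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths

def layout_diff(dir1, dir2):
    """Return (only_in_dir1, only_in_dir2) as sorted lists of relative paths."""
    p1, p2 = relative_paths(dir1), relative_paths(dir2)
    return sorted(p1 - p2), sorted(p2 - p1)
```

The trees have the same layout exactly when layout_diff returns two empty lists.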
Answer 6 (score: 0)
Here is my solution (gist):
import filecmp
import os

def dirs_same_enough(dir1, dir2, report=False):
    ''' use os.walk and filecmp.cmpfiles to
    determine if two dirs are 'same enough'.

    Args:
        dir1, dir2: two directory paths
        report: if True, print the filecmp.dircmp(dir1,dir2).report_full_closure()
                before returning

    Returns:
        bool
    '''
    # os walk: root, list(dirs), list(files)
    # those lists won't have consistent ordering,
    # os.walk also has no guaranteed ordering, so have to sort.
    walk1 = sorted(list(os.walk(dir1)))
    walk2 = sorted(list(os.walk(dir2)))

    def report_and_exit(report, bool_):
        if report:
            filecmp.dircmp(dir1, dir2).report_full_closure()
        return bool_

    if len(walk1) != len(walk2):
        return report_and_exit(report, False)
    for (p1, d1, fl1), (p2, d2, fl2) in zip(walk1, walk2):
        d1, fl1, d2, fl2 = set(d1), set(fl1), set(d2), set(fl2)
        if d1 != d2 or fl1 != fl2:
            return report_and_exit(report, False)
        # one cmpfiles call compares the contents of all files in this directory
        same, diff, weird = filecmp.cmpfiles(p1, p2, fl1, shallow=False)
        if diff or weird:
            return report_and_exit(report, False)
    return report_and_exit(report, True)
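filecmp.cmpfiles returns three lists (match, mismatch, errors), and a name that is missing on one side, or cannot be read, lands in errors rather than mismatch, which is why the code above checks both diff and weird. A small sketch on temporary files (the helper name is hypothetical):

```python
import filecmp
import os
import tempfile

def cmpfiles_demo():
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for root, text in ((a, "same"), (b, "same")):
            with open(os.path.join(root, "equal.txt"), "w") as fh:
                fh.write(text)
        for root, text in ((a, "left"), (b, "right")):
            with open(os.path.join(root, "changed.txt"), "w") as fh:
                fh.write(text)
        # "missing.txt" exists only in a, so os.stat fails for the b side
        open(os.path.join(a, "missing.txt"), "w").close()
        names = ["equal.txt", "changed.txt", "missing.txt"]
        return filecmp.cmpfiles(a, b, names, shallow=False)

print(cmpfiles_demo())  # (['equal.txt'], ['changed.txt'], ['missing.txt'])
```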
Answer 7 (score: 0)
import filecmp
import os

def same(dir1, dir2):
    """Returns True if recursively identical, False otherwise
    """
    c = filecmp.dircmp(dir1, dir2)
    if c.left_only or c.right_only or c.diff_files or c.funny_files:
        return False
    else:
        same_so_far = True
        for i in c.common_dirs:
            same_so_far = same_so_far and same(os.path.join(dir1, i), os.path.join(dir2, i))
            if not same_so_far:
                break
        return same_so_far
Answer 8 (score: 0)
Based on Python issue 12932 and the filecmp documentation, you can use the following example:
import os
import filecmp

# force content compare instead of os.stat attributes only comparison
# (note: this changes the default of filecmp.cmpfiles for the whole process)
filecmp.cmpfiles.__defaults__ = (False,)

def _is_same_helper(dircmp):
    if dircmp.left_only or dircmp.right_only or dircmp.diff_files or dircmp.funny_files:
        return False
    for sub_dircmp in dircmp.subdirs.values():
        if not _is_same_helper(sub_dircmp):
            return False
    return True

def is_same(dir1, dir2):
    """
    Recursively compare two directories
    :param dir1: path to first directory
    :param dir2: path to second directory
    :return: True in case directories are the same, False otherwise
    """
    if not os.path.isdir(dir1) or not os.path.isdir(dir2):
        return False
    dircmp = filecmp.dircmp(dir1, dir2)
    return _is_same_helper(dircmp)
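Answer 9 (score: 0)
This checks that the files are in the same locations and that their contents are the same. It does not correctly validate empty subfolders, and glob skips hidden files (names starting with a dot).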
import filecmp
import glob
import os

path_1 = '.'
path_2 = '.'

def folders_equal(f1, f2):
    # sort so that files at the same relative path line up; glob order is arbitrary
    files_1 = sorted(x for x in glob.iglob(os.path.join(f1, '**'), recursive=True) if os.path.isfile(x))
    files_2 = sorted(x for x in glob.iglob(os.path.join(f2, '**'), recursive=True) if os.path.isfile(x))
    if len(files_1) != len(files_2):
        return False
    file_pairs = list(zip(files_1, files_2))
    locations_equal = all(os.path.relpath(x, f1) == os.path.relpath(y, f2) for x, y in file_pairs)
    files_equal = all(filecmp.cmp(x, y, shallow=False) for x, y in file_pairs)
    return locations_equal and files_equal

folders_equal(path_1, path_2)
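A pathlib-based variant can also handle the empty-subfolder case, by comparing the sets of relative paths (directories included) before comparing file contents. A minimal sketch with a hypothetical function name; unlike glob.iglob with '**', Path.rglob('*') also yields names that start with a dot.

```python
import filecmp
from pathlib import Path

def trees_equal_pathlib(dir1, dir2):
    """Hypothetical helper: same relative paths (dirs and files) and same file bytes."""
    root1, root2 = Path(dir1), Path(dir2)
    entries1 = {p.relative_to(root1) for p in root1.rglob("*")}
    entries2 = {p.relative_to(root2) for p in root2.rglob("*")}
    if entries1 != entries2:
        return False  # something exists on one side only (empty dirs count)
    for rel in entries1:
        p1, p2 = root1 / rel, root2 / rel
        if p1.is_dir() != p2.is_dir():
            return False  # a directory on one side, a file on the other
        if p1.is_file() and not filecmp.cmp(p1, p2, shallow=False):
            return False
    return True
```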
Answer 10 (score: 0)
Since only a True or False result is needed, if you have diff installed:
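from subprocess import PIPE, Popen

def are_dir_trees_equal(dir1, dir2):
    process = Popen(["diff", "-r", dir1, dir2], stdout=PIPE)
    # communicate() drains stdout; wait() alone can deadlock once the pipe buffer fills
    process.communicate()
    exit_code = process.returncode
    return not exit_code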
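diff exits with status 0 when no differences are found, 1 when differences are found, and 2 on trouble (for example, a missing directory), so treating every non-zero status as "different" hides errors. A sketch with subprocess.run that raises on the error case instead; -q only suppresses the per-file diff output. It assumes a diff on PATH that supports -r and -q (GNU diffutils and BSD diff do), and the function name is hypothetical.

```python
import subprocess

def dir_trees_equal_diff(dir1, dir2):
    """Hypothetical helper: True if `diff -rq` finds no differences."""
    result = subprocess.run(
        ["diff", "-rq", dir1, dir2],
        stdout=subprocess.DEVNULL,  # only the exit status is needed
        stderr=subprocess.PIPE,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # exit status 2 (or anything else) means diff itself ran into trouble
    raise RuntimeError(result.stderr.decode(errors="replace").strip())
```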
Answer 11 (score: 0)
This recursive function seems to work for me:
def has_differences(dcmp):
    differences = dcmp.left_only + dcmp.right_only + dcmp.diff_files
    if differences:
        return True
    return any([has_differences(subdcmp) for subdcmp in dcmp.subdirs.values()])
Assuming I haven't overlooked anything, you can negate the result if you want to know whether the directories are the same:
from filecmp import dircmp
comparison = dircmp("dir1", "dir2")
same = not has_differences(comparison)
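If you also want to know where the trees differ, the same recursion can collect relative paths instead of stopping at the first difference. A sketch with a hypothetical name; it also reports common_funny entries (a file on one side, a directory on the other), and diff_files still comes from dircmp's default shallow, stat-based comparison.

```python
import filecmp
import os

def list_differences(dcmp, prefix=""):
    """Hypothetical helper: relative paths that differ between the two trees."""
    names = dcmp.left_only + dcmp.right_only + dcmp.diff_files + dcmp.common_funny
    diffs = [os.path.join(prefix, name) for name in names]
    # recurse into every common subdirectory, extending the relative prefix
    for name, sub in dcmp.subdirs.items():
        diffs.extend(list_differences(sub, os.path.join(prefix, name)))
    return sorted(diffs)
```

An empty list from list_differences(dircmp("dir1", "dir2")) corresponds to has_differences returning False.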
Answer 12 (score: 0)
For anyone looking for a simple library:
https://github.com/mitar/python-deep-dircmp
DeepDirCmp is basically a subclass of filecmp.dircmp and shows the same output as diff -qr dir1 dir2.
Usage:
from deep_dircmp import DeepDirCmp

cmp = DeepDirCmp(dir1, dir2)
if len(cmp.get_diff_files_recursive()) == 0:
    print("Dirs match")
else:
    print("Dirs don't match")