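I have two folders, dir1 and dir2. I need to find the files that have the same name in both folders (or in their subfolders) but different contents.
For example: so.1.0/p/q/search.c and so.1.1/p/q/search.c differ.
Any ideas?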
This is how I get the files I need:
import os, sys, fnmatch, filecmp
folder1 = sys.argv[1]
folder2 = sys.argv[2]
filelist1 = []
filelist2 = []
for root, dirs, files in os.walk(folder1):
    for filename in fnmatch.filter(files, '*.c'):
        filelist1.append(os.path.join(root, filename))
for root, dirs, files in os.walk(folder1):
    for filename in fnmatch.filter(files, '*.h'):
        filelist1.append(os.path.join(root, filename))
for root, dirs, files in os.walk(folder2):
    for filename in fnmatch.filter(files, '*.c'):
        filelist2.append(os.path.join(root, filename))
for root, dirs, files in os.walk(folder2):
    for filename in fnmatch.filter(files, '*.h'):
        filelist2.append(os.path.join(root, filename))
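The lists above hold paths that still begin with folder1 or folder2, so a file that exists in both trees never appears as the same string in both lists. A minimal Python 3 sketch, assuming only `*.c` and `*.h` files matter, that walks each tree once and keeps each path relative to the folder it started from (`source_files` is a hypothetical helper name, not a library function):

```python
import fnmatch
import os


def source_files(top, patterns=("*.c", "*.h")):
    """Yield paths, relative to top, of files matching any of patterns.

    source_files is a hypothetical helper for illustration.
    """
    for root, dirs, files in os.walk(top):
        for filename in files:
            if any(fnmatch.fnmatch(filename, pattern) for pattern in patterns):
                # Drop the top-level folder so both trees produce comparable keys
                yield os.path.relpath(os.path.join(root, filename), top)


# Usage: relative paths present in both trees (still to be compared by content)
# common = set(source_files(folder1)) & set(source_files(folder2))
```

With relative keys, finding "same name in both trees" becomes a plain set intersection.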
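Now I want to compare the two file lists, pick out the entries that have the same file name, and check whether their contents differ. What do you think?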
Answer 0 (score: 2)
Use os.walk() to generate the list of files in either directory, with paths relative to that directory's root:
import os

def relative_files(path):
    """Generate filenames with pathnames relative to the initial path."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)
        for filename in files:
            yield os.path.join(relroot, filename)
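One side effect worth knowing: for files directly inside the starting folder, `os.path.relpath(root, path)` returns `'.'`, so `relative_files` yields names like `./search.c` on POSIX. That is harmless for the comparison because both trees get the same prefix, but normalising with `os.path.normpath` gives cleaner keys. A Python 3 sketch of that variant (same function name, but an illustration rather than the answer's code):

```python
import os


def relative_files(path):
    """Yield file paths relative to path, without a leading './'."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)  # '.' for the top folder itself
        for filename in files:
            # normpath('./search.c') == 'search.c'; deeper paths are unchanged
            yield os.path.normpath(os.path.join(relroot, filename))
```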
Build a set of relative paths from one root:
root_one = 'so.1.0' # use an absolute path here
root_two = 'so.1.1' # use an absolute path here
files_one = set(relative_files(root_one))
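Because the keys are relative paths, ordinary set operations answer the "which names exist where" part of the question before any file is opened. A small illustration using hand-written sets that mirror the so.1.0/so.1.1 example from the question (the literal paths are made-up data):

```python
# Relative paths as relative_files() might yield them for two trees (illustrative data)
files_one = {"p/q/search.c", "p/q/same.c"}
files_two = {"p/q/search.c", "p/q/same.c", "p/q/ignored.c"}

in_both = files_one & files_two      # same name in both trees: compare contents
only_in_one = files_one - files_two  # present only under root_one
only_in_two = files_two - files_one  # present only under root_two, never compared

print(sorted(in_both))      # ['p/q/same.c', 'p/q/search.c']
print(sorted(only_in_two))  # ['p/q/ignored.c']
```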
Then use set intersection to find every relative path that also exists under the other root, and compare those files:
try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3

def different_files(root_one, root_two):
    """Yield files that differ between the two roots.

    Generate pathnames relative to root_one and root_two that are present in both
    but have different contents.
    """
    files_one = set(relative_files(root_one))
    for same in files_one.intersection(relative_files(root_two)):
        # same is a relative path, so it names the same file in different roots
        with open(os.path.join(root_one, same)) as f1, open(os.path.join(root_two, same)) as f2:
            if any(line1 != line2 for line1, line2 in izip_longest(f1, f2)):
                # lines don't match, so files don't match!
                yield same
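The question already imports `filecmp`; the standard library's `filecmp.cmp(f1, f2, shallow=False)` compares two files byte by byte, so it also copes with files that are not valid text in the default encoding. A Python 3 sketch of the same generator built on it (a swapped-in technique, not the answer's line-by-line loop; `relative_files` is repeated so the block runs on its own):

```python
import filecmp
import os


def relative_files(path):
    """Yield file paths relative to path (same idea as in the answer)."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)
        for filename in files:
            yield os.path.join(relroot, filename)


def different_files(root_one, root_two):
    """Yield relative paths present under both roots whose contents differ."""
    common = set(relative_files(root_one)) & set(relative_files(root_two))
    for same in sorted(common):
        # shallow=False: compare contents instead of trusting size/mtime alone
        if not filecmp.cmp(os.path.join(root_one, same),
                           os.path.join(root_two, same), shallow=False):
            yield same
```

With the default `shallow=True`, two files whose type, size, and modification time match are treated as equal without being read, which is why the sketch passes `shallow=False`.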
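itertools.izip_longest() (zip_longest() in Python 3) loops over both files in step, pairing up their lines. If one file is longer than the other, its remaining lines are paired with None, so a file that merely has extra lines at the end is still detected as different.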
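A quick way to see why the padding matters, in Python 3 where the function is called `itertools.zip_longest`: plain `zip()` stops at the shorter input, so a file that only adds lines at the end would look identical. The line lists below are illustrative stand-ins for two open files:

```python
from itertools import zip_longest

# Lines as iterating over two open text files would yield them (illustrative data)
lines_a = ["matching\n"]
lines_b = ["matching\n", "extra line\n"]

# zip() stops at the shorter input and misses the extra line
print(any(a != b for a, b in zip(lines_a, lines_b)))          # False

# zip_longest() pads the shorter input with None, exposing the difference
print(list(zip_longest(lines_a, lines_b)))
# [('matching\n', 'matching\n'), (None, 'extra line\n')]
print(any(a != b for a, b in zip_longest(lines_a, lines_b)))  # True
```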
Demo:
$ mkdir -p /tmp/so.1.0/p/q
$ mkdir -p /tmp/so.1.1/p/q
$ echo 'file one' > /tmp/so.1.0/p/q/search.c
$ echo 'file two' > /tmp/so.1.1/p/q/search.c
$ echo 'file three' > /tmp/so.1.1/p/q/ignored.c
$ echo 'matching' > /tmp/so.1.0/p/q/same.c
$ echo 'matching' > /tmp/so.1.1/p/q/same.c
>>> for different in different_files('/tmp/so.1.0', '/tmp/so.1.1'):
...     print(different)
...
p/q/search.c
Answer 1 (score: 1)