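Compare two directories with subdirectories to find any changes?

Asked: 2013-10-08 15:29:02

Tags: python compare directory

For starters, I have only been playing with Python for about two weeks, so I am relatively new to how it works. I am trying to create a script that compares two directories, including their subdirectories, and prints out any changes. I have read about using os.walk to traverse directories, and I have managed to write a script that prints all the files in a directory and its subdirectories in a readable way. I have also read on this site how to compare two directories, but it only compares 1 file.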

import os
x = 'D:\\xfiles'
y = 'D:\\yfiles'
q= [ filename for filename in x if filename not in y ]
print q 

Obviously, that does not do what I want. The following, however, lists all the files and all the directories:

import os
x = 'D:\\xfiles'
x1 = os.walk(x)
for dirName, subdirList, fileList in x1:
    print('Directory: %s' % dirName)
    for fname in fileList:
        # the inner print must be indented under the inner loop
        print('\\%s' % fname)

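How can I combine them and make it work?

4 Answers:

Answer 0 (score: 2)

I think the best way is an external program, as @Robᵩ suggested in the comments.

With Python, I would suggest the following: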

import os

def fileIsSame(right, left, path):
    # "Same" here only means that a file with the same relative path
    # exists under left; content and modification date are not checked.
    return os.path.exists(os.path.join(left, os.path.relpath(path, right)))

def compare(right, left):
    difference = []
    # os.walk already descends into subdirectories, so no explicit recursion is needed
    for root, dirs, files in os.walk(right):
        for name in files:
            path = os.path.join(root, name)
            if not fileIsSame(right, left, path):
                # count file as difference
                difference.append(path)

    return difference

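This approach lacks a proper fileIsSame function, one that would verify that the files are the same by content or by modification date and that would make sure the paths are handled correctly (I am not sure how well this variant does that). The algorithm requires you to specify full paths.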

Usage example:

print(compare(r'c:\test', r'd:\copy_of_test'))

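If the second folder is a copy of the first, all differences in the path prefix (a different drive letter and folder name) are ignored, and the output will be [].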
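The content check that this answer says is missing could be sketched with the standard library's filecmp module. The names file_is_same and compare_by_content are hypothetical and chosen only for illustration; this is a minimal sketch assuming both arguments are existing directories, not the answer author's own method.

```python
import filecmp
import os


def file_is_same(right, left, path):
    """Return True if `path` (under `right`) has an identical counterpart under `left`."""
    counterpart = os.path.join(left, os.path.relpath(path, right))
    if not os.path.isfile(counterpart):
        return False
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting the os.stat() signature (type, size, modification time).
    return filecmp.cmp(path, counterpart, shallow=False)


def compare_by_content(right, left):
    """List files under `right` that are missing from, or differ in, `left`."""
    difference = []
    for root, dirs, files in os.walk(right):
        for name in files:
            path = os.path.join(root, name)
            if not file_is_same(right, left, path):
                difference.append(path)
    return sorted(difference)
```

Like the original, this only walks the first tree, so files that exist only in the second folder are not reported; calling it a second time with the arguments swapped would cover them.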

Answer 1 (score: 0)

Write a function to collect your file listing.

import os

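def listfiles(path):
    files = []
    for dirName, subdirList, fileList in os.walk(path):
        for fname in fileList:
            # store each path relative to the starting directory,
            # so that listings from two different roots are comparable
            files.append(os.path.relpath(os.path.join(dirName, fname), path))
    return files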

x = listfiles('D:\\xfiles')
y = listfiles('D:\\yfiles')

You can use a list comprehension to pull out the files that are in x but not in y.

q = [filename for filename in x if filename not in y]

But using sets is more efficient and more flexible.

files_only_in_x = set(x) - set(y) 
files_only_in_y = set(y) - set(x)
files_only_in_either = set(x) ^ set(y)
files_in_both = set(x) & set(y)
all_files = set(x) | set(y)
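The pieces of this answer can be put together into one small report function. The following is a sketch only: list_relative_files and report_changes are hypothetical names, and it compares file names and locations, not file contents.

```python
import os


def list_relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dir_name, subdirs, file_names in os.walk(root):
        for file_name in file_names:
            full_path = os.path.join(dir_name, file_name)
            found.add(os.path.relpath(full_path, root))
    return found


def report_changes(x_root, y_root):
    """Return which relative paths exist only in x, only in y, or in both."""
    x_files = list_relative_files(x_root)
    y_files = list_relative_files(y_root)
    return {
        'only_in_x': sorted(x_files - y_files),
        'only_in_y': sorted(y_files - x_files),
        'in_both': sorted(x_files & y_files),
    }
```

With the directories from the question, it would be called as report_changes('D:\\xfiles', 'D:\\yfiles').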

Answer 2 (score: 0)

import os

def ls(path):
    # Collect every subdirectory and file under path, relative to path
    entries = []
    for base, sub_f, files in os.walk(path):
        for sub in sub_f:
            entries.append(os.path.relpath(os.path.join(base, sub), path))

        for file in files:
            entries.append(os.path.relpath(os.path.join(base, file), path))
    entries.sort()
    return entries

def folder_diff(folder1_path, folder2_path):
    folder1_list = ls(folder1_path)
    folder2_list = ls(folder2_path)
    # entries found only in folder1, followed by entries found only in folder2
    diff = [item for item in folder1_list if item not in folder2_list]
    diff.extend([item for item in folder2_list if item not in folder1_list])
    return diff

Answer 3 (score: 0)

I wrote code that recursively checks two directories; if there is a difference, it points out the line that differs.

import os


FOLDER_A = os.path.join(os.path.dirname(__file__), 'folder_a')
FOLDER_B = os.path.join(os.path.dirname(__file__), 'folder_b')


def load_directory(directory):

    files = set()
    directories = set()
    
    for file_or_directory in os.listdir(directory):
        file_or_directory_path = f'{directory}/{file_or_directory}'

        if os.path.isfile(file_or_directory_path):
            files.add(file_or_directory)
        else:
            directories.add(file_or_directory)
    
    return files, directories



def compare_files(a, b):
    assert os.path.isfile(a)
    assert os.path.isfile(b)

    with open(a, 'r') as file:
        file_a = file.read()
    
    with open(b, 'r') as file:
        file_b = file.read()


    if file_a != file_b:
        file_a_lines = file_a.split('\n')
        file_b_lines = file_b.split('\n')

        if len(file_a_lines) != len(file_b_lines):
            print(f'Files {a} and {b} have different lengths: {len(file_a_lines)} and {len(file_b_lines)} lines')
            return False

        compare_lines = zip(file_a_lines, file_b_lines)

        index = 0
        for i in compare_lines:
            index += 1
            if i[0] != i[1]:
                print(f'Difference found in files {a} and {b} at line number {index}')
                return False
        
        print('Something went wrong')
        return False


    return True


def compare_directories(a, b):

    assert not os.path.isfile(a)
    assert not os.path.isfile(b)
    
    a_files, a_directories = load_directory(a)
    b_files, b_directories = load_directory(b)

    if (a_files != b_files):
        print(f'Difference found between the files of directories {a} and {b}')
        print(f'A: {a_files}\nB: {b_files}')
        return False
    
    if (a_directories != b_directories):
        print(f'Difference found between the subdirectories of directories {a} and {b}')
        print(f'A: {a_directories}\nB: {b_directories}')
        return False
    
    for files in a_files:
        if not compare_files(f'{a}/{files}', f'{b}/{files}'):
            return False
    
    for directories in a_directories:
        if not compare_directories(f'{a}/{directories}', f'{b}/{directories}'):
            return False
    
    return True


def main():
    print(compare_directories(FOLDER_A, FOLDER_B))


if __name__ == '__main__':
    main()
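For comparison, a similar recursive check can be sketched with the standard library's filecmp.dircmp class, which gathers every difference instead of stopping at the first one. The function name collect_differences and the keys of the result dictionary are hypothetical. Note that dircmp compares files shallowly by default (file type, size, and modification time, reading contents only when those differ), so this is a sketch under that assumption and not a byte-exact verifier.

```python
import filecmp
import os


def collect_differences(dcmp, found=None):
    """Recursively gather the differences reported by a filecmp.dircmp object."""
    if found is None:
        found = {'only_in_a': [], 'only_in_b': [], 'changed': []}
    # Entries (files or directories) present on one side only
    for name in dcmp.left_only:
        found['only_in_a'].append(os.path.join(dcmp.left, name))
    for name in dcmp.right_only:
        found['only_in_b'].append(os.path.join(dcmp.right, name))
    # Files present on both sides that dircmp considers different
    for name in dcmp.diff_files:
        found['changed'].append(os.path.join(dcmp.left, name))
    # dcmp.subdirs maps each common subdirectory name to its own dircmp object
    for sub_dcmp in dcmp.subdirs.values():
        collect_differences(sub_dcmp, found)
    return found
```

With the constants from this answer, it could be called as collect_differences(filecmp.dircmp(FOLDER_A, FOLDER_B)).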