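To start with, I have only been using Python for about two weeks and I am relatively new to how it works. I am trying to write a script that compares two directories, including their subdirectories, and prints any changes. I have read about using os.walk to traverse directories, and I have managed to write a script that prints all the files in a directory and its subdirectories in a readable way. I have also read here how to compare two directories, but that approach only compares one level deep.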
import os
x = 'D:\\xfiles'
y = 'D:\\yfiles'
q = [filename for filename in x if filename not in y]
print(q)
Obviously, that does not do what I want. This, however, lists all the files and all the directories:
import os
x = 'D:\\xfiles'
x1 = os.walk(x)
for dirName, subdirList, fileList in x1:
    print('Directory: %s' % dirName)
    for fname in fileList:
        # indent file names under their directory
        print('\t%s' % fname)
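How do I combine the two and make it work?
Answer 0 (score: 2)
I think the best way is to use an external program, as @Robᵩ suggested in the comments.
In Python, I would suggest the following: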
import os

def fileIsSame(right, left, path):
    # Map the path under `right` to the same relative path under `left`.
    # relpath avoids the leading separator that str.replace would leave,
    # which would make os.path.join discard `left`.
    relative = os.path.relpath(path, right)
    return os.path.exists(os.path.join(left, relative))

def compare(right, left):
    difference = []
    # os.walk already descends into subdirectories, so no explicit recursion is needed
    for root, dirs, files in os.walk(right):
        for name in files:
            path = os.path.join(root, name)
            # count the file as a difference if it has no counterpart under `left`
            if not fileIsSame(right, left, path):
                difference.append(path)
    return difference
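This approach lacks a proper fileIsSame function, one that would confirm the files are the same by content or modification date and would make sure paths are handled correctly (I am not sure how this variant behaves there). The algorithm requires you to specify full paths.
Usage example:
print(compare(r'c:\test', r'd:\copy_of_test'))
If the second folder is a copy of the first, all differences in the paths themselves (different drive letter and folder name) are ignored. The output will be [].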
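The missing content check mentioned above could look roughly like the following. This is only a sketch: file_is_same is a hypothetical replacement that keeps the same argument order as fileIsSame, and it swaps the existence test for a byte-by-byte comparison with filecmp.cmp from the standard library.

```python
import filecmp
import os


def file_is_same(right, left, path):
    """Return True if `path` (under `right`) has an identical counterpart under `left`."""
    # Hypothetical helper, not part of the original answer.
    counterpart = os.path.join(left, os.path.relpath(path, right))
    if not os.path.isfile(counterpart):
        return False
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting matching os.stat() signatures (type, size, mtime).
    return filecmp.cmp(path, counterpart, shallow=False)
```

Comparing contents is slower than checking existence, so for large trees it may be worth comparing sizes or modification times first and reading the files only when those match.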
Answer 1 (score: 0)
Write a function that collects the file listing.
import os

def listfiles(path):
    files = []
    for dirName, subdirList, fileList in os.walk(path):
        for fname in fileList:
            # store each path relative to the root so the two trees are comparable
            files.append(os.path.relpath(os.path.join(dirName, fname), path))
    return files

x = listfiles('D:\\xfiles')
y = listfiles('D:\\yfiles')
You can use a list comprehension to extract the files that are in x but not in y.
q = [filename for filename in x if filename not in y]
But using sets is more efficient and more flexible.
files_only_in_x = set(x) - set(y)
files_only_in_y = set(y) - set(x)
files_only_in_either = set(x) ^ set(y)
files_in_both = set(x) & set(y)
all_files = set(x) | set(y)
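To show how the set operations fit together, here is a minimal sketch that returns a small report instead of separate variables. The names list_relative_files and summarize are hypothetical and not part of the answer; the directory walk mirrors listfiles above.

```python
import os


def list_relative_files(root):
    """Collect every file under `root` as a path relative to `root`."""
    found = set()
    for dir_name, _subdirs, file_names in os.walk(root):
        for file_name in file_names:
            full_path = os.path.join(dir_name, file_name)
            found.add(os.path.relpath(full_path, root))
    return found


def summarize(x_root, y_root):
    """Return the files unique to each tree and the files they share."""
    x_files = list_relative_files(x_root)
    y_files = list_relative_files(y_root)
    return {
        'only_in_x': sorted(x_files - y_files),
        'only_in_y': sorted(y_files - x_files),
        'in_both': sorted(x_files & y_files),
    }
```

Calling summarize('D:\\xfiles', 'D:\\yfiles') would then give one dictionary to print or inspect. Note that this compares names only; two files with the same relative path but different contents still land in 'in_both'.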
Answer 2 (score: 0)
import os

def ls(path):
    # List every subdirectory and file under path, relative to path, sorted.
    entries = []
    for base, sub_f, files in os.walk(path):
        for name in sub_f + files:
            entries.append(os.path.relpath(os.path.join(base, name), path))
    entries.sort()
    return entries

def folder_diff(folder1_path, folder2_path):
    folder1_list = ls(folder1_path)
    folder2_list = ls(folder2_path)
    # entries only in folder1, followed by entries only in folder2
    diff = [item for item in folder1_list if item not in folder2_list]
    diff.extend([item for item in folder2_list if item not in folder1_list])
    return diff
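As an alternative to walking both trees by hand, the standard library's filecmp.dircmp can do the pairing. The sketch below is not what this answer uses; collect_differences is a hypothetical name, and it relies on dircmp's default shallow comparison, which treats files with matching os.stat() signatures (type, size, modification time) as equal without reading them.

```python
import filecmp
import os


def collect_differences(dir_a, dir_b):
    """Recursively gather names that differ between two directory trees."""
    report = {'only_a': [], 'only_b': [], 'changed': []}

    def visit(node, prefix):
        # node is a filecmp.dircmp object for one pair of directories
        report['only_a'] += [os.path.join(prefix, n) for n in node.left_only]
        report['only_b'] += [os.path.join(prefix, n) for n in node.right_only]
        report['changed'] += [os.path.join(prefix, n) for n in node.diff_files]
        # subdirs maps each common subdirectory name to its own dircmp
        for name, sub in node.subdirs.items():
            visit(sub, os.path.join(prefix, name))

    visit(filecmp.dircmp(dir_a, dir_b), '')
    return report
```

Unlike folder_diff, this also reports files that exist on both sides but differ, under the 'changed' key.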
Answer 3 (score: 0)
I wrote code that recursively checks two directories and, if they differ, points out the line that differs.
import os

FOLDER_A = os.path.join(os.path.dirname(__file__), 'folder_a')
FOLDER_B = os.path.join(os.path.dirname(__file__), 'folder_b')

def load_directory(directory):
    # Split the entries of one directory level into file names and subdirectory names.
    files = set()
    directories = set()
    for file_or_directory in os.listdir(directory):
        file_or_directory_path = os.path.join(directory, file_or_directory)
        if os.path.isfile(file_or_directory_path):
            files.add(file_or_directory)
        else:
            directories.add(file_or_directory)
    return files, directories

def compare_files(a, b):
    # Compare two text files and report the first differing line, if any.
    assert os.path.isfile(a)
    assert os.path.isfile(b)
    with open(a, 'r') as file:
        file_a = file.read()
    with open(b, 'r') as file:
        file_b = file.read()
    if file_a != file_b:
        file_a_lines = file_a.split('\n')
        file_b_lines = file_b.split('\n')
        if len(file_a_lines) != len(file_b_lines):
            print(f'Files {a} and {b} have different lengths: {len(file_a_lines)} and {len(file_b_lines)} lines')
            return False
        for index, (line_a, line_b) in enumerate(zip(file_a_lines, file_b_lines), start=1):
            if line_a != line_b:
                print(f'Difference found in files {a} and {b} at line number {index}')
                return False
        # should be unreachable: the contents differ, so some line must differ
        print('Something went wrong')
        return False
    return True

def compare_directories(a, b):
    # Compare file names, then subdirectory names, then contents, stopping at the first difference.
    assert not os.path.isfile(a)
    assert not os.path.isfile(b)
    a_files, a_directories = load_directory(a)
    b_files, b_directories = load_directory(b)
    if a_files != b_files:
        print(f'Difference found in the files of directories {a} and {b}')
        print(f'A: {a_files}\nB: {b_files}')
        return False
    if a_directories != b_directories:
        print(f'Difference found in the subdirectories of directories {a} and {b}')
        print(f'A: {a_directories}\nB: {b_directories}')
        return False
    for name in a_files:
        if not compare_files(os.path.join(a, name), os.path.join(b, name)):
            return False
    for name in a_directories:
        if not compare_directories(os.path.join(a, name), os.path.join(b, name)):
            return False
    return True

def main():
    print(compare_directories(FOLDER_A, FOLDER_B))

if __name__ == '__main__':
    main()
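The code above stops at the first differing line. If every changed line is wanted, one option is the standard library's difflib. The following is a minimal sketch rather than part of this answer; diff_lines is a hypothetical helper and it assumes both files are text files readable with the default encoding.

```python
import difflib


def diff_lines(path_a, path_b):
    """Return unified-diff lines describing how two text files differ."""
    with open(path_a, 'r') as handle:
        lines_a = handle.readlines()
    with open(path_b, 'r') as handle:
        lines_b = handle.readlines()
    # unified_diff yields nothing when the two sequences are identical
    return list(difflib.unified_diff(lines_a, lines_b,
                                     fromfile=path_a, tofile=path_b))
```

An empty list means the files match; otherwise the result can be printed with print(''.join(diff_lines(a, b))).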