How do I compare directories to determine which files have changed?

Asked: 2011-02-16 20:39:12

Tags: python file hash diff rsync

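We need a script that compares two directories of files and, for every file that has changed between directory 1 and directory 2 (added, deleted, or modified), builds a subset containing only those changed files.

My first impression is to write a Python script that walks each directory, computes a hash of every file, and copies a file to the new subset if its hash has changed. Is that a sound approach? Am I overlooking existing tools that already do this? I have never used it, but maybe something like rsync would work?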

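Thanks

Edit

The important part is that I can assemble a subset containing only the files that changed. So if only 3 files changed between versions, I only need to copy those three files into a new directory...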
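The hash-based approach described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `file_digest`, `snapshot`, and `copy_changed` are hypothetical, SHA-256 is an arbitrary choice of hash, and deleted files are merely reported because there is nothing left to copy.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map each file's path (relative to root) to its content digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = file_digest(full)
    return result


def copy_changed(old_dir, new_dir, out_dir):
    """Copy files that were added or modified in new_dir into out_dir.

    Returns (added, modified, deleted) as sorted lists of relative paths.
    Deleted files cannot be copied, so they are only reported.
    """
    old, new = snapshot(old_dir), snapshot(new_dir)
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    for rel in added + modified:
        target = os.path.join(out_dir, rel)
        # Recreate the subdirectory layout inside the subset directory.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(new_dir, rel), target)
    return added, modified, deleted
```

Calling `copy_changed('dir1', 'dir2', 'subset')` would leave only the added and modified files under `subset`. Hashing reads every byte of both trees, so for large directories a cheaper pre-check on size or modification time may be worth considering.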

4 answers:

Answer 0 (score: 3)

It seems to me that you need something as simple as this:

from os import listdir
from os.path import getmtime, join

rep1 = 'I:\\dada'
rep2 = 'I:\\didi'

# Top-level names only; subdirectories are not descended into.
R1 = set(listdir(rep1))
R2 = set(listdir(rep2))

vanished = sorted(R1 - R2)   # in rep1 but not in rep2
appeared = sorted(R2 - R1)   # in rep2 but not in rep1
# Present in both; treated as modified when the modification times differ.
modified = sorted(f for f in R1 & R2
                  if getmtime(join(rep1, f)) != getmtime(join(rep2, f)))

print('vanished== %s' % vanished)
print('appeared== %s' % appeared)
print('modified== %s' % modified)
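The script above looks only at the top level and trusts modification times. As a hedged alternative, the standard library's `filecmp` module can do the same three-way split recursively and compare contents. The helper name `compare_trees` is hypothetical, and treating unreadable files as modified is an assumption made for this sketch.

```python
import filecmp
import os


def compare_trees(left, right, prefix=""):
    """Recursively collect (vanished, appeared, modified) relative paths.

    Names that exist on one side only are reported once, even when they
    are directories; their contents are not listed individually.
    """
    cmp = filecmp.dircmp(left, right)
    vanished = [os.path.join(prefix, n) for n in cmp.left_only]
    appeared = [os.path.join(prefix, n) for n in cmp.right_only]
    # dircmp's own diff_files uses a shallow os.stat() comparison, so the
    # common files are re-checked byte by byte with shallow=False.
    _match, mismatch, errors = filecmp.cmpfiles(
        left, right, cmp.common_files, shallow=False)
    # Assumption: files that could not be compared are reported as modified.
    modified = [os.path.join(prefix, n) for n in mismatch + errors]
    for sub in cmp.common_dirs:
        v, a, m = compare_trees(os.path.join(left, sub),
                                os.path.join(right, sub),
                                os.path.join(prefix, sub))
        vanished += v
        appeared += a
        modified += m
    return vanished, appeared, modified
```

Note that `filecmp.dircmp` skips a small default set of names (see `filecmp.DEFAULT_IGNORES`), which may or may not be what you want for a release diff.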

Answer 1 (score: 2)

That is a perfectly reasonable approach, but you are essentially reinventing rsync. So yes, use rsync.

Edit: There's a way to create "difference-only" folders using rsync
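The linked answer is not reproduced here, so the following is only one way I know of to get a "difference-only" folder: rsync's `--compare-dest` option, which skips files that are already identical in a reference directory. The sketch builds the command from Python; the function names `rsync_diff_command` and `make_diff_folder` are hypothetical, and it assumes `rsync` is installed and on the PATH.

```python
import os
import subprocess


def rsync_diff_command(old_dir, new_dir, out_dir, checksum=False):
    """Build an rsync command that copies only files changed since old_dir."""
    cmd = ["rsync", "-a"]
    if checksum:
        # Compare file contents by checksum instead of size and mtime.
        cmd.append("--checksum")
    # A relative --compare-dest is interpreted relative to the destination
    # directory, so absolute paths are used to avoid surprises.
    cmd.append("--compare-dest=" + os.path.abspath(old_dir))
    # The trailing separator makes rsync copy the contents of new_dir
    # rather than the directory itself.
    cmd.append(os.path.join(os.path.abspath(new_dir), ""))
    cmd.append(os.path.abspath(out_dir))
    return cmd


def make_diff_folder(old_dir, new_dir, out_dir, checksum=False):
    """Run rsync; raises CalledProcessError if rsync exits with an error."""
    subprocess.check_call(rsync_diff_command(old_dir, new_dir, out_dir, checksum))
```

For example, `make_diff_folder('dir1', 'dir2', 'subset')` would populate `subset` with the files that are new or different in `dir2`. Files deleted from `dir2` are not represented in the result, since rsync only copies what exists in the source.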

Answer 2 (score: 0)

I like diffmerge; it works well for this purpose.

Answer 3 (score: 0)

I have modified @eyquem's answer a bit!

The directories can be passed as arguments:

python file.py dir1 dir2

Note: files are classified by modification time!

#!/usr/bin/python
import sys
from os import listdir
from os.path import getmtime, join

ORIG_DIR = sys.argv[1]      # orig: --> /root/backup.FPSS/bin/httpd
MODIFIED_DIR = sys.argv[2]  # modified: --> /FPSS/httpd/bin/httpd

# Top-level names only; subdirectories are not descended into.
orig_files = set(listdir(ORIG_DIR))
modified_files = set(listdir(MODIFIED_DIR))
common = sorted(orig_files & modified_files)

vanished = sorted(orig_files - modified_files)
appeared = sorted(modified_files - orig_files)
# A common file counts as modified only if the copy in MODIFIED_DIR is newer.
modified = [f for f in common
            if getmtime(join(ORIG_DIR, f)) < getmtime(join(MODIFIED_DIR, f))]
same = [f for f in common
        if getmtime(join(ORIG_DIR, f)) >= getmtime(join(MODIFIED_DIR, f))]

def print_list(title, names):
    # Print the heading first, then one name per line, then the count.
    print(title)
    for f in names:
        print('-----> %s' % f)
    print('Total :: %d' % len(names))

dirs = (MODIFIED_DIR, ORIG_DIR)
print('#' * 99)
print_list('Files which have Vanished from MOD: %s but still present in ORIG: %s ==>' % dirs, vanished)
print('-' * 101)
print_list('Files which are Appearing in MOD: %s but not present in ORIG: %s ==>' % dirs, appeared)
print('-' * 101)
print_list('Files in MOD: %s which are MODIFIED if compared to ORIG: %s ==>' % dirs, modified)
print('-' * 101)
print_list('Files in MOD: %s which are NOT modified if compared to ORIG: %s ==>' % dirs, same)
print('#' * 99)