Is the file read or remembered?

Time: 2014-06-26 04:22:03

Tags: python windows memory file-io file-comparison

If I copy a file and then compare the copy with the original:

import shutil, filecmp

# dummy file names, they're not important
InFile = "d:\\Some\\Path\\File.ext"
CopyFile = "d:\\Some\\other\\Path\\File_Copy.ext"

# copy the file
shutil.copyfile(InFile,CopyFile)

# compare the two files
if not filecmp.cmp(InFile,CopyFile,shallow=False):
    print("File not copied correctly")

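Why? It seems a bit pointless, doesn't it? After all, I have just copied the file, so the two are identical, aren't they? Wrong! Hard drives have a very small error rate, but it is still there. The only way to be sure is to re-read the file. But since the file was only just in memory, how can I be sure that the system (Windows 7) actually reads it from the media and does not simply return pages from standby memory?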

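Let's assume that I have to write 16 TB of data to removable hard drives and must be sure that none of the files on the disc are corrupt, or at least no worse than the live files. In 16 TB of disc space there are likely to be a few files that are not identical. I am currently using WinDiff to check the files byte by byte. File comparison utilities are slow, but at least I can be reasonably certain that WinDiff is actually reading the copied files from the disc, because the cached pages should be long gone by then.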
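
One way to make use of the "pages should be long gone" observation is to record a digest of every source file at copy time and check the copies later, for example after the drive has been ejected and reconnected. The following is only a sketch of that idea, not the approach used in the question: `file_digest`, `build_manifest`, `find_mismatches` and the `copy_for` callable are hypothetical names, and SHA-256 is an assumed choice of hash. It does not by itself force reads to come from the media.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each source path to its digest; meant to run once, at copy time."""
    return dict((path, file_digest(path)) for path in paths)


def find_mismatches(manifest, copy_for):
    """Return source paths whose copy does not match the stored digest.

    copy_for is a hypothetical callable mapping a source path to its copy.
    """
    return [src for src, expected in sorted(manifest.items())
            if file_digest(copy_for(src)) != expected]
```

Because the manifest only stores digests, the later check needs to read each copy once and does not need the source drive to be attached.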

Can anyone offer an expert opinion, with some certainty, on which is likely to be happening: read or remembered?

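Suspiciously, if I copy less data than the installed memory, the verify pass is much quicker than the copy. It should be quicker, since reading is faster than writing, but not that fast. If I copy 3 GB of files (I have 32 GB of installed memory) and the copy takes a minute, then the verify should take 50 seconds or so and Resource Monitor should show 100% disc use. It doesn't: the verify takes less than 10 seconds and Resource Monitor doesn't budge. If I copy more data than the installed memory, the verify takes about as long as the copy and Resource Monitor shows 100%, which is what I expect. So what is happening here?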
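
To put numbers on that suspicion: 3 GB copied in one minute is about 51 MB/s, while 3 GB verified in under 10 seconds is more than 300 MB/s. A rough sketch of measuring this during the comparison is shown here. `compare_timed` and `looks_cached` are hypothetical helpers, and `plausible_drive_rate` is an assumed figure the caller has to supply. A high rate only suggests a cache hit; it proves nothing.

```python
import time


def compare_timed(path_a, path_b, chunk_size=1024 * 1024):
    """Compare two files byte by byte; return (identical, MB_per_second).

    The rate counts the bytes read from path_b only.
    """
    bytes_read = 0
    identical = True
    start = time.time()
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            bytes_read += len(chunk_b)
            if chunk_a != chunk_b:
                identical = False
                break
            if not chunk_a:  # both files ended at the same point
                break
    elapsed = max(time.time() - start, 1e-9)  # guard against a zero interval
    return identical, bytes_read / (1024.0 * 1024.0) / elapsed


def looks_cached(mb_per_second, plausible_drive_rate):
    """Flag a read rate above what the drive could plausibly sustain.

    plausible_drive_rate is an assumed figure in MB/s, chosen by the caller.
    """
    return mb_per_second > plausible_drive_rate
```

With the figures above, `looks_cached(307.2, 60.0)` would flag the fast verify, while `looks_cached(51.2, 60.0)` would not.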

For reference, here is the real code with the error checking removed:

import shutil, filecmp, os, sys

FromFolder = sys.argv[1]
ToFolder   = sys.argv[2]

VerifyList = list()
VerifyToList = list()

BytesCopied = 0

if not os.path.exists(ToFolder):
    os.mkdir(ToFolder)

for (path, dirs, files) in os.walk(FromFolder):
    RelPath = path[len(FromFolder):len(path)]
    OutPath = ToFolder + RelPath

    if not os.path.exists(OutPath):
        os.mkdir(OutPath)

    for thisFile in files:
        InFile = path + "\\" + thisFile
        CopyFile = OutPath + "\\" + thisFile

        # build a human-readable size label for the progress message
        ByteSize = os.path.getsize(InFile)
        if ByteSize < 1024:
            RepSize = "%d bytes" % ByteSize
        elif ByteSize < 1048576:
            RepSize = "%.1f KB" % (ByteSize / 1024.0)
        elif ByteSize < 1073741824:
            RepSize = "%.1f MB" % (ByteSize / 1048576.0)
        else:
            RepSize = "%.1f GB" % (ByteSize / 1073741824.0)

        print("copy %s > %s " % (RepSize, thisFile))

        VerifyList.append(InFile)
        VerifyToList.append(CopyFile)

        shutil.copyfile(InFile,CopyFile)

# finished copying, now verify
FileIndex = range(len(VerifyList))
reVerifyList = list()
reVerifyToList = list()

for thisIndex in FileIndex:
    InFile = VerifyList[thisIndex]
    CopyFile = VerifyToList[thisIndex]

    thisFile = os.path.basename(InFile)
    ByteSize = os.path.getsize(InFile)

    # same size label as in the copy loop
    if ByteSize < 1024:
        RepSize = "%d bytes" % ByteSize
    elif ByteSize < 1048576:
        RepSize = "%.1f KB" % (ByteSize / 1024.0)
    elif ByteSize < 1073741824:
        RepSize = "%.1f MB" % (ByteSize / 1048576.0)
    else:
        RepSize = "%.1f GB" % (ByteSize / 1073741824.0)

    print("Verify %s > %s" % (RepSize, thisFile))

    if not filecmp.cmp(InFile,CopyFile,shallow=False):
        #thisFile = os.path.basename(InFile)
        print("File not copied correctly " + thisFile)
        # copy, second chance
        reVerifyList.append(InFile)
        reVerifyToList.append(CopyFile)
        shutil.copyfile(InFile,CopyFile)

del VerifyList
del VerifyToList

if len(reVerifyList) > 0:
    FileIndex = range(len(reVerifyList))
    for thisIndex in FileIndex:
        InFile = reVerifyList[thisIndex]
        CopyFile = reVerifyToList[thisIndex]

        if not filecmp.cmp(InFile,CopyFile,shallow=False):
            thisFile = os.path.basename(InFile)
            print("File failed 2nd chance " + thisFile)
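
The size-label branches appear twice in the listing, once in the copy loop and once in the verify loop. As a small sketch, they could live in one helper; `format_size` is a hypothetical name that is not part of the original script. It divides by float constants so that the fractional part is kept on Python 2 as well as Python 3.

```python
def format_size(byte_size):
    """Return a short human-readable label for a size in bytes."""
    if byte_size < 1024:
        return "%d bytes" % byte_size
    elif byte_size < 1048576:
        return "%.1f KB" % (byte_size / 1024.0)
    elif byte_size < 1073741824:
        return "%.1f MB" % (byte_size / 1048576.0)
    return "%.1f GB" % (byte_size / 1073741824.0)
```

Both loops could then set `RepSize = format_size(ByteSize)`.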

1 Answer:

Answer 0: (score: 1)

If you are using an external hard drive, you can turn off the write cache for that drive.

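But you can never be 100% sure, because some modern hard drives have an internal buffer (SSD) that caches transparently, without your operating system even being aware of it...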
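
Related to the write cache mentioned above, a script can at least ask the operating system to flush its own buffers for the copied file before verifying. This is a sketch under assumptions, not a guarantee: `copy_and_flush` is a hypothetical helper, `os.fsync` does not evict cached pages (so a later read may still be served from memory), and it has no control over the drive's internal buffer.

```python
import os
import shutil


def copy_and_flush(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst, then ask the OS to push dst's data to the device."""
    with open(src, "rb") as source, open(dst, "wb") as target:
        shutil.copyfileobj(source, target, chunk_size)
        target.flush()             # empty Python's own userspace buffer
        os.fsync(target.fileno())  # ask the OS to write its buffers out
```

In the script from the question, this would stand in for `shutil.copyfile(InFile,CopyFile)`.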