Is there any hash function with the following properties?

Time: 2014-11-12 06:51:40

Tags: java python c++ hash persistence

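I want a hash function that is fast, collision-resistant, and gives unique output. The main requirement is that it should be persistent, i.e. its progress (the hashing progress) can be saved to a file and resumed later. You can also provide your own implementation in Python.

Implementations in other languages are also acceptable, provided they can be used from Python without getting one's hands dirty with the internals.

Thanks in advance :)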

1 answer:

Answer 0: (score: 2)

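Because of the pigeonhole principle, no hash function can generate hashes that are unique / collision-proof: there are more possible inputs than possible fixed-size outputs, so some inputs must share a hash. A good hash function is collision-resistant, and makes it difficult to generate a file that produces a specified hash. Designing a good hash function is an advanced topic, and I'm certainly no expert in that field. However, since my code is based on sha256, it should be fairly collision-resistant, and hopefully it is also difficult to generate a file that produces a specified hash, but I can make no guarantees in that regard.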
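To make the pigeonhole argument concrete, here is a minimal sketch. It assumes a deliberately tiny, hypothetical 8-bit hash (`tiny_hash`, the first byte of a SHA-256 digest; it is not part of the script in this answer). With only 256 possible outputs, any 257 distinct inputs must contain a collision. The same argument applies to a 256-bit hash; it is just infeasible to search that far.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[0]

def find_collision(limit: int = 257):
    """Return (a, b, h) where a != b and both inputs hash to h."""
    seen = {}  # hash value -> first input that produced it
    for i in range(limit):
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    return None  # cannot happen for limit >= 257: only 256 values exist

a, b, h = find_collision()
print("inputs %r and %r both hash to %d" % (a, b, h))
```

In practice the collision shows up long before all 257 inputs are tried, which is the birthday effect at work.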


Here's a resumable hash function based on sha256 that is fairly fast. It takes about 44 seconds to hash a 1.4GB file on my 2GHz machine with 2GB of RAM.

persistent_hash.py

#!/usr/bin/env python3

''' Use SHA-256 to make a resumable hash function

    The file is divided into fixed-size chunks, which are hashed separately.
    The hash of each chunk is combined into a hash for the whole file.

    The hashing process may be interrupted by Control-C (SIGINT) or SIGTERM.
    When a signal is received, hashing continues until the end of the
    current chunk, then the file position and current hex digest are saved
    to a file. The name of this file is formed by appending '.hash' to the
    name of the file being hashed.

    Just re-run the program to resume hashing. The '.hash' file will be deleted
    once hashing is completed.

    Written by PM 2Ring 2014.11.11
'''

import sys
import os
import hashlib
import signal

quit = False    # set by the signal handler to request a clean stop

blocksize = 1<<16   # 64kB
blocksperchunk = 1<<10

chunksize = blocksize * blocksperchunk  # 64MB

def handler(signum, frame):
    global quit
    print("\nGot signal %d, cleaning up." % signum)
    quit = True


def do_hash(fname):
    hashname = fname + '.hash'
    if os.path.exists(hashname):
        # Resume: load the saved file position and running digest
        with open(hashname, 'rt') as f:
            data = f.read().split()
        pos = int(data[0])
        current = bytes.fromhex(data[1])
    else:
        pos = 0
        current = b''

    finished = False
    with open(fname, 'rb') as f:
        f.seek(pos)
        while not (quit or finished):
            # Chain: new digest = sha256(previous digest + sha256(chunk))
            full = hashlib.sha256(current)
            part = hashlib.sha256()
            for _ in range(blocksperchunk):
                block = f.read(blocksize)
                if not block:   # end of file
                    finished = True
                    break
                part.update(block)

            full.update(part.digest())
            current = full.digest()
            pos = f.tell()  # actual number of bytes hashed so far
            print(pos)

    hexdigest = full.hexdigest()
    if quit and not finished:
        # Interrupted before the end: save progress for the next run
        with open(hashname, 'wt') as f:
            f.write("%d %s\n" % (pos, hexdigest))
    elif os.path.exists(hashname):
        os.remove(hashname)

    return finished, pos, hexdigest


def main():
    if len(sys.argv) != 2:
        print("Calculate resumable hash of a file.")
        print("Usage:\npython %s filename\n" % sys.argv[0])
        sys.exit(1)

    fname = sys.argv[1]

    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)

    print(do_hash(fname))


if __name__ == '__main__':
    main()
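
The chaining idea behind the script can be checked in memory, without files or signals. The following is a simplified sketch, not the script itself: `chain_step` and `chained_hash` are hypothetical helper names, the chunk size is tiny for illustration, and end-of-file handling is simplified, so its digests are not meant to match the script's output. It shows that stopping after a few chunks, keeping only the position and the running digest, and resuming later gives the same result as an uninterrupted run.

```python
import hashlib

def chain_step(current: bytes, chunk: bytes) -> bytes:
    """One link of the chain: sha256(previous digest + sha256(chunk))."""
    full = hashlib.sha256(current)
    full.update(hashlib.sha256(chunk).digest())
    return full.digest()

def chained_hash(data: bytes, chunksize: int,
                 current: bytes = b'', pos: int = 0) -> bytes:
    """Hash data[pos:] chunk by chunk, starting from a saved state."""
    while pos < len(data):
        current = chain_step(current, data[pos:pos + chunksize])
        pos += chunksize
    return current

data = bytes(range(256)) * 40 + b'tail'   # sample data, not a chunk multiple
chunksize = 1024                          # tiny chunk size, illustration only

# Uninterrupted run over the whole input
whole = chained_hash(data, chunksize)

# "Interrupted" run: stop after 3 chunks, keep (position, digest), resume
saved_pos = 3 * chunksize
saved_digest = chained_hash(data[:saved_pos], chunksize)
resumed = chained_hash(data, chunksize, current=saved_digest, pos=saved_pos)

print("resumed matches uninterrupted:", resumed == whole)
print("matches plain sha256 of data:", whole == hashlib.sha256(data).digest())
```

The second comparison is there as a reminder that a chained digest like this is its own hash function: it is not interchangeable with the plain SHA-256 of the file, and it depends on the chunk size used.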