How do I derive from hashlib.sha256 in Python?

Posted: 2010-10-27 02:50:16

Tags: python security cryptography

A naive attempt fails miserably:

import hashlib

class fred(hashlib.sha256):
    pass

-> TypeError: Error when calling the metaclass bases
       cannot create 'builtin_function_or_method' instances

Well, it turns out that hashlib.sha256 is a callable, not a class. Trying something a bit more creative doesn't work either:

import hashlib

class fred(type(hashlib.sha256())):
    pass

f = fred()

-> TypeError: cannot create 'fred' instances

Hmm...

So, what do I do?

Here is what I want to achieve:

class shad_256(sha256):
    """Double SHA - sha256(sha256(data).digest())
    Less susceptible to length extension attacks than sha256 alone."""
    def digest(self):
        return sha256(sha256.digest(self)).digest()
    def hexdigest(self):
        return sha256(sha256.digest(self)).hexdigest()

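Basically, I want everything to pass straight through, except that when someone asks for the result I want to insert my own extra step. Is there a clever way to do this with __new__ or some kind of metaclass magic?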

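I have a solution I'm reasonably happy with, which I posted as an answer, but I'd really like to know whether anyone can come up with something better: either much more concise at very little cost in readability, or much faster (particularly when calling update) while still being reasonably readable.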

Update: I ran some tests:

# test_sha._timehash takes three parameters, the hash object generator to use,
# the number of updates and the size of the updates.

# Built in hashlib.sha256
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(hashlib.sha256, 20000, 512)'
100 loops, best of 3: 104 msec per loop

# My wrapper based approach (see my answer)
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.wrapper_shad_256, 20000, 512)'
100 loops, best of 3: 108 msec per loop

# Glen Maynard's getattr based approach
$ python2.7 -m timeit -n 100 -s 'import test_sha, hashlib' 'test_sha._timehash(test_sha.getattr_shad_256, 20000, 512)'
100 loops, best of 3: 103 msec per loop

4 Answers:

Answer 0 (score: 7)

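Create a new class that derives from object. In __init__, create a hashlib.sha256 member variable, then define the methods a hash class needs and have each one proxy to the same method on the member variable.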

Something like:

import hashlib

class MyThing(object):
    def __init__(self):
        self._hasher = hashlib.sha256()

    def digest(self):
        return self._hasher.digest()

and so on for the other methods.
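Below is a minimal Python 3 sketch of this delegation approach, filled out for the double-SHA goal from the question. The exact set of forwarded methods and properties is an assumption about what "the other methods" covers. Only digest and hexdigest add the extra SHA-256 pass.

```python
import hashlib

class MyThing(object):
    """Proxy sketch: every method delegates to a hashlib.sha256 member,
    and digest()/hexdigest() apply a second SHA-256 pass."""

    def __init__(self, data=b""):
        self._hasher = hashlib.sha256(data)

    def update(self, data):
        # One extra Python-level call per update compared with plain sha256
        self._hasher.update(data)

    def digest(self):
        # sha256(sha256(data).digest())
        return hashlib.sha256(self._hasher.digest()).digest()

    def hexdigest(self):
        return hashlib.sha256(self._hasher.digest()).hexdigest()

    def copy(self):
        new = MyThing()
        new._hasher = self._hasher.copy()
        return new

    @property
    def digest_size(self):
        return self._hasher.digest_size

    @property
    def block_size(self):
        return self._hasher.block_size
```

The trade-off is explicitness versus boilerplate: every attribute has to be written out by hand, and update pays for one extra method call.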

Answer 1 (score: 5)

Just use __getattr__ so that every attribute you don't define yourself falls back to the underlying object:

import hashlib

class shad_256(object):
    """
    Double SHA - sha256(sha256(data).digest())
    Less susceptible to length extension attacks than sha256 alone.

    >>> s = shad_256('hello world')
    >>> s.digest_size
    32
    >>> s.block_size
    64
    >>> s.sha256.hexdigest()
    'b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9'
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> s.nonexistant()
    Traceback (most recent call last):
    ...
    AttributeError: '_hashlib.HASH' object has no attribute 'nonexistant'
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    def __init__(self, data=None):
        self.sha256 = hashlib.sha256()
        if data is not None:
            self.update(data)

    def __getattr__(self, key):
        return getattr(self.sha256, key)

    def _get_final_sha256(self):
        return hashlib.sha256(self.sha256.digest())

    def digest(self):
        return self._get_final_sha256().digest()

    def hexdigest(self):
        return self._get_final_sha256().hexdigest()

    def copy(self):
        result = shad_256()
        result.sha256 = self.sha256.copy()
        return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()

This mostly eliminates the overhead of update calls, but not entirely. If you want to eliminate it completely, add this to __init__ (and correspondingly to copy):

self.update = self.sha256.update

This removes the extra __getattr__ call when update is looked up.
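Combining both changes gives the following Python 3 sketch, with update bound in __init__ and re-bound in copy. The re-bind in copy matters. Calling shad_256() binds update to that instance's own fresh, empty hash object, so update has to be pointed at the copied object afterwards.

```python
import hashlib

class shad_256(object):
    """__getattr__ delegation plus the suggested optimization: `update`
    is stored on the instance as the inner object's bound method, so
    calls to it never go through __getattr__."""

    def __init__(self, data=None):
        self.sha256 = hashlib.sha256()
        self.update = self.sha256.update  # bound to the inner hash object
        if data is not None:
            self.update(data)

    def __getattr__(self, key):
        # Only reached for attributes not found normally (digest_size, ...)
        return getattr(self.sha256, key)

    def _get_final_sha256(self):
        return hashlib.sha256(self.sha256.digest())

    def digest(self):
        return self._get_final_sha256().digest()

    def hexdigest(self):
        return self._get_final_sha256().hexdigest()

    def copy(self):
        result = shad_256()
        result.sha256 = self.sha256.copy()
        # Without this, result.update would still feed the empty hash
        # object that shad_256() created above.
        result.update = result.sha256.update
        return result
```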

All of this relies on a useful and often overlooked property of Python member functions: function binding. Recall that you can do this:

a = "hello"
b = a.upper
b()

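This works because taking a reference to a member function doesn't give you the original function. It gives you that function bound to its object. That's why, when the __getattr__ above returns self.sha256.update, the returned function correctly operates on self.sha256 and not on self.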
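Here is a short illustration of that binding, using the str example plus a hashlib object. It is Python 3, so the hashed data is bytes. A bound method exposes the object it is bound to through its __self__ attribute.

```python
import hashlib

a = "hello"
b = a.upper              # bound method: remembers `a` as its receiver
print(b())               # HELLO

h = hashlib.sha256()
update = h.update        # bound to h, just as b is bound to a
update(b"hello world")   # feeds the data into h
print(update.__self__ is h)                                          # True
print(h.hexdigest() == hashlib.sha256(b"hello world").hexdigest())   # True
```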

Answer 2 (score: 2)

So here is the answer I came up with. It's based on Glen's answer, which is the one I awarded him the bounty for:

import hashlib

class _double_wrapper(object):
    """This wrapper exists because the various hashes from hashlib are
    factory functions and there is no type that can be derived from.
    So this class simulates deriving from one of these factory
    functions as if it were a class and then implements the 'd'
    version of the hash function which avoids length extension attacks
    by applying H(H(text)) instead of just H(text)."""

    __slots__ = ('_wrappedinstance', '_wrappedfactory', 'update')
    def __init__(self, wrappedfactory, *args):
        self._wrappedfactory = wrappedfactory
        self._assign_instance(wrappedfactory(*args))

    def _assign_instance(self, instance):
        "Assign new wrapped instance and set update method."
        self._wrappedinstance = instance
        self.update = instance.update

    def digest(self):
        "return the current digest value"
        return self._wrappedfactory(self._wrappedinstance.digest()).digest()

    def hexdigest(self):
        "return the current digest as a string of hexadecimal digits"
        return self._wrappedfactory(self._wrappedinstance.digest()).hexdigest()

    def copy(self):
        "return a copy of the current hash object"
        new = self.__class__()
        new._assign_instance(self._wrappedinstance.copy())
        return new

    digest_size = property(lambda self: self._wrappedinstance.digest_size,
                           doc="number of bytes in this hash's output")
    digestsize = digest_size
    block_size = property(lambda self: self._wrappedinstance.block_size,
                          doc="internal block size of hash function")

class shad_256(_double_wrapper):
    """
    Double SHA - sha256(sha256(data))
    Less susceptible to length extension attacks than SHA2_256 alone.

    >>> import binascii
    >>> s = shad_256('hello world')
    >>> s.name
    'shad256'
    >>> int(s.digest_size)
    32
    >>> int(s.block_size)
    64
    >>> s.hexdigest()
    'bc62d4b80d9e36da29c16c5d4d9f11731f36052c72401a76c23c0fb5a9b74423'
    >>> binascii.hexlify(s.digest()) == s.hexdigest()
    True
    >>> s2 = s.copy()
    >>> s2.digest() == s.digest()
    True
    >>> s2.update("text")
    >>> s2.digest() == s.digest()
    False
    """
    __slots__ = ()
    def __init__(self, *args):
        super(shad_256, self).__init__(hashlib.sha256, *args)
    name = property(lambda self: 'shad256', doc='algorithm name')

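It's a bit verbose, but from a documentation standpoint the class works out very nicely, and the implementation is relatively clear. With Glen's optimization, update is as fast as it can possibly be.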

The one annoyance is that update shows up as a data member and has no docstring. I think that's an acceptable readability/efficiency trade-off.
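_double_wrapper takes the hash factory as a parameter, so another "d" variant should only need a small subclass. Below is a Python 3 sketch that assumes a trimmed copy of the wrapper, with the docstrings and the digestsize alias dropped. The name shad_512 is hypothetical and is not part of the original answer.

```python
import hashlib

class _double_wrapper(object):
    """Trimmed Python 3 sketch of the wrapper above: H(H(data)) for
    whatever hashlib factory is passed in."""
    __slots__ = ('_wrappedinstance', '_wrappedfactory', 'update')

    def __init__(self, wrappedfactory, *args):
        self._wrappedfactory = wrappedfactory
        self._assign_instance(wrappedfactory(*args))

    def _assign_instance(self, instance):
        self._wrappedinstance = instance
        self.update = instance.update  # direct bound method, no extra call

    def digest(self):
        return self._wrappedfactory(self._wrappedinstance.digest()).digest()

    def hexdigest(self):
        return self._wrappedfactory(self._wrappedinstance.digest()).hexdigest()

    def copy(self):
        new = self.__class__()
        new._assign_instance(self._wrappedinstance.copy())
        return new

    digest_size = property(lambda self: self._wrappedinstance.digest_size)
    block_size = property(lambda self: self._wrappedinstance.block_size)

class shad_512(_double_wrapper):
    """Hypothetical double SHA-512 built on the same wrapper."""
    __slots__ = ()

    def __init__(self, *args):
        super(shad_512, self).__init__(hashlib.sha512, *args)

    name = property(lambda self: 'shad512')
```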

Answer 3 (score: 0)

from hashlib import sha256

class shad_256(object):
    def __init__(self, data=''):
        self._hash = sha256(data)

    def __getattr__(self, attr):
        setattr(self, attr, getattr(self._hash, attr))
        return getattr(self, attr)

    def copy(self):
        ret = shad_256()
        ret._hash = self._hash.copy()
        return ret

    def digest(self):
        return sha256(self._hash.digest()).digest()

    def hexdigest(self):
        return sha256(self._hash.digest()).hexdigest()

Any attribute not found on the instance is bound lazily by __getattr__. copy() of course needs special treatment.
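Here is a Python 3 sketch of this answer (the data is bytes) that shows the caching effect. The first access to update goes through __getattr__, which stores the bound method in the instance __dict__, so later lookups find it directly and never reach __getattr__ again. Inspecting vars(s) is just a way to observe this. A copy starts with an empty cache, so its update is bound to the copied hash object.

```python
from hashlib import sha256

class shad_256(object):
    def __init__(self, data=b''):
        self._hash = sha256(data)

    def __getattr__(self, attr):
        # Called only on a lookup miss: cache the inner attribute on the
        # instance so the next lookup finds it without __getattr__.
        setattr(self, attr, getattr(self._hash, attr))
        return getattr(self, attr)

    def copy(self):
        # Fresh instance: nothing cached yet, so attributes re-bind to
        # the copied hash object on first use.
        ret = shad_256()
        ret._hash = self._hash.copy()
        return ret

    def digest(self):
        return sha256(self._hash.digest()).digest()

    def hexdigest(self):
        return sha256(self._hash.digest()).hexdigest()

s = shad_256(b'hello')
print('update' in vars(s))   # False: not looked up yet
s.update(b' world')
print('update' in vars(s))   # True: cached by __getattr__
```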