Question

I have a script that opens a file, finds everything of the form HASH("<stuff>"), and replaces it with HASH(<sha1(stuff)>).

The whole script looks like this:

import sys
import re
import hashlib

def _hash(seq, trim_bits=64):
    assert trim_bits % 8 == 0
    temp = hashlib.sha1(seq).hexdigest()
    temp = int(temp, 16) & eval('0x{}'.format('F' * (trim_bits/4)))
    temp = hex(temp)
    return str(temp[2:]).replace('L', '')

if __name__ == '__main__':
    assert len(sys.argv) == 3
    in_file = sys.argv[1]
    out_file = sys.argv[2]
    with open(in_file, 'r') as f:
        lines = f.readlines()
        out_handle = open(out_file, 'w')
        for line in lines:
            new_line = re.sub(r'HASH\((["\'])(.*?)\1\)', 'HASH({})'.format(_hash(r'\2')), line)
            out_handle.write(new_line)
        out_handle.close()

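However, when I run it, all the sha1 hashes come out exactly the same, which makes no sense to me. If, instead of writing the hash, I switch it to 'HASH({})'.format(r'\2'), it replaces the match with the character sequence between the double quotes. So why does the sha1 hash return the same string every time?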

Answer 1

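You are computing the hash of the literal string r'\2'. The re module only substitutes that placeholder when it appears in the replacement string itself, and that is not what happens here: _hash() has already consumed it before re.sub() is called.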
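To make the order of evaluation concrete, here is a minimal sketch (Python 3, with the hypothetical names `replacement`, `lines` and `results`, and a plain untrimmed sha1 instead of your _hash()). The replacement string is built once from the two characters `\2`, before re.sub() has seen any match, so every line receives the same digest:

```python
import hashlib
import re

# Built exactly once: this hashes the two characters backslash and "2",
# not the text captured by group 2.
replacement = 'HASH({})'.format(hashlib.sha1(br'\2').hexdigest())

lines = ['HASH("foo")', 'HASH("bar")']  # illustrative input
results = [re.sub(r'HASH\((["\'])(.*?)\1\)', replacement, line)
           for line in lines]

# Both inputs end up with the identical replacement text.
print(results[0] == results[1])
```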

Use a replacement function that takes the group from the match object:

def replace_with_hash(match):
    return 'HASH({})'.format(_hash(match.group(2)))

new_line = re.sub(r'HASH\((["\'])(.*?)\1\)', replace_with_hash, line)

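The replace_with_hash() function is passed the match object, and its return value is used as the replacement. Now you can compute the hash of the second group!

Demo: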

>>> import re
>>> def _hash(string):
...     return 'HASHED: {}'.format(string[::-1])
... 
>>> sample = '''\
... HASH("<stuff>")
... '''
>>> re.sub(r'HASH\((["\'])(.*?)\1\)', 'HASH({})'.format(_hash(r'\2')), sample)
'HASH(HASHED: 2\\)\n'
>>> def replace_with_hash(match):
...     return 'HASH({})'.format(_hash(match.group(2)))
... 
>>> re.sub(r'HASH\((["\'])(.*?)\1\)', replace_with_hash, sample)
'HASH(HASHED: >ffuts<)\n'

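My _hash() function simply reverses the input string to show what is going on.

The first re.sub() is your version; note how it returns '2\\', which is r'\2' reversed! My version neatly hashes <stuff> to >ffuts<.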
