Will an MD5 hash keep changing as its input grows?

Time: 2012-02-22 06:10:29

Tags: mysql md5

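Does the value returned by MySQL's MD5 hash function keep changing indefinitely as the string passed to it grows without limit?

For example, will these keep returning different values: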

MD5("A"+"B"+"C")
MD5("A"+"B"+"C"+"D")
MD5("A"+"B"+"C"+"D"+"E")
MD5("A"+"B"+"C"+"D"+"E"+"D")
... and so on until a very long list of values ....

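At some point, when we give the function a very long input string, do the results stop changing, as if the input were being truncated?

I'm asking because I want to use the MD5 function to compare two records with a large set of fields by storing the MD5 hash of those fields.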

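======== Made-up example (you don't need it to answer the question, but you may be interested): ========

I have a database application that periodically fetches data from an external source and uses it to update a MySQL table.

Let's suppose that in month #1, I do my first download: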

downloaded data, where the first field is an ID, a key:
    1,"A","B","C"
    2,"A","D","E"
    3,"B","D","E"

I store this
    1,"A","B","C"
    2,"A","D","E"
    3,"B","D","E"

In month #2, I get:
    1,"A","B","C"
    2,"A","D","X"
    3,"B","D","E"
    4,"B","F","E"

Notice that the record with ID 2 has changed.  Record with ID 4 is new.  So I store two new records:
    1,"A","B","C"
    2,"A","D","E"
    3,"B","D","E"
    2,"A","D","X"
    4,"B","F","E"

This way I have a history of *changes* to the data.

I don't want to have to compare each field of the incoming data with each field of each of the stored records.
E.g., if I'm comparing incoming record x with existing record a, I don't want to have to say:
    Add record x to the stored data if there is no record a such that x.ID == a.ID AND x.F1 == a.F1 AND x.F2 == a.F2 AND x.F3 == a.F3 [4 comparisons]

What I want to do is to compute an MD5 hash and store it:
    1,"A","B","C",MD5("A"+"B"+"C")

Let's suppose that it is month #3, and I get a record:
    1,"A","G","C"
What I want to do is compute the MD5 hash of the new fields: MD5("A"+"G"+"C") and compare the resulting hash with the hashes in the stored data.
If it doesn't match, then I add it as a new record.
I.e., Add record x to the stored data if there is no record a such that x.ID == a.ID AND MD5(x.F1 + x.F2 + x.F3) == a.stored_MD5_value [2 comparisons]
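
The comparison described above can be sketched as follows. This is only an illustration in Python, using `hashlib.md5` in place of MySQL's `MD5()` (both produce the standard 32-character hex digest). The names `record_hash`, `new_versions`, and `SEP` are hypothetical, and the sketch assumes the separator character never occurs inside a field. The separator is there because plain concatenation cannot tell "AB"+"C" from "A"+"BC".

```python
import hashlib

SEP = "\x1f"  # unit separator; assumed never to appear inside a field


def record_hash(fields):
    """Return the MD5 hex digest of the fields joined with SEP."""
    joined = SEP.join(fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def new_versions(stored, incoming):
    """Return incoming rows whose (id, hash) pair is not stored yet.

    stored:   set of (record_id, md5_hex) pairs, updated in place
    incoming: iterable of (record_id, [field, ...]) rows
    """
    added = []
    for record_id, fields in incoming:
        key = (record_id, record_hash(fields))
        if key not in stored:
            stored.add(key)
            added.append((record_id, fields))
    return added


# Month 1 data from the example above.
stored = set()
month1 = [(1, ["A", "B", "C"]), (2, ["A", "D", "E"]), (3, ["B", "D", "E"])]
new_versions(stored, month1)

# Month 2: record 2 has changed and record 4 is new.
month2 = [(1, ["A", "B", "C"]), (2, ["A", "D", "X"]),
          (3, ["B", "D", "E"]), (4, ["B", "F", "E"])]
changes = new_versions(stored, month2)
print(changes)
```

With the month 1 and month 2 data, only the rows for ID 2 and ID 4 come back as changes, which matches the two records added to the history in the example.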

My question is "Can I compare the MD5 hash of, say, 50 fields without increasing the likelihood of clashes?"

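2 Answers:

Answer 0 (score: 1)

Yes, in practice it should keep changing. Because of the pigeonhole principle, if you keep going you must eventually hit a collision, but reaching that point is impractical.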
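
A quick way to check this is to hash a growing string and see whether any digest repeats. The sketch below again uses Python's `hashlib` rather than MySQL, on the assumption that both implement standard MD5. The helper names are hypothetical. The `collision_chance` function is the usual birthday approximation for accidental collisions and assumes digests behave like uniformly random 128-bit values; it says nothing about deliberately crafted inputs.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 hex digest of a string encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Grow the input one character at a time and collect every digest.
text = ""
digests = []
for i in range(10_000):
    text += chr(ord("A") + i % 26)
    digests.append(md5_hex(text))

all_distinct = len(set(digests)) == len(digests)

# Two long inputs that differ only in their final character:
# if the input were truncated, these would hash to the same value.
base = "x" * 1_000_000
tail_matters = md5_hex(base + "a") != md5_hex(base + "b")


def collision_chance(n, bits=128):
    """Birthday approximation: about n*(n-1)/2 / 2**bits for n random digests."""
    return n * (n - 1) / 2 / 2 ** bits


print(all_distinct, tail_matters, collision_chance(10 ** 9))
```

Under that approximation, the chance of an accidental clash depends on how many digests are compared, not on how long each input is or how many fields went into it.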

Answer 1 (score: 1)

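The security of the MD5 hash function is severely compromised. There is a collision attack that can find collisions on a computer with a 2.6 GHz Pentium 4 processor (complexity 2^24). There is also a chosen-prefix collision attack that can produce a collision for two chosen, arbitrarily different inputs within hours using off-the-shelf computing hardware (complexity 2^39). Finding collisions is greatly helped by off-the-shelf GPUs. An NVIDIA GeForce 8400GS graphics processor can compute 16-18 million hashes per second. An NVIDIA GeForce 8800 Ultra can compute more than 200 million hashes per second.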
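
To put those figures side by side, here is a rough back-of-the-envelope calculation. It assumes that one unit of attack complexity costs about one hash evaluation, which is a simplification, and it uses the GPU rates quoted above. The function name `seconds_needed` is hypothetical.

```python
def seconds_needed(log2_complexity, hashes_per_second):
    """Estimate run time, assuming one hash evaluation per unit of complexity."""
    return 2 ** log2_complexity / hashes_per_second


# Collision attack (2^24) at the lower 8400GS rate of 16 million hashes/s.
plain_collision = seconds_needed(24, 16e6)

# Chosen-prefix attack (2^39) at the 8400GS and 8800 Ultra rates.
chosen_prefix_slow = seconds_needed(39, 16e6)
chosen_prefix_fast = seconds_needed(39, 200e6)

print(f"{plain_collision:.2f} s")
print(f"{chosen_prefix_slow / 3600:.1f} h")
print(f"{chosen_prefix_fast / 60:.1f} min")
```

Under that assumption the 2^24 attack takes about a second, and the 2^39 attack takes somewhere between roughly 46 minutes and nine and a half hours, which is consistent with the "within hours" claim.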

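These hash and collision attacks have been demonstrated publicly in various situations, including colliding document files and digital certificates. See http://www.win.tue.nl/hashclash/On%20Collisions%20for%20MD5%20-%20M.M.J.%20Stevens.pdf

Many projects have published MD5 rainbow tables online, which can be used to reverse many MD5 hashes into strings that collide with the original input, usually for password cracking.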