How can a generic function process characters without knowing the character set?

Time: 2017-12-24 18:35:13

Tags: java hash encoding character-encoding

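I'll use SHA-1 as the example for my question. As far as I understand, SHA-1 is a function that takes some number as input and produces another number as output.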
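Strictly speaking, SHA-1 takes a sequence of bytes (not characters, and not a single number) and always returns a 160-bit digest. A minimal sketch using the JDK's `java.security.MessageDigest`; the class name `Sha1Length` is only illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Length {
    /** SHA-1 digest of an arbitrary byte sequence. */
    static byte[] sha1(byte[] input) {
        try {
            // "SHA-1" is one of the digest algorithms every Java platform must support.
            return MessageDigest.getInstance("SHA-1").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[][] inputs = {
            new byte[0],                                              // empty input
            {(byte) 0xFC},                                            // a single byte
            "a much longer input".getBytes(StandardCharsets.US_ASCII) // many bytes
        };
        for (byte[] input : inputs) {
            // Whatever the input length, the digest is always 20 bytes (160 bits).
            System.out.println(input.length + " byte(s) in -> " + sha1(input).length + " bytes out");
        }
    }
}
```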

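Suppose we look up the SHA-1 value of the character ü. This is not a character in the ASCII set. Here is more information about it. http://www.sha1-online.com/ tells me that the SHA-1 value of this character is: 94a759fd37735430753c7b6b80684306d80ea16e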

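In UTF-8 the character ü is represented as C3 BC, and in UTF-16 as 00FC. So can we really talk about SHA-1 without knowing the character encoding used for the text? For example, what value does http://www.sha1-online.com/ actually compute? Doesn't the character encoding make a huge difference?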
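To see the byte sequences in question, a small Java sketch can print what `String.getBytes(Charset)` produces for ü in a few encodings (class and method names are illustrative). Note that Java's plain "UTF-16" charset writes a big-endian byte-order mark first, so it yields four bytes rather than two.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedBytes {
    /** Formats bytes as space-separated uppercase hex, e.g. "C3 BC". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text with the given charset and returns the bytes as hex. */
    static String encode(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        // U+00FC written as a Unicode escape, so the source file's encoding doesn't matter.
        String u = "\u00FC";
        System.out.println("UTF-8:      " + encode(u, StandardCharsets.UTF_8));      // C3 BC
        System.out.println("UTF-16BE:   " + encode(u, StandardCharsets.UTF_16BE));   // 00 FC
        System.out.println("UTF-16LE:   " + encode(u, StandardCharsets.UTF_16LE));   // FC 00
        System.out.println("UTF-16:     " + encode(u, StandardCharsets.UTF_16));     // FE FF 00 FC
        System.out.println("ISO-8859-1: " + encode(u, StandardCharsets.ISO_8859_1)); // FC
    }
}
```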

As far as I know, in Java all characters are represented in UTF-16. When I compute SHA-1 in Java for the example above, will the function operate on the input 00FC?

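Now take another programming language in which all characters are represented in UTF-8. Would its SHA-1 result be completely different from Java's?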
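SHA-1 only ever sees bytes, so the programming language doesn't matter as long as the same bytes reach the function. The sketch below assumes a UTF-8-native language would hand the raw bytes C3 BC to SHA-1; in Java the same input is obtained by encoding the string explicitly as UTF-8. The class name is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class SameBytesSameHash {
    static byte[] sha1(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Assumed input from a UTF-8-native language: the raw UTF-8 bytes of U+00FC.
        byte[] utf8Native = {(byte) 0xC3, (byte) 0xBC};
        // Java: the UTF-16 string explicitly encoded as UTF-8 gives the same two bytes.
        byte[] javaUtf8 = "\u00FC".getBytes(StandardCharsets.UTF_8);
        // Java: encoding as UTF-16BE instead gives 00 FC, a different input.
        byte[] javaUtf16 = "\u00FC".getBytes(StandardCharsets.UTF_16BE);

        System.out.println(Arrays.equals(sha1(utf8Native), sha1(javaUtf8)));  // true
        System.out.println(Arrays.equals(sha1(utf8Native), sha1(javaUtf16))); // false
    }
}
```

In other words, the result differs between languages only if they feed SHA-1 different bytes, which is a matter of the encoding chosen, not of the language.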

4 answers:

Answer 0 (score: 2)

You can try different charset encoders and see how the results differ.

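import java.math.BigInteger;
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.util.Map;

public class Main {
    public static void main(String[] args) throws Exception {
        // U+00FC as a Unicode escape, so the result doesn't depend on the source file's encoding.
        String s = "\u00FC";
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            try {
                // Encode the same UTF-16 string with each charset, then hash the resulting bytes.
                byte[] encoded = s.getBytes(entry.getValue());
                byte[] digest = sha1.digest(encoded); // digest() also resets sha1 for the next charset
                // 40 uppercase, zero-padded hex digits (javax.xml.bind was removed in Java 11).
                String hex = String.format("%040X", new BigInteger(1, digest));
                System.out.printf("For encoding %s, SHA1 hash is %s%n", entry.getKey(), hex);
            } catch (UnsupportedOperationException e) {
                // Thrown for decode-only charsets, which have no encoder.
                System.out.printf("Cant make it work for %s%n", entry.getKey());
            }
        }
    }

}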
On my Mac, the output is:

For encoding Big5, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding Big5-HKSCS, SHA1 hash is BB4DF74228C74A9F5B1CFADC9A711AFC3ACAC72E
For encoding CESU-8, SHA1 hash is 94A759FD37735430753C7B6B80684306D80EA16E
For encoding EUC-JP, SHA1 hash is 351B74F9485AEACE9E1F18FA834C01BBD95AEFFE
For encoding EUC-KR, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding GB18030, SHA1 hash is 8423E79957EB24D34D202D200DC8172062B09BC9
For encoding GB2312, SHA1 hash is 8423E79957EB24D34D202D200DC8172062B09BC9
For encoding GBK, SHA1 hash is 8423E79957EB24D34D202D200DC8172062B09BC9
For encoding IBM-Thai, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM00858, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM01140, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM01141, SHA1 hash is 655F2B71DDFAFBCBD5AF517F02EB9386A2A7A2A1
For encoding IBM01142, SHA1 hash is EB6B0E7165A8118B4BD2DE93FBE8182DC50FE8DE
For encoding IBM01143, SHA1 hash is EB6B0E7165A8118B4BD2DE93FBE8182DC50FE8DE
For encoding IBM01144, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM01145, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM01146, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM01147, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM01148, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM01149, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM037, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM1026, SHA1 hash is C2204EDBFB1B72C9E996A5E6464F6AB0198C494F
For encoding IBM1047, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM273, SHA1 hash is 655F2B71DDFAFBCBD5AF517F02EB9386A2A7A2A1
For encoding IBM277, SHA1 hash is EB6B0E7165A8118B4BD2DE93FBE8182DC50FE8DE
For encoding IBM278, SHA1 hash is EB6B0E7165A8118B4BD2DE93FBE8182DC50FE8DE
For encoding IBM280, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM284, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM285, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM290, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM297, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM420, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM424, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM437, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM500, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM775, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM850, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM852, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM855, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM857, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM860, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM861, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM862, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM863, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM864, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM865, SHA1 hash is A3F294235FE5422005AE9BC3A0D1BFFE12CFE353
For encoding IBM866, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM868, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM869, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding IBM870, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM871, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding IBM918, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
Cant make it work for ISO-2022-CN
For encoding ISO-2022-JP, SHA1 hash is EEFF680379A9FC2E2328A673C1C9A9488027DDE6
For encoding ISO-2022-JP-2, SHA1 hash is 7F2ABFCFEE137EAF0E691FF15303B2E49FA2F10F
For encoding ISO-2022-KR, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding ISO-8859-1, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding ISO-8859-13, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding ISO-8859-15, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding ISO-8859-2, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding ISO-8859-3, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding ISO-8859-4, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding ISO-8859-5, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding ISO-8859-6, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding ISO-8859-7, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding ISO-8859-8, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding ISO-8859-9, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding JIS_X0201, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding JIS_X0212-1990, SHA1 hash is 1636827A2EED870EE75B8646595EA7FA833B7B2D
For encoding KOI8-R, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding KOI8-U, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding Shift_JIS, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding TIS-620, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding US-ASCII, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding UTF-16, SHA1 hash is 884B39989F49BC0C3B4095E564B97F788E8F26A4
For encoding UTF-16BE, SHA1 hash is 4BB28530F49234022C33A9A53020019FF1729128
For encoding UTF-16LE, SHA1 hash is A497384499A29B7E56BEC88F64915F8697B9F212
For encoding UTF-32, SHA1 hash is 2D15E32FE6E8B72CC758BF92826781A21F543F06
For encoding UTF-32BE, SHA1 hash is 2D15E32FE6E8B72CC758BF92826781A21F543F06
For encoding UTF-32LE, SHA1 hash is 42670183E5B0D4ED60120ABB18E4B19458B8786D
For encoding UTF-8, SHA1 hash is 94A759FD37735430753C7B6B80684306D80EA16E
For encoding windows-1250, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding windows-1251, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding windows-1252, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding windows-1253, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding windows-1254, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding windows-1255, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding windows-1256, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding windows-1257, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding windows-1258, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding windows-31j, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-Big5-HKSCS-2001, SHA1 hash is BB4DF74228C74A9F5B1CFADC9A711AFC3ACAC72E
For encoding x-Big5-Solaris, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-COMPOUND_TEXT, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding x-euc-jp-linux, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-EUC-TW, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-eucJP-Open, SHA1 hash is 351B74F9485AEACE9E1F18FA834C01BBD95AEFFE
For encoding x-IBM1006, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1025, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1046, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1097, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1098, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1112, SHA1 hash is 5FB9A0BA37519B7FD51909C778EE3B48502DE7C1
For encoding x-IBM1122, SHA1 hash is EB6B0E7165A8118B4BD2DE93FBE8182DC50FE8DE
For encoding x-IBM1123, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1124, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1166, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM1364, SHA1 hash is 7A81AF3E591AC713F81EA1EFE93DCF36157D8376
For encoding x-IBM1381, SHA1 hash is 8423E79957EB24D34D202D200DC8172062B09BC9
For encoding x-IBM1383, SHA1 hash is 8423E79957EB24D34D202D200DC8172062B09BC9
For encoding x-IBM300, SHA1 hash is 0AD631FE7C0AFBB8E46DFF643ECB6F157F0F17C2
For encoding x-IBM33722, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM737, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM833, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM834, SHA1 hash is BF465657E801DC6DEC070496C4CD3BE6C9463310
For encoding x-IBM856, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM874, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM875, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM921, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding x-IBM922, SHA1 hash is AB461F6B8A6842A473257A2561C1FBDF91BDFE77
For encoding x-IBM930, SHA1 hash is 7A81AF3E591AC713F81EA1EFE93DCF36157D8376
For encoding x-IBM933, SHA1 hash is 7A81AF3E591AC713F81EA1EFE93DCF36157D8376
For encoding x-IBM935, SHA1 hash is CC6D81CF2D2718EEBE6B8AAC261DF04090159565
For encoding x-IBM937, SHA1 hash is 7A81AF3E591AC713F81EA1EFE93DCF36157D8376
For encoding x-IBM939, SHA1 hash is 7A81AF3E591AC713F81EA1EFE93DCF36157D8376
For encoding x-IBM942, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM942C, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM943, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM943C, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM948, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM949, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM949C, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM950, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM964, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-IBM970, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-ISCII91, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-ISO-2022-CN-CNS, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-ISO-2022-CN-GB, SHA1 hash is 187F00566A714B33F37B77D53ECA6E20CD74DDE0
For encoding x-iso-8859-11, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-JIS0208, SHA1 hash is 3A2C82466E34A4A1677205899068A2D53A92BD54
Cant make it work for x-JISAutoDetect
For encoding x-Johab, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-MacArabic, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacCentralEurope, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacCroatian, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacCyrillic, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-MacDingbat, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-MacGreek, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacHebrew, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacIceland, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacRoman, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacRomania, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacSymbol, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-MacThai, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-MacTurkish, SHA1 hash is F195C020A28DFC5F2FB6AF256B524DDCD93756ED
For encoding x-MacUkraine, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-MS932_0213, SHA1 hash is 9B4A2DBFED88AFE0BD16BD7CAD69F90BE5A09FAF
For encoding x-MS950-HKSCS, SHA1 hash is BB4DF74228C74A9F5B1CFADC9A711AFC3ACAC72E
For encoding x-MS950-HKSCS-XP, SHA1 hash is BB4DF74228C74A9F5B1CFADC9A711AFC3ACAC72E
For encoding x-mswin-936, SHA1 hash is 8423E79957EB24D34D202D200DC8172062B09BC9
For encoding x-PCK, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-SJIS_0213, SHA1 hash is 9B4A2DBFED88AFE0BD16BD7CAD69F90BE5A09FAF
For encoding x-UTF-16LE-BOM, SHA1 hash is 61FB9BF626098B2786735AA4505430890DCC6BC8
For encoding X-UTF-32BE-BOM, SHA1 hash is 0662CE1CEA946124D3FA5F43B4BA2DA41CEF500C
For encoding X-UTF-32LE-BOM, SHA1 hash is A6030F4A113F71489180D342DDB5106CE9FC33E5
For encoding x-windows-50220, SHA1 hash is 7F2ABFCFEE137EAF0E691FF15303B2E49FA2F10F
For encoding x-windows-50221, SHA1 hash is 7F2ABFCFEE137EAF0E691FF15303B2E49FA2F10F
For encoding x-windows-874, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-windows-949, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-windows-950, SHA1 hash is 5BAB61EB53176449E25C2C82F172B82CB13FFB9D
For encoding x-windows-iso2022jp, SHA1 hash is EEFF680379A9FC2E2328A673C1C9A9488027DDE6
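Many rows above share the digest 5BAB61EB53176449E25C2C82F172B82CB13FFB9D. A likely explanation (an inference, not something stated in this answer): those charsets have no mapping for ü, and `String.getBytes(Charset)` silently replaces unmappable characters with the charset's default replacement bytes, which for US-ASCII is "?" (0x3F). Those rows would then be hashing a placeholder rather than ü. A quick check for US-ASCII, with an illustrative class name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class UnmappableChar {
    /** True if the charset has a byte representation for every character of text. */
    static boolean canRepresent(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String u = "\u00FC";
        Charset ascii = StandardCharsets.US_ASCII;
        // US-ASCII has no byte for U+00FC...
        System.out.println(canRepresent(ascii, u)); // false
        // ...so getBytes() substitutes the replacement "?" without throwing.
        byte[] encoded = u.getBytes(ascii);
        System.out.println(Arrays.toString(encoded));                     // [63]
        System.out.println(Arrays.equals(encoded, "?".getBytes(ascii))); // true
    }
}
```

If you need the conversion to fail loudly instead, a `CharsetEncoder` configured with `CodingErrorAction.REPORT` throws on unmappable input rather than substituting.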

Answer 1 (score: 1)

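A hash function takes a byte array as input. You convert a character or string into a byte array, and that conversion specifies a charset, either explicitly or implicitly. The byte values are not necessarily the same as the character's numeric value; that depends on the charset. So, in general, a generic function does need to know the charset, but that information can be supplied implicitly.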
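One way to keep the charset from being supplied only implicitly is to make it a required parameter of the hashing helper. The sketch below uses a hypothetical `sha1Hex(String, Charset)`; it deliberately has no overload without a charset, so callers can never fall back on the platform default by accident.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CharsetAwareHash {
    /** SHA-1 of text encoded with an explicitly chosen charset, as 40 lowercase hex digits. */
    static String sha1Hex(String text, Charset charset) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(text.getBytes(charset));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String u = "\u00FC";
        // Same character, two charsets, two different byte arrays, so two different digests.
        System.out.println("UTF-8:      " + sha1Hex(u, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + sha1Hex(u, StandardCharsets.ISO_8859_1));
    }
}
```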

Answer 2 (score: 1)

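As the other answers mention, a SHA-1 checksum is computed over bytes. But when you use Java or a utility such as http://www.sha1-online.com, the character encoding comes into play, because text can be represented with different charsets. The online utility mentioned above muddies the issue because it doesn't say which charset it uses to compute the value.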
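In the output of Answer 0, the digest the question quotes from http://www.sha1-online.com/ (94a759fd...) coincides with the UTF-8 row, and with the CESU-8 row, which encodes Basic Multilingual Plane characters exactly like UTF-8. That suggests, without proving, that the site hashes UTF-8 bytes. The same comparison can be run in reverse: given a text and a digest, list every available charset that would produce it. `matchingCharsets` is a hypothetical helper, and the result depends on which charsets your JDK ships.

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class GuessCharset {
    static String sha1Hex(byte[] input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(input);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Names of all available charsets whose encoding of text hashes to expectedHex. */
    static List<String> matchingCharsets(String text, String expectedHex) {
        List<String> matches = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (!cs.canEncode()) {
                continue; // decode-only charsets have no encoder
            }
            if (sha1Hex(text.getBytes(cs)).equalsIgnoreCase(expectedHex)) {
                matches.add(cs.name());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Digest reported for U+00FC by the online tool, as quoted in the question.
        String siteDigest = "94a759fd37735430753c7b6b80684306d80ea16e";
        System.out.println(matchingCharsets("\u00FC", siteDigest));
    }
}
```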

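Typically, when you compute the SHA-1 of a file with a built-in system utility such as shasum (UNIX) or certutil (Windows), it simply reads the file's bytes. If you save a file from a text editor, you'll notice that it lets you choose an encoding; Notepad, for example, offers ANSI and UTF-8. If you save the same text containing a non-ASCII character such as ü as ANSI in one file and as UTF-8 in another, the two files produce different SHA-1 checksums, because the bytes used to represent the characters differ.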
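The same effect can be reproduced with files, as a sketch: save one-character text in two encodings and hash the raw file bytes, which is all shasum or certutil look at. This assumes "ANSI" means windows-1252, the usual code page on Western-European Windows installations (Notepad's ANSI option actually uses the system's active code page); the helper names are illustrative.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class FileEncodings {
    /** Hashes a file's raw bytes; no charset is involved at this stage. */
    static byte[] sha1OfFile(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("SHA-1").digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes text to a new temp file in the given charset, like "Save as" with an encoding choice. */
    static Path save(String text, Charset charset) throws IOException {
        Path file = Files.createTempFile("encoding-demo", ".txt");
        Files.write(file, text.getBytes(charset));
        return file;
    }

    public static void main(String[] args) throws IOException {
        String text = "\u00FC";
        // Assumption: "ANSI" = windows-1252 here.
        Path ansi = save(text, Charset.forName("windows-1252"));
        Path utf8 = save(text, StandardCharsets.UTF_8);
        System.out.println(Files.size(ansi)); // 1 byte: FC
        System.out.println(Files.size(utf8)); // 2 bytes: C3 BC
        System.out.println(Arrays.equals(sha1OfFile(ansi), sha1OfFile(utf8))); // false
        Files.delete(ansi);
        Files.delete(utf8);
    }
}
```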

Answer 3 (score: 0)

"As far as I know, in Java all characters are represented in UTF-16."

Strings in Java are sequences of 16-bit numbers that represent UTF-16 code units.
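A short sketch of what "sequence of UTF-16 code units" means in practice: ü (U+00FC) occupies a single 16-bit `char` with value 00FC, while a character outside the Basic Multilingual Plane, here U+1F600 chosen as an illustration, occupies two `char`s (a surrogate pair). The class and method names are illustrative.

```java
class CodeUnits {
    /** Reports how many UTF-16 code units and Unicode code points a string contains. */
    static String describe(String s) {
        return s.length() + " code unit(s), " + s.codePointCount(0, s.length()) + " code point(s)";
    }

    public static void main(String[] args) {
        String u = "\u00FC";          // U+00FC, inside the Basic Multilingual Plane
        String grin = "\uD83D\uDE00"; // U+1F600, outside the BMP: stored as a surrogate pair
        System.out.println(describe(u));                              // 1 code unit(s), 1 code point(s)
        System.out.printf("first code unit: %04X%n", (int) u.charAt(0)); // 00FC
        System.out.println(describe(grin));                           // 2 code unit(s), 1 code point(s)
    }
}
```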

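Hash functions work on byte sequences. A Java string is not a byte sequence, so to pass a string to a hash function you must first convert it into one. Some languages may let you implicitly reinterpret a sequence of 16-bit numbers as a sequence of bytes, but Java isn't that sloppy.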
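In Java, such a reinterpretation has to be spelled out. One way, sketched below with `java.nio.ByteBuffer`, copies each 16-bit code unit into two bytes in big-endian order. For well-formed text the result is identical to encoding with UTF-16BE, and for ü it is exactly the 00 FC input the question asks about. `rawCodeUnits` is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReinterpretUtf16 {
    /** Copies the raw 16-bit code units of s into bytes, big-endian, with no charset conversion. */
    static byte[] rawCodeUnits(String s) {
        ByteBuffer buffer = ByteBuffer.allocate(s.length() * 2); // new ByteBuffers are big-endian
        buffer.asCharBuffer().put(s); // writes through the char view into the byte buffer
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] raw = rawCodeUnits("\u00FC");
        System.out.println(Arrays.toString(raw)); // [0, -4], i.e. 00 FC
        // For well-formed text this equals an explicit UTF-16BE encoding.
        System.out.println(Arrays.equals(raw, "\u00FC".getBytes(StandardCharsets.UTF_16BE))); // true
    }
}
```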

"When I compute SHA-1 in Java for the example above, will the function operate on the input 00FC?"

If you use the linked Java example code, almost certainly not.

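The Java example you linked to uses String.getBytes() to perform this conversion. getBytes() does not simply reinterpret the sequence of 16-bit words as a sequence of bytes. Instead, it converts the sequence of UTF-16 code units into a sequence of bytes according to "the platform's default charset".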

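"The platform's default charset" varies with the platform and with the language settings chosen by whoever set it up, but it will be a byte-based, ASCII-compatible charset, not UTF-16.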
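A sketch of what this means in code: the no-argument `getBytes()` is shorthand for `getBytes(Charset.defaultCharset())`, so the bytes that reach SHA-1 depend on the machine. Since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms unless overridden, so on a current JDK ü typically becomes C3 BC; on older JDKs it follows the OS locale, as described above. Either way, hashing the UTF-16 code unit 00 FC requires naming UTF-16BE explicitly. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {
    /** The bytes String.getBytes() (no argument) produces: the platform default charset is used. */
    static byte[] implicitBytes(String s) {
        return s.getBytes();
    }

    public static void main(String[] args) {
        String u = "\u00FC";
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform);
        // The no-argument overload is equivalent to passing the default charset explicitly.
        System.out.println(Arrays.equals(implicitBytes(u), u.getBytes(platform))); // true
        // To feed SHA-1 the code unit 00 FC, the charset has to be named explicitly.
        System.out.println(Arrays.toString(u.getBytes(StandardCharsets.UTF_16BE))); // [0, -4]
    }
}
```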