I was able to figure out how to convert a Unicode string to an ASCII string using the following code. (Credit is given in the code.)
//create a string using unicode that says "hello" when printed to console
String unicode = "\u0068" + "\u0065" + "\u006c" + "\u006c" + "\u006f";
System.out.println(unicode);
System.out.println("");
/* Test code for converting unicode to ASCII
* Taken from http://stackoverflow.com/questions/15356716/how-can-i-convert-unicode-string-to-ascii-in-java
* Will be commented out later after tested and implemented.
*/
//String s = "口水雞 hello Ä";
//replace String s with String unicode for conversion
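// Requires: java.text.Normalizer, java.nio.charset.StandardCharsets
String s1 = Normalizer.normalize(unicode, Normalizer.Form.NFKD);
// Pass the pattern as a regex; wrapping it in Pattern.quote() would make it a literal that never matches
String regex = "[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+";
String s2 = new String(s1.replaceAll(regex, "").getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);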
System.out.println(s2);
System.out.println(unicode.length() == s2.length());
//End of Test code that was implemented
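Now my curiosity has gotten the better of me. I have been searching on Google, since my knowledge of Java is not the best.
My question is: is it possible to convert an ASCII string to a UTF format? Specifically UTF-16. (I say UTF-16 because I know how similar UTF-8 is to ASCII, so there is no need to convert from ASCII to UTF-8.)
Thanks in advance!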
Answer 0: (score: 1)
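Java strings use UTF-16 as their internal format, but that is rarely relevant, because the String class takes care of it. You will notice a difference in two cases, one of which is when a String is examined as a byte array (see below). That happens all the time in C, but not in more modern languages that properly distinguish between strings and byte arrays (e.g. Java or Python 3.x). If you want to encode the content as UTF-16 before writing it to a file (or equivalent), you can use: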
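// Requires: java.io.FileOutputStream, java.io.OutputStream, java.nio.charset.StandardCharsets
String data = "TEST";
// try-with-resources closes the stream even if write() throws
try (OutputStream output = new FileOutputStream("filename.txt")) {
    // The "UTF-16" charset encodes big-endian and prepends a BOM (FE FF)
    output.write(data.getBytes(StandardCharsets.UTF_16));
}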
The resulting file will contain:
0000000: feff 0054 0045 0053 0054 ...T.E.S.T
The bytes FE FF at the beginning are the byte order mark (BOM), which identifies the content as big-endian UTF-16.
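To see what "examining the String as a byte array" looks like without going through a file, here is a minimal sketch that encodes the same string with several standard charsets and prints the bytes as hex. The class name EncodingDemo and the helper hex() are made up for this illustration; only String.getBytes(Charset), the String(byte[], Charset) constructor and the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: render a byte array as space-separated two-digit hex values
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "TEST";

        // ASCII and UTF-8 produce identical bytes for pure ASCII text
        System.out.println("US-ASCII: " + hex(data.getBytes(StandardCharsets.US_ASCII)));
        System.out.println("UTF-8:    " + hex(data.getBytes(StandardCharsets.UTF_8)));

        // "UTF-16" writes a big-endian BOM (fe ff) followed by two bytes per character
        System.out.println("UTF-16:   " + hex(data.getBytes(StandardCharsets.UTF_16)));

        // The explicit-endianness variants write no BOM
        System.out.println("UTF-16BE: " + hex(data.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE: " + hex(data.getBytes(StandardCharsets.UTF_16LE)));

        // Decoding the UTF-16 bytes yields a String equal to the original
        byte[] encoded = data.getBytes(StandardCharsets.UTF_16);
        String decoded = new String(encoded, StandardCharsets.UTF_16);
        System.out.println("Round trip equal: " + decoded.equals(data));
    }
}
```

The String itself never changes; only the byte representation chosen at the moment of encoding differs. So "converting ASCII to UTF-16" in Java amounts to decoding the ASCII bytes into a String and then calling getBytes with a UTF-16 charset.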