I'm working with Unicode encodings in Python 3 and have run into behavior I don't understand.
The following cases behave as expected:
x = "A"
fo = open("test.txt","w",encoding="utf_8")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 01000001 (1 byte, as expected)
x = "A"
fo = open("test.txt","w",encoding="utf_16_le")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 01000001 00000000 (2 bytes, as expected)
x = "A"
fo = open("test.txt","w",encoding="utf_16_be")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 00000000 01000001 (2 bytes, as expected)
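The same three results can be reproduced in memory, without writing a file or running xxd, by calling str.encode with each codec name. This is a minimal sketch; the helper name bits is just an illustrative choice that formats bytes the way xxd -b does.

```python
# Encode the same one-character string with each codec and show the bits,
# mirroring what `xxd -b` printed for the files written above.
x = "A"


def bits(data: bytes) -> str:
    """Format bytes as space-separated 8-bit binary groups, like xxd -b."""
    return " ".join(f"{b:08b}" for b in data)


for enc in ("utf_8", "utf_16_le", "utf_16_be"):
    data = x.encode(enc)
    print(f"{enc:10} {len(data)} byte(s): {bits(data)}")
```

UTF-8 stores "A" (U+0041) in one byte, while the explicit little-endian and big-endian UTF-16 codecs store it as one 16-bit code unit with the bytes in opposite order.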
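Why does the utf_16 encoding produce 4 bytes?
My understanding is that UTF-16 is a variable-length character encoding that uses either 16 or 32 bits depending on the character. For the character A, it should use only 16 bits. Can someone help me understand this behavior?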
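The variable-length part of that understanding can be checked directly. In this sketch the sample characters are arbitrary picks: characters in the Basic Multilingual Plane (up to U+FFFF) encode to one 16-bit code unit, and characters above U+FFFF encode to two code units (a surrogate pair). utf_16_le is used so that no byte order mark is involved.

```python
# UTF-16 is variable-length: one 16-bit code unit for BMP characters,
# two code units (a surrogate pair) for characters above U+FFFF.
samples = {
    "A": "U+0041",           # BMP, ASCII range
    "\u20ac": "U+20AC",      # BMP, euro sign
    "\U0001F600": "U+1F600", # outside the BMP, needs a surrogate pair
}

for ch, label in samples.items():
    data = ch.encode("utf_16_le")
    units = len(data) // 2  # each UTF-16 code unit is 2 bytes
    print(f"{label:8} {units} code unit(s), {len(data)} bytes: {data.hex(' ')}")
```

So "A" really does need only 16 bits of UTF-16 data; the extra bytes in the utf_16 case come from somewhere else.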
x = "A"
fo = open("test.txt","w",encoding="utf_16")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 11111111 11111110 01000001 00000000
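A likely explanation for the first two bytes (11111111 11111110, i.e. FF FE) is a byte order mark. The plain utf_16 codec in Python does not fix the byte order, so it writes U+FEFF first in the machine's native order; on a little-endian machine that is FF FE, followed by "A" as 41 00. The sketch below checks this against codecs.BOM_UTF16, which is the BOM in native byte order, and also shows that a text stream emits the BOM only once at the start, not before every write().

```python
import codecs
import io
import sys

data = "A".encode("utf_16")

# The plain "utf_16" codec prepends a BOM (U+FEFF) in native byte order,
# then encodes the text in that same order.
native = "utf_16_le" if sys.byteorder == "little" else "utf_16_be"
assert data == codecs.BOM_UTF16 + "A".encode(native)

print(data[:2].hex(" "), "<- BOM (U+FEFF)")
print(data[2:].hex(" "), "<- 'A' as one 16-bit code unit")

# Decoding with "utf_16" reads the BOM to detect the byte order and strips it.
assert data.decode("utf_16") == "A"

# Through a text stream (as with open(..., encoding="utf_16")), the BOM is
# written once at the beginning: BOM (2 bytes) + "A" (2) + "A" (2) = 6 bytes.
buf = io.BytesIO()
fo = io.TextIOWrapper(buf, encoding="utf_16")
fo.write("A")
fo.write("A")
fo.flush()
print(len(buf.getvalue()), "bytes after two writes")
```

If a file should contain no BOM, naming the byte order explicitly (utf_16_le or utf_16_be, as in the earlier cases) avoids it; the reader then has to know which order was used.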