Question

I'm trying to do the same thing in Python as the Java code below.

String decoded = new String("ä¸".getBytes("ISO8859_1"), "UTF-8");
System.out.println(decoded);

The output is the Chinese string "中".

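In Python I tried encode/decode/bytearray, but I always get unreadable strings. I think my problem is that I don't really understand how the Java and Python encoding mechanisms work. I also couldn't find a solution in the existing answers.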

#coding=utf-8

def p(s):
    print s + ' --  ' + str(type(s))

ch1 = 'ä¸-'
p(ch1)

chu1 = ch1.decode('ISO8859_1')
p(chu1.encode('utf-8'))

utf_8 = bytearray(chu1, 'utf-8')
p(utf_8)

p(utf_8.decode('utf-8').encode('utf-8'))

#utfstr = utf_8.decode('utf-8').decode('utf-8')
#p(utfstr)

p(ch1.decode('iso-8859-1').encode('utf8'))

ä¸- --  <type 'str'>
Ã¤Â¸Â- --  <type 'str'>
Ã¤Â¸Â- --  <type 'bytearray'>
Ã¤Â¸Â- --  <type 'str'>
Ã¤Â¸Â- --  <type 'str'>
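
A note on why every attempt above prints the same `Ã¤Â¸Â-` text: with `#coding=utf-8`, the Python 2 literal is already a UTF-8 byte string, so decoding it as ISO-8859-1 and encoding it as UTF-8 encodes the text a second time. The sketch below reproduces this in Python 3 (an assumption; the question uses Python 2). It also assumes the trailing `-` shown in the post was originally a soft hyphen (U+00AD), written here as `\xad`.

```python
# Assumption: the "-" in the post stands for a soft hyphen (U+00AD).
garbled = "ä¸\xad"

# What a Python 2 str literal holds in a UTF-8 source file: UTF-8 bytes.
source_bytes = garbled.encode("utf-8")

# ch1.decode('ISO8859_1') treats each of those bytes as one character...
as_latin1 = source_bytes.decode("iso-8859-1")

# ...and .encode('utf-8') then encodes each of those characters again.
double_encoded = as_latin1.encode("utf-8")

print(source_bytes)         # six bytes, two per garbled character
print(as_latin1)            # the text a UTF-8 terminal shows for double_encoded
print(len(double_encoded))  # twice as long as source_bytes
```

None of these steps can reach "中", because each one moves further away from the original three bytes.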

Daniel Roseman's answer is very close. Thanks. But for my actual case:

    ch = 'masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤'
    print ch.decode('utf-8').encode('iso-8859-1')

I got

    Traceback (most recent call last):
      File "", line 1, in
      File "/apps/Python/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 19: invalid start byte
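
One plausible reading of this error, sketched in Python 3 under the assumption that the literal was typed into a UTF-8 source file or terminal: visible characters such as `ã` are stored as two UTF-8 bytes, while an escape like `\201` is the single raw byte 0x81. The byte string is then only partly UTF-8, and decoding stops at the first raw byte.

```python
# Assumption: "ã" was stored as UTF-8 (C3 A3) while "\201" stayed a raw byte.
prefix = "masanori harigae ã".encode("utf-8")
mixed = prefix + b"\201"  # octal escape 201 is the single byte 0x81

try:
    mixed.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    # Record where decoding failed, the offending byte, and the reason.
    failure = (exc.start, mixed[exc.start], exc.reason)

print(failure)
```

Under that assumption the failure lands on byte 0x81 at position 19, the same byte and position reported in the traceback.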

Java code:

    String decoded = new String("masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤".getBytes("ISO8859_1"), "UTF-8");
    System.out.println(decoded);

The output is masanori harigaeのパーソナル会 - 室

Answer 1

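You're going the wrong way round. You have a byte string that was wrongly encoded as UTF-8, and you want it interpreted as ISO-8859-1. So decode it as UTF-8 first, then encode the result as ISO-8859-1: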

>>> ch = "ä¸"
>>> print ch.decode('utf-8').encode('iso-8859-1')
中
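
For readers on Python 3, where `str` is already Unicode, the Java line maps to an encode followed by a decode. This is a minimal sketch and not part of the original answer: `fix_mojibake` is a hypothetical helper name, and the `\xad` escapes assume that the `-` printed inside the garbled strings in the question was originally a soft hyphen (U+00AD).

```python
def fix_mojibake(text):
    """Recover text whose UTF-8 bytes were mis-read as ISO-8859-1."""
    # Same steps as Java's new String(s.getBytes("ISO8859_1"), "UTF-8").
    return text.encode("iso-8859-1").decode("utf-8")


# Assumption: "\xad" (soft hyphen) replaces the "-" shown in the post.
short = "ä¸\xad"

# In a Python 3 str, octal escapes such as \201 are the code points U+0081 etc.
long_case = ("masanori harigae ã\201®ã\203\221ã\203¼ã\202½"
             "ã\203\212ã\203«ä¼\232è\xad°å®¤")

print(fix_mojibake(short))
print(fix_mojibake(long_case))
```

In Python 2 the same idea needs a `unicode` literal: `u'...'.encode('iso-8859-1')` yields the UTF-8 byte string directly. A plain `str` literal that mixes typed characters with octal escapes is not consistently encoded, so it cannot be decoded in one step.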
