Question

I have a text file with the following content:

....     
{"emojiCharts":{"emoji_icon":"\u2697","repost": 3, "doc": 3, "engagement": 1184, "reach": 6734, "impression": 44898}}
{"emojiCharts":{"emoji_icon":"\U0001f924","repost": 11, "doc": 11, "engagement": 83, "reach": 1047, "impression": 6981}}
....

Some emojis are in \uhhhh format, and some are in \Uhhhhhhhh format.

Is there any way to encode/decode this so that the emojis are displayed? If the file contains only \Uhhhhhhhh escapes, everything works fine.

To get to this stage, I modified the file as follows:

insightData.decode("raw_unicode_escape").encode('utf-16', 'surrogatepass').decode('utf-16').encode("raw_unicode_escape").decode("latin_1")

To display the emojis, I need to use:

insightData.decode("raw_unicode_escape").encode('utf-16', 'surrogatepass').decode('utf-16')

But this raises an error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2600' in position 30: ordinal not in range(128)

Solution:

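# Python 2: read the ASCII file and expand the \uXXXX / \UXXXXXXXX escapes.
with open(OUTPUT, "r") as infileInsight:
    insightData = infileInsight.read().decode('raw_unicode_escape')

# Write the decoded text back to the same file as UTF-8 bytes.
with open(OUTPUT, "w+") as outfileInsight:
    outfileInsight.write(insightData.encode('utf-8'))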
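
For readers on Python 3, the same read/decode/write round trip might look roughly like the sketch below. The function name `convert_escaped_file`, the temporary file names, and the two sample records are hypothetical and only modeled on the question; the sketch also assumes the input file is pure ASCII, as described.

```python
import os
import tempfile


def convert_escaped_file(src_path, dst_path):
    # Read raw bytes so that no implicit text decoding happens first.
    with open(src_path, "rb") as infile:
        raw = infile.read()
    # raw_unicode_escape expands only \uXXXX and \UXXXXXXXX escapes;
    # other backslash sequences (such as \") are left untouched.
    text = raw.decode("raw_unicode_escape")
    # Write the decoded text out as UTF-8.
    with open(dst_path, "w", encoding="utf-8") as outfile:
        outfile.write(text)
    return text


# Demo on a throwaway file (hypothetical sample data).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "insight_ascii.txt")
dst = os.path.join(workdir, "insight_utf8.txt")
with open(src, "wb") as f:
    f.write(b'{"emoji_icon":"\\u2697"}\n{"emoji_icon":"\\U0001f924"}\n')

decoded = convert_escaped_file(src, dst)
```

Opening the output with an explicit `encoding="utf-8"` avoids the implicit ASCII conversion that produced the `UnicodeEncodeError` in Python 2.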

Answer 1

You can do this:

print a["emojiCharts"]["emoji_icon"].decode("unicode-escape")

Output: ⚗
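
The snippet above presumably takes `a` to be an already parsed record. As a rough Python 3 sketch of one way to get there: `json.loads` accepts `\uXXXX` but rejects the Python-only `\UXXXXXXXX` form, so both escape styles are expanded first. The helper name `load_record` and the shortened sample lines are hypothetical, and the sketch assumes each line is pure ASCII and that no escape expands to a quote or backslash.

```python
import json


def load_record(line):
    # Expand \uXXXX and \UXXXXXXXX into real characters, then parse as JSON.
    expanded = line.encode("ascii").decode("raw_unicode_escape")
    return json.loads(expanded)


# Shortened sample lines modeled on the question's file.
lines = [
    '{"emojiCharts":{"emoji_icon":"\\u2697","repost": 3}}',
    '{"emojiCharts":{"emoji_icon":"\\U0001f924","repost": 11}}',
]

icons = [load_record(line)["emojiCharts"]["emoji_icon"] for line in lines]
print(icons)
```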

Answer 2

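This has nothing to do with UTF-8 or UTF-16. It is simply Python's way of escaping Unicode characters: everything up to U+FFFF uses \uFFFF, and everything above it uses \UFFFFFFFF (for historical reasons).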

Both escape sequences should be exactly equivalent in a Python string. On my machine, using @vks's solution:

$ python
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '\U0000ABCD'.decode('unicode-escape')
u'\uabcd'
>>> '\uABCD'.decode('unicode-escape')
u'\uabcd'

And similarly in Python 3.
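
A minimal Python 3 version of the same check, as a sketch: in Python 3 only `bytes` objects have `.decode`, so the escaped text is written as bytes literals here. The variable names are illustrative.

```python
# Both escape forms decode to the same single character below U+FFFF.
bmp_short = b"\\uABCD".decode("unicode_escape")
bmp_long = b"\\U0000ABCD".decode("unicode_escape")
assert bmp_short == bmp_long == "\uabcd"

# Code points above U+FFFF need the eight-digit form.
astral = b"\\U0001f924".decode("unicode_escape")
assert len(astral) == 1
assert ord(astral) == 0x1F924
```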

Answer 3

OK. Python 2.7, Win 10.

Your original file is plain ASCII containing literal Unicode escapes ("\u####" is 6 bytes and "\U########" is 10 bytes).

Read the file and decode it with 'unicode-escape': you then have a Python unicode string; let's call it your_unicode_string.

To write it to a file, choose:

output_encoding = 'utf-8'

or

output_encoding = 'utf-16-le'

Then:

import codecs
with codecs.open(output_filename, 'w', encoding=output_encoding) as fpo:
    # fpo.write(u'\ufeff') # for windows, you might want to write this at the start
    fpo.write(your_unicode_string)
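
As a hedged Python 3 sketch of the same write step: the built-in `open` takes an `encoding` argument, and the standard `utf-8-sig` codec writes the BOM automatically, which plays the role of the commented-out `u'\ufeff'` line above. The sample string and the file name are hypothetical.

```python
import codecs
import os
import tempfile

# Sample text: one BMP character, a space, and one character above U+FFFF.
text = "\u2697 \U0001f924"

# The same text encoded both ways; the astral emoji takes 4 bytes in either.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Write with a leading BOM by using the utf-8-sig codec.
path = os.path.join(tempfile.mkdtemp(), "emoji.txt")
with open(path, "w", encoding="utf-8-sig") as fpo:
    fpo.write(text)

# Read the raw bytes back to inspect what was written.
with open(path, "rb") as fpi:
    written = fpi.read()

print(len(utf8_bytes), len(utf16_bytes), written.startswith(codecs.BOM_UTF8))
```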

With the given Python and OS versions, you will not be able to simply print to the console and see the emojis without some extra tinkering.

Emojis: encoding/decoding when a text file contains UTF-8 and UTF-16
