Question

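I wrote this function to check whether a unicode string contains a month name and, if so, to replace the name with the month number. I use this encoding declaration in the header: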

```
#!/usr/bin/python
# -*- coding: utf-8 -*-
```

This is my def for converting the months:

```python
def changeData(date):
    if date:
        date = date.encode('utf-8')
        if "فروردین".encode('utf-8') in date:
            return date.replace(":فروردین", ":1")
        elif "اردیبهشت".encode('utf-8') in date:
            return date.replace(":اردیبهشت", ":2")
        elif "خرداد".encode('utf-8') in date:
            return date.replace(":خرداد", ":3")
        elif "تیر".encode('utf-8') in date:
            return date.replace(":تیر", ":4")
        elif "مرداد".encode('utf-8') in date:
            return date.replace(":مرداد", ":5")
        elif "شهریور".encode('utf-8') in date:
            return date.replace(":شهریور", ":6")
        elif "مهر".encode('utf-8') in date:
            return date.replace(":مهر", ":7")
        elif "آبان".encode('utf-8') in date:
            return date.replace(":آبان", ":8")
        elif "آذر".encode('utf-8') in date:
            return date.replace(":آذر", ":9")
        elif "دی".encode('utf-8') in date:
            return date.replace(":دی", ":10")
        elif "بهمن".encode('utf-8') in date:
            return date.replace(":بهمن", ":11")
        elif "اسفند".encode('utf-8') in date:
            return date.replace(":اسفند", ":12")
```

I pass the date to the function as unicode and then convert it with encode('utf-8'), but it gives me this error:

if "فروردین".encode('utf-8') in date:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

How can I fix this?

Answer 1

I assume Python 2.7.

So:

"فروردین".encode('utf-8') # UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)

The problem is that in Python 2.7, plain string literals are bytes:

print(repr("فروردین")) # '\xd9\x81\xd8\xb1\xd9\x88\xd8\xb1\xd8\xaf\xdb\x8c\xd9\x86'
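To see the difference concretely, here is a small sketch written for Python 3 (an assumption; the question targets 2.7), where text and bytes are separate types. Encoding the word to UTF-8 gives exactly the 14 bytes shown in the repr above, while the text itself is only 7 characters long:

```python
# Python 3 sketch: str holds text, bytes holds raw data.
# In Python 2.7, a plain "..." literal in a UTF-8 source file already
# holds bytes like `raw` below, not text like `word`.
word = "فروردین"               # 7 Unicode code points
raw = word.encode("utf-8")    # the UTF-8 bytes a Python 2.7 literal stores

print(len(word))  # 7
print(len(raw))   # 14
print(raw == b"\xd9\x81\xd8\xb1\xd9\x88\xd8\xb1\xd8\xaf\xdb\x8c\xd9\x86")  # True
```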

With this code:

"فروردین".encode('utf-8')

you are trying to encode bytes, which is logically wrong, because:

ENCODING: unicode --> bytes 
DECODING: bytes --> unicode
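Python 3 (again assumed here, not the asker's 2.7) keeps these two directions apart as separate types, so calling the method for the wrong direction fails immediately with an AttributeError instead of triggering a hidden ASCII decode. A minimal sketch:

```python
# Python 3 sketch of the two directions:
#   encode: unicode text (str) -> bytes
#   decode: bytes -> unicode text (str)
text = "فروردین"
data = text.encode("utf-8")   # str -> bytes
back = data.decode("utf-8")   # bytes -> str

print(type(data).__name__)  # bytes
print(type(back).__name__)  # str
print(back == text)         # True

# Python 3 has no method for the "wrong" direction at all:
print(hasattr(data, "encode"))  # False: bytes cannot be encoded again
print(hasattr(text, "decode"))  # False: text cannot be decoded again
```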

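But Python does not raise something like a TypeError here, because Python 2 tries to be clever in this case: it first tries to decode the given bytes to unicode, and then performs the encoding the user asked for. The problem is that Python 2 performs this decoding with its default encoding, ASCII, so the program terminates with a UnicodeDecodeError.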

The implicit decoding described above is similar to:

unicode("فروردین") # UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
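The hidden step can also be made explicit. The following Python 3 sketch (an assumption; it only imitates what Python 2.7 does implicitly) decodes the UTF-8 bytes with the ASCII codec and fails with the same message as the question's traceback, because the first byte, 0xd9, is outside the ASCII range:

```python
# Python 3 sketch imitating Python 2.7's implicit step:
# "...".encode("utf-8") on a byte string first runs .decode("ascii").
data = "فروردین".encode("utf-8")  # the bytes a Python 2.7 "..." literal holds

message = None
try:
    data.decode("ascii")          # the hidden ASCII decode
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
    # 'ascii' codec can't decode byte 0xd9 in position 0: ordinal not in range(128)
```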

So you should not encode a byte string; you have to DECODE it to get unicode:

u = "فروردین".decode('utf-8') 
print(type(u)) # <type 'unicode'>

Another way to get unicode is to use a u-literal together with an encoding declaration:

# coding: utf-8

u = u"فروردین"
print(type(u)) # <type 'unicode'> 

print(u == "فروردین".decode('utf-8')) # True
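Applied to the question, a sketch of changeData that works only with unicode text might look like this. It is written for Python 3, where every "..." literal is already unicode; under Python 2.7 the same approach should work with u"..." literals and a unicode date, as described above. The month list, the loop, and returning the input unchanged when no month is found are my own choices, not taken from the question; it also assumes, as the question's replace calls do, that the month name directly follows a ":":

```python
# -*- coding: utf-8 -*-
# Sketch (Python 3): compare text with text, never text with encoded bytes.
# Assumption: the month name follows a ":" in the date string.
PERSIAN_MONTHS = [
    "فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
    "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند",
]


def changeData(date):
    """Replace ':<Persian month name>' with ':<month number>' in date."""
    if not date:
        return date
    for number, name in enumerate(PERSIAN_MONTHS, start=1):
        key = ":" + name
        if key in date:  # unicode in unicode: no implicit ASCII decode
            return date.replace(key, ":" + str(number))
    return date  # no month name found: return the input unchanged
```

For example, `changeData("1395:تیر:12")` returns `"1395:4:12"`. Matching on the ":"-prefixed name, rather than the bare name, also reduces the risk of a short name such as دی matching inside a longer word.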

Finding an Arabic word in a string gives the error 'ascii' codec can't decode
