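I'm building a Flask website in Spanish where people can encode messages to send by email. Basically, you paste text into a text field and get back its encoded version. The encode() and decode() functions below work fine until they have to handle accented and other non-standard characters. My default system encoding is 'ascii', and I believe I might be changing my string encoding by using numpy.matrix and numpy.chararray.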
When I build and test the code in Sublime Text 2, I get:
SyntaxError: Non-ASCII character '\xc3'... but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
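The '\xc3' in this message is the lead byte that UTF-8 uses for accented Latin letters such as í, ó and ñ. In Python 2, a literal like "La cría ..." without a u prefix is a byte string, so list(message) in encode() yields one element per byte rather than one per character. A short illustration, written for Python 3, where the same distinction is str versus bytes:

```python
# Python 3 sketch: str holds characters, bytes holds their UTF-8 encoding.
text = u"cría"
raw = text.encode("utf-8")

print(len(text))  # 4 characters
print(len(raw))   # 5 bytes, because 'í' is encoded as two bytes
print(raw[2:4])   # b'\xc3\xad'

# Splitting the bytes one at a time, as list(message) does with a Python 2
# byte string, separates the two halves of 'í'.
print([raw[i:i + 1] for i in range(len(raw))])
```
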
When I add
#!/usr/bin/env python
#-*- coding: utf-8 -*-
the code runs in ST2, but it also throws an error and the decoded message is missing some characters, as shown below:
[Decode error - output not utf-8]
La cr a del le n tiene dos a os.
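Independently of the array dtype, the last re.sub call in decode() blanks out accented letters: its class [^a-zA-Z0-9\n\.] matches every character that is not an ASCII letter, digit, newline or '.', so í, ó and ñ are replaced by spaces together with the padding symbols. A Python 3 sketch of that effect; the padding-only pattern is an illustrative alternative, not part of the question:

```python
import re

# Decoded text before the final clean-up: the message plus padding symbols.
decoded = u"La cría del león tiene dos años.*?&@"

# Same character class as decode(), written as a raw string ('.' needs no
# escape inside a class): í, ó and ñ are blanked along with the padding.
ascii_only = re.sub(r"[^a-zA-Z0-9\n.]", " ", decoded).rstrip()

# Illustrative alternative: blank only the four padding symbols.
padding_only = re.sub(r"[*?&@]", " ", decoded).rstrip()

print(ascii_only)    # La cr a del le n tiene dos a os.
print(padding_only)  # La cría del león tiene dos años.
```
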
When I run it on Flask's local server, I get:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 0: ordinal not in range(128)
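An error of this form can be reproduced outside Flask. u'\xed' is í, and np.chararray(shape=...) defaults to unicode=False, i.e. one-byte cells with dtype 'S1'; storing text in such a cell makes NumPy encode it with the ASCII codec. A Python 3 sketch using plain arrays with the equivalent dtypes 'S1' and 'U1' (the variable names are illustrative):

```python
import numpy as np

# np.chararray(shape) defaults to unicode=False, i.e. dtype 'S1' (one byte
# per cell); unicode=True gives dtype 'U1' (one Unicode character per cell).
byte_cells = np.empty(3, dtype="S1")
text_cells = np.empty(3, dtype="U1")

try:
    byte_cells[0] = u"í"  # NumPy must encode str -> bytes, using ASCII
    byte_assignment_failed = False
except UnicodeEncodeError as exc:
    byte_assignment_failed = True
    print("S1 cell:", exc)

text_cells[:] = list(u"año")  # any single character fits in a 'U1' cell
print("U1 cells:", "".join(text_cells))  # U1 cells: año
```
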
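I tried the chardet package, and it identified the items in the matrix as 'windows-1252'. I decoded the items in the matrix using 'windows-1252' and 'cp1252', but the problem persists. I also tried encoding with 'utf-8' after that decoding step (i.e. after decoding with 'windows-1252'), but it didn't work. I suspect this is an encoding problem, but I'm not completely sure. Any clue on how to fix this would be greatly appreciated.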
Here is the code:
import numpy as np
import random, string, re
def encode(message, size, token):
    """Assumes message is a string, size is the size limit of the message,
    and token is a string with unique characters, e.g. bufalo but not rana"""
    message = list(message)
    while len(message) < size:
        sgn = random.choice(['*', '?', '&', '@'])
        message.append(sgn)
    matrix = np.matrix(message)
    cols = size/5
    matrix = matrix.reshape((cols, 5)).T
    encoded = np.chararray(shape=(cols,5)).T
    token = token.lower()
    token = list(token)
    new = []
    for i in token:
        new.append(sorted(token).index(i))
    while len(new) > 5:
        for i in new:
            if i >= (5):
                new.remove(i)
    old = range(0,5)
    for o, n in zip(old, new):
        encoded[np.ix_([n], range(0, matrix.shape[1]))] = matrix[np.ix_([o], range(0, matrix.shape[1]))]
    encoded_str = ''
    for i in range((encoded.size)):
        encoded_str += encoded.item(i)
    return encoded_str
#########################################
#THIS IS A TEST
#########################################
mssg = "La cría del león tiene dos años."
print encode(mssg, 120, 'bufalo')
#########################################
def decode(message, size, token):
    message = list(message)
    while len(message) < size:
        sgn = random.choice(['*', '?', '&', '@'])
        message.append(sgn)
    matrix = np.matrix(message)
    cols = size/5
    matrix = matrix.reshape((5, cols))
    token = token.lower()
    token = list(token)
    new = []
    for i in token:
        new.append(sorted(token).index(i))
    while len(new) > 5:
        for i in new:
            if i >= (5):
                new.remove(i)
    old = range(0,5)
    decoded = np.chararray(shape=(cols,5)).T
    for n, o in zip(old, new):
        decoded[np.ix_([n], range(0, matrix.shape[1]))] = matrix[np.ix_([o], range(0, matrix.shape[1]))]
    decoded = decoded.T
    decoded_str = ''
    for i in range((decoded.size)):
        decoded_str += decoded.item(i)
    decoded_str = re.sub('[^a-zA-Z0-9\n\.]', ' ', decoded_str)
    return decoded_str
Answer 0 (score: 0)
Several things need to be done to fix the code:
1) Since your code contains Unicode characters, add #-*- coding: utf-8 -*-
2) The test string should be a Unicode string, so that line should become
mssg = u"La cría del león tiene dos años."
3) The encoded array (from the line encoded = np.chararray(shape=(cols,5)).T) defaults to ASCII byte strings. You should change that line to
encoded = np.chararray(shape=(cols,5), unicode=True).T
i.e. you need to add the argument unicode=True.
The code will then run and print this result:
lt a?@&*@*&&&*&*&*?&?*Lílnnss&*@&&*@&??&?&@**?aa e .@?*@&&@?@?*@?@?&?cdeidñ*&??&?**@*@*@&*&?@reóeoo&**&?@?&&??&@@??&&
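Putting points 2 and 3 together, here is a compact sketch of the same row transposition for Python 3, where str is already Unicode and the arrays use the 'U1' dtype that unicode=True selects; note that decode() needs the same change for its decoded array. It assumes plain np.array instead of np.matrix/np.chararray, a fixed '*' pad instead of random symbols, a size divisible by 5, and a token with at least five distinct letters; the helper names column_order, transpose_encode and transpose_decode are hypothetical:

```python
import numpy as np

def column_order(token):
    """Alphabetical rank of each letter in token, keeping only ranks 0-4.

    Mirrors the question's sorted(token).index(i) loop; with at least five
    distinct letters the result is a permutation of 0..4.
    """
    letters = list(token.lower())
    ranks = [sorted(letters).index(ch) for ch in letters]
    return [r for r in ranks if r < 5]

def transpose_encode(message, size, token, pad="*"):
    """Pad message to size characters, then emit its 5 rows in token order."""
    cells = list(message) + [pad] * (size - len(message))
    # dtype 'U1' holds one Unicode character per cell (like unicode=True).
    grid = np.array(cells, dtype="U1").reshape(size // 5, 5).T  # shape (5, size // 5)
    encoded = grid.copy()  # every row is overwritten below
    for old, new in enumerate(column_order(token)):
        encoded[new] = grid[old]
    return "".join(encoded.ravel())

def transpose_decode(message, size, token, pad="*"):
    """Undo transpose_encode and strip the trailing padding."""
    grid = np.array(list(message), dtype="U1").reshape(5, size // 5)
    decoded = grid.copy()  # every row is overwritten below
    for old, new in enumerate(column_order(token)):
        decoded[old] = grid[new]
    # Assumes the message itself does not end with the pad character.
    return "".join(decoded.T.ravel()).rstrip(pad)

mssg = "La cría del león tiene dos años."
secret = transpose_encode(mssg, 120, "bufalo")
print(secret)
print(transpose_decode(secret, 120, "bufalo"))  # La cría del león tiene dos años.
```

Because every cell holds a whole character, í, ó and ñ survive the round trip, and stripping only the known pad avoids the ASCII-only clean-up step.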