A small question from a beginner. I'm trying to write a small function that randomizes the contents of a piece of text.
# -*- coding: utf-8 -*-
import random

def glitch(text):
    new_text = ['']
    for x in text:
        new_text.append(x)
        random.shuffle(new_text)
    return ''.join(new_text)
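As you can see, it's quite simple, and feeding it a plain string such as 'Hey, how are you?' produces a randomized sentence, as expected. However, when I try to pass in something like this:
print glitch('Iàäï †n $§&0ñŒ≥Q¶μù`o¢y“-œº')
... Python 2.7.9 reports "Unsupported characters in input". I've browsed the forums and tried a few things based on what I understood, since I'm new to coding in general, but to no avail.
Any suggestions?
Thanks.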
Answer 0 (score: 0)
# -*- coding: utf-8 -*-
import random

def glitch(text):
    new_text = ['']
    for x in text:
        new_text.append(x)
        random.shuffle(new_text)
    return ''.join(new_text)

print (glitch(u'Iàäï†n$§&0ñŒ≥Q¶µù`o¢y”—œº'))
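This should work. From a quick Google search of my own, I found that you have to put the letter 'u' in front of the string literal to mark the text that follows as unicode.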
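To illustrate why the `u` prefix matters, here is a minimal sketch (not the original poster's code) of the same shuffle applied to a Unicode string. The name `glitch_text` and the optional `seed` parameter are assumptions added for illustration, and the sample is a shortened version of the string from the question. In Python 2 the literal needs the `u` prefix; in Python 3 every string literal is already Unicode, so the prefix is accepted but redundant.

```python
# -*- coding: utf-8 -*-
import random


def glitch_text(text, seed=None):
    """Return the characters of a Unicode string in random order.

    `glitch_text` and `seed` are hypothetical names used for this sketch.
    """
    rng = random.Random(seed)   # private generator, so a seed gives repeatable output
    chars = list(text)          # one list item per character rather than per byte
    rng.shuffle(chars)          # shuffle once, after collecting everything
    return u''.join(chars)


sample = u'Iàäï†n$§&0ñŒ≥Q¶µù`o¢y'
shuffled = glitch_text(sample, seed=1)

# The shuffle only reorders characters; none are lost or split.
print(len(shuffled) == len(sample))
print(sorted(shuffled) == sorted(sample))
```

Shuffling a list of whole characters is what keeps the accented letters and symbols intact, which is the practical effect of marking the literal as unicode.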
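Answer 1 (score: -1)
Your problem is Python 2.x in general, not your specific version of Python 2. In Python 2.x a plain string literal is a byte string, and the default codec is ascii rather than a Unicode encoding (this changed in Python 3), while your string is (likely) encoded as utf-8. See below: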
import chardet
text = 'Iàäï†n$§&0ñŒ≥Q¶µù`o¢y”—œº'
print chardet.detect(text)['encoding'] # prints utf-8
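The mismatch between characters and bytes can also be checked directly, without `chardet`. The sketch below is an illustration rather than part of the original answer; the three-character `sample` is picked arbitrarily from the string above.

```python
# -*- coding: utf-8 -*-
sample = u'ñŒ≥'                    # three characters
encoded = sample.encode('utf-8')   # the same text as UTF-8 bytes

print(len(sample))    # number of characters
print(len(encoded))   # number of bytes; larger, since each character here needs 2 or 3 bytes

# Decoding these bytes as ASCII fails, because every byte is above 127.
try:
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii cannot decode this: %s' % exc.reason)
```

Iterating over `encoded` visits individual bytes, so shuffling it can separate the bytes that belong to one character.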
If you download Python 3.x, your problem will likely go away, since UTF-8 can handle any Unicode code point.
If you're interested, or for future 2.x users, you can do the following.
import random

def glitch(text):
    new_text = []
    for x in text:  # iterating a Python 2 `str` yields one byte at a time
        new_text.append(x)
    random.shuffle(new_text)  # note you should just shuffle once, not on every iteration
    new_line = ''.join(new_text)  # still a byte string, but the shuffled bytes are no longer valid `utf-8`
    # if you tried to make it `unicode` by doing `u''.join(new_text)` you would get a `UnicodeDecodeError`,
    # because Python 2 implicitly decodes the bytes as `ascii`
    # the "ignore" handler below bypasses decoding errors by silently dropping the offending bytes,
    # so the code runs and returns a `utf-8` encoded byte string
    # note that you will get back a shorter string, because (again) `ascii` only covers
    # the 128 values 0-127, whereas the non-ASCII characters in your `utf-8` string
    # are stored as bytes in the range 128-255
    return new_line.decode("ascii", "ignore").encode("utf-8")
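As an alternative to discarding the non-ASCII bytes, one could decode the byte string to text first, shuffle whole characters, and encode the result again. This is only a sketch under the assumption that the input really is UTF-8 encoded; `glitch_bytes` is a hypothetical name, not something taken from the code above.

```python
# -*- coding: utf-8 -*-
import random


def glitch_bytes(data, encoding='utf-8'):
    """Shuffle the characters of an encoded byte string without losing any.

    Assumes `data` is valid in `encoding` (UTF-8 by default).
    """
    chars = list(data.decode(encoding))       # bytes -> text, one item per character
    random.shuffle(chars)                     # shuffle whole characters, never single bytes
    return u''.join(chars).encode(encoding)   # text -> bytes again


original = u'Iàäï†n$§&0ñŒ≥'.encode('utf-8')
result = glitch_bytes(original)

# Same characters overall, so the encoded length is unchanged.
print(len(result) == len(original))
# For comparison, the "ignore" approach keeps only the ASCII characters.
print(original.decode('ascii', 'ignore'))
```

The trade-off is that this version raises `UnicodeDecodeError` if the input is not valid in the assumed encoding, whereas the "ignore" approach never fails but loses data.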
I hope this helps and makes sense. There is plenty of material online discussing this (including several questions of my own, for example :))