Question

A small question from a newbie. I'm trying to write a little function that randomizes the contents of a text.

#-*- coding: utf-8 -*-
import random

def glitch(text):
    new_text = ['']
    for x in text:
        new_text.append(x)
        random.shuffle(new_text)
    return ''.join(new_text)

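As you can see, it is pretty simple, and passing in a plain string like 'Hey, how are you?' produces the expected randomized sentence. However, when I try to paste in something like this: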

print glitch('Iàäï†n$§&0ñŒ≥Q¶µù`o¢y”—œº')

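... Python 2.7.9 returns 'Unsupported characters in input'. I've browsed the forums and tried a few things based on what I understood, as I'm new to coding in general, but to no avail.

Any suggestions?

Thanks.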

Answer 1

#-*- coding: utf-8 -*-
import random

def glitch(text):

    new_text = ['']
    for x in text:
        new_text.append(x)
        random.shuffle(new_text)
    return ''.join(new_text)

print (glitch(u'Iàäï†n$§&0ñŒ≥Q¶µù`o¢y”—œº'))

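This should work. From a quick Google search of my own, I found that you have to put the letter 'u' in front of the string literal to mark the text that follows as unicode.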
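
To illustrate why the 'u' prefix matters, here is a minimal sketch comparing the same text as a unicode string and as utf-8 bytes. The variable names are my own and not from the question, and it assumes the source file is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Illustrative names only; assumes this file is saved as UTF-8.
text = u'àäï'               # unicode string: one item per character
raw = text.encode('utf-8')  # byte string: each of these characters takes 2 bytes

print(len(text))  # 3
print(len(raw))   # 6
```

Looping over the byte string yields single bytes, so shuffling it can split a multi-byte character apart, while looping over the unicode string yields whole characters.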

Source: Unsupported characters in input

Answer 2

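Your problem is with Python 2.x in general, not with your specific version of Python 2. In Python 2.x a plain string literal is a byte string and the default codec is ascii rather than a Unicode encoding (this changed in Python 3), and your string is (likely) encoded as utf-8. See the following: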

import chardet
text = 'Iàäï†n$§&0ñŒ≥Q¶µù`o¢y”—œº'
print chardet.detect(text)['encoding'] # prints utf-8

If you download Python 3.x, your problem will probably be solved, since strings there are Unicode by default and UTF-8 can handle any Unicode code point.
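
For reference, a minimal sketch of what the function could look like on Python 3, where every string is already Unicode. The name `sample` and the use of `random.sample` are my own choices here, not something taken from the question.

```python
import random

def glitch(text):
    # random.sample returns a new list holding every character of text
    # exactly once, in random order (assumption: text is a Python 3 str)
    return ''.join(random.sample(text, len(text)))

sample = 'Iàäï†n$§&0ñŒ≥Q¶µù`o¢y”—œº'
result = glitch(sample)

# Same characters, just in a different order
print(sorted(result) == sorted(sample))  # True
```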

If you are interested, or for future 2.x users, you can do the following.

import random

def glitch(text):
    new_text = []
    for x in text:  # in Python 2, iterating over a str yields single bytes
        new_text.append(x)
    random.shuffle(new_text)  # note you should just shuffle once, not every iteration
    new_line = ''.join(new_text)  # still a byte str; joining does not change the encoding
    # if you tried to make it `unicode` by doing `u''.join(new_text)`, Python 2 would
    # implicitly decode every item as `ascii` and raise a `UnicodeDecodeError`
    return new_line.decode("ascii", "ignore").encode("utf-8")  # note the "ignore": it drops bytes that are not valid ascii
    # now your code will run and return a utf-8 encoded str
    # note that you will return a shorter string, because `ascii` only covers
    # the 128 values 0-127, whereas every non-ascii character in your `utf-8`
    # string is made of bytes in the range 128-255, which all get dropped
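
If losing the non-ascii characters is not acceptable, one alternative is to decode the bytes to unicode first and shuffle whole characters. This is only a sketch: `glitch_lossless` is a hypothetical helper name of mine, and it assumes the input bytes really are utf-8.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import random

def glitch_lossless(text, encoding='utf-8'):
    """Shuffle characters rather than bytes (hypothetical helper, assumes utf-8 input)."""
    if isinstance(text, bytes):       # Python 2 str or Python 3 bytes
        text = text.decode(encoding)  # bytes -> unicode text
    chars = list(text)
    random.shuffle(chars)             # shuffle once, in place
    return u''.join(chars)            # returns a unicode string

raw = u'Iàäï†n'.encode('utf-8')
print(len(raw.decode('ascii', 'ignore')))  # 2: only 'I' and 'n' survive
print(len(glitch_lossless(raw)))           # 6: every character is kept
```

The function returns unicode, so on Python 2 you would call `.encode('utf-8')` on the result if you need a byte string again.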

I hope this helps and makes sense. There is plenty of material online that discusses this problem (including several questions of my own, for example :))

Unsupported characters in input (Python 2.7.9)
