Question

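My IDE cannot resolve the reference to unicode in the function below, but it does not raise an error either (apparently because the function is part of a Python library). Now I want to redefine this function as my own, and when I copy and paste it into my file, "unicode" is not recognized and I get a compile error. Does anyone know what unicode refers to here?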

def text_to_word_sequence(text,
                          filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                          lower=True, split=" "):
    """Converts a text to a sequence of words (or tokens).

    # Arguments
        text: Input text (string).
        filters: Sequence of characters to filter out.
        lower: Whether to convert the input to lowercase.
        split: Sentence split marker (string).

    # Returns
        A list of words (or tokens).
    """
    if lower:
        text = text.lower()

    if sys.version_info < (3,) and isinstance(text, unicode):
        translate_map = dict((ord(c), unicode(split)) for c in filters)
    else:
        translate_map = maketrans(filters, split * len(filters))

    text = text.translate(translate_map)
    seq = text.split(split)
    return [i for i in seq if i]

Answer 1

unicode is a type in Python 2.x that represents text strings, as opposed to bytes (which 2.x calls str). Its equivalent in 3.x is str (not bytes).
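If the copied code still has to run on both versions, one common workaround is a small alias. The sketch below is only an illustration: the name text_type is hypothetical and does not appear in the original function.

```python
# Pick the text type for the running interpreter.
# On Python 2 the name `unicode` exists; on Python 3 looking it up raises NameError.
try:
    text_type = unicode  # Python 2: the text (non-bytes) string type
except NameError:
    text_type = str      # Python 3: str is already Unicode text

sample = "café"
encoded = sample.encode("utf-8")

# On Python 3, text is str and encoded data is bytes.
is_text = isinstance(sample, text_type)
is_bytes = isinstance(encoded, bytes)
```

With this alias, isinstance(text, text_type) can stand in for isinstance(text, unicode) in the copied function.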

Just remove the 2.x code path and the code will work on 3.x (barring any other bugs, of course).
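As a rough sketch of what that could look like, here is the function from the question with the 2.x branch removed. It assumes Python 3 only and uses the built-in str.maketrans in place of the maketrans name that the library defines elsewhere; it also assumes split is a single character, since str.maketrans needs both arguments to have the same length.

```python
def text_to_word_sequence(text,
                          filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                          lower=True, split=" "):
    """Python 3-only variant: converts a text to a list of words (or tokens)."""
    if lower:
        text = text.lower()

    # Map every filtered character to the split marker.
    # Assumption: `split` is one character, so both arguments have equal length.
    translate_map = str.maketrans(filters, split * len(filters))

    text = text.translate(translate_map)
    seq = text.split(split)
    # Drop the empty strings produced by consecutive split markers.
    return [i for i in seq if i]


words = text_to_word_sequence("Hello, World! It's fine.")
```

The sys.version_info check and both uses of unicode are gone, so nothing in this version depends on a Python 2 name.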

What is the "unicode" mentioned in text_to_word_sequence() in the text package?
