Question

I have the following code. It works on Python 3.5, but when I try to run it with Python 2.7 it raises an error.

Here is the code:

import codecs
import csv

import numpy as np


def load_data_and_labels():
    # Load data from files
    with codecs.open('./data/train.txt',encoding="utf8") as inf:
        reader = csv.reader(inf, delimiter='\t',quoting=csv.QUOTE_NONE)
        col = list(zip(*reader)) # <--- The error appeared here.
        x_text = col[2]
        colY = col[1]
    # Split by words
    x_text = [clean_str(sent) for sent in x_text]
    x_text = [s.split(" ") for s in x_text]
    # Generate labels
    y = [[1,0] if int(x)==1 else [0,1] for x in colY]
    y = np.array(y)
    return [x_text, y]

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufe0f' in position 120: ordinal not in range(128)

col = list(zip(*reader))

Here is a "sample" of the text file's structure:

3   1   Hey there! Nice to see you Minnesota/ND Winter Weather 
4   0   3 episodes left I'm dying over here
5   1   "I can't breathe!" was chosen as the most notable quote of the year

Answer 1

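If you take the time to do a simple search on the differences between Python 2 and Python 3, you will find that one of the biggest changes is Unicode support: in Python 3, strings are Unicode by default.

So if you have a file containing Unicode characters and you try to handle them in Python 2 without special care, it will fail, because by default they get converted to plain ASCII.

Combine this with the fact that (quoting the documentation of the csv module, https://docs.python.org/2/library/csv.html) "The csv module doesn’t directly support reading and writing Unicode", and you will see why this does not work.

You can read more here: https://wiki.python.org/moin/Python2orPython3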
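
To illustrate the implicit conversion described above, here is a minimal sketch that reproduces the same kind of failure outside of csv. The sample string is made up for illustration; only the character `\ufe0f` comes from the error message in the question.

```python
# -*- coding: utf-8 -*-
# Illustrative string; only the \ufe0f character is taken from the question.
text = u"Winter Weather \ufe0f"

error = None
try:
    # Encoding to ASCII fails for any code point above 127.
    # Python 2 performs this conversion implicitly when a unicode
    # object is used where a byte string is expected.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Encoding explicitly as UTF-8 works and round-trips without loss.
encoded = text.encode("utf-8")
assert encoded.decode("utf-8") == text
```

In the question's code, `codecs.open` yields unicode lines, and the Python 2 csv reader then presumably attempts this ASCII conversion on each line. Because the reader is lazy, the error only surfaces when `zip(*reader)` consumes it.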

Answer 2

That version of the csv module does not support Unicode input; see the note here:

https://docs.python.org/2/library/csv.html
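
One possible workaround on Python 2.7, along the lines of what that documentation page suggests, is to give csv the raw UTF-8 bytes and decode each cell afterwards. This is a minimal sketch that assumes the file is UTF-8 encoded; `read_tsv_rows` is a hypothetical helper name, not something from the question.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_tsv_rows(path, encoding="utf-8"):
    """Return the rows of a tab-separated file as lists of text strings.

    Hypothetical helper; assumes the file is encoded as `encoding`.
    """
    if PY2:
        # Python 2: csv operates on byte strings, so open the file in
        # binary mode and decode every cell after parsing.
        with open(path, "rb") as f:
            reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
            return [[cell.decode(encoding) for cell in row] for row in reader]
    # Python 3: csv operates on text, so let the file object decode.
    with io.open(path, encoding=encoding, newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

With this, `col = list(zip(*read_tsv_rows('./data/train.txt')))` could stand in for the `codecs.open` block in the question. It has not been run against the asker's actual data.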
