Question

I am trying to save an array to a text file, but I get a Unicode error.

import numpy as np

# df is a pandas DataFrame loaded earlier (not shown in the question)
df_duplicate = df[df['is_duplicate'] == 1]
dfp_nonduplicate = df[df['is_duplicate'] == 0]

# Converting 2d array of q1 and q2 and flatten the array: like {{1,2},{3,4}} to {1,2,3,4}
p = np.dstack([df_duplicate["question1"], df_duplicate["question2"]]).flatten()
n = np.dstack([dfp_nonduplicate["question1"], dfp_nonduplicate["question2"]]).flatten()

print("Number of data points in class 1 (duplicate pairs) :", len(p))
print("Number of data points in class 0 (non duplicate pairs) :", len(n))

# Saving the np array into a text file
np.savetxt('train_p.txt', p, delimiter=' ', fmt='%s')
np.savetxt('train_n.txt', n, delimiter=' ', fmt='%s')

I know I need to switch to UTF-8, but I can't figure out how to do that for this specific code. I'm still a Python beginner.

Answer 1

From the documentation I found by putting np.savetxt into a search engine:

numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
    Save an array to a text file.

So yes, it does have an encoding parameter. That is where you specify the encoding of the output file. So:

np.savetxt('train_p.txt', p, delimiter=' ', fmt='%s', encoding='utf-8')
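To check that this behaves as expected, here is a minimal sketch that writes a small array containing non-ASCII characters and reads it back. The sample questions, the temporary directory, and the names q1, q2, out_path, and round_trip are made up for illustration; only the np.dstack(...).flatten() and np.savetxt(...) calls mirror the code in the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data standing in for df["question1"] / df["question2"].
q1 = ["How do I learn Python?", "What is a caf\u00e9?"]
q2 = ["How can I learn Python\x85 quickly?", "What does caf\u00e9 mean?"]

# Same stacking as in the question: the pairs are interleaved after flatten().
p = np.dstack([q1, q2]).flatten()

# Write to a temporary directory so the example does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), "train_p.txt")
np.savetxt(out_path, p, delimiter=" ", fmt="%s", encoding="utf-8")

# Read the file back with the same encoding. split("\n") is used instead of
# splitlines(), because splitlines() would also break lines at '\x85'.
with open(out_path, encoding="utf-8") as fh:
    round_trip = fh.read().split("\n")[:-1]

print(round_trip == list(p))
```

Whatever reads train_p.txt later has to open it with encoding='utf-8' as well, otherwise the same kind of error can reappear on the decoding side.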

That said, the character in question is a very odd one to find in your text. It would help to see where the data came from.

UnicodeEncodeError: 'charmap' codec can't encode character '\x85' in position 102: character maps to <undefined>
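To see why '\x85' is odd, a short sketch like the following can help. It assumes the 'charmap' codec in the traceback is Windows cp1252, which is common on Western-locale Windows systems but is not confirmed by the question. The helper can_encode and the variable names are hypothetical. The last step is only a guess at what the character may originally have been: byte 0x85 means an ellipsis in cp1252, so the data may have been decoded as Latin-1 somewhere upstream.

```python
import unicodedata

bad = "\x85"  # the character named in the error message

# U+0085 is a C1 control character ("next line"), not printable text.
category = unicodedata.category(bad)  # 'Cc' means control character


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: the failing 'charmap' codec is cp1252, which has no slot for U+0085.
cp1252_ok = can_encode(bad, "cp1252")

# UTF-8 can represent every Unicode code point, including this one.
utf8_bytes = bad.encode("utf-8")

# Guess at the intended character: reinterpret byte 0x85 as cp1252.
likely_intended = bad.encode("latin-1").decode("cp1252")

print(category, cp1252_ok, utf8_bytes, likely_intended)
```

If the guess is right, loading the source file with the correct encoding in the first place would be a cleaner fix than carrying the control character through to the output.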
