Question

I keep getting:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 265-266: ordinal not in range(128)

when I try:

df.to_html("mypage.html")

Here is an example that reproduces the problem:

df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})
df.to_html("mypage.html")

The elements in column "a" are of type "unicode".

Exporting to CSV works, because there you can specify the encoding:

df.to_csv("myfile.csv", encoding="utf-8")

Answer 1

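Your problem is in other code. The Unicode string in your sample was mis-decoded as latin1, Windows-1252, or something similar, because it contains UTF-8 byte sequences. Here I undo the bad decoding and decode again as UTF-8, but you will want to find where the wrong decode is being performed: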

>>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
>>> s.encode('latin1').decode('utf8')
u'Rue du Gu\xe9, 78120 Sonchamp'
>>> print(s.encode('latin1').decode('utf8'))
Rue du Gué, 78120 Sonchamp
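
The same repair can be wrapped in a small helper. This is only a minimal sketch for Python 3: `fix_mojibake` is a hypothetical name, and it assumes the bad decode used latin-1 (a Windows-1252 mis-decode would need "cp1252" instead). It is a heuristic, so strings that do not round-trip are returned unchanged.

```python
def fix_mojibake(text):
    """Undo a UTF-8 string that was wrongly decoded as latin-1 (hypothetical helper)."""
    try:
        # Recover the original bytes, then decode them with the right codec
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not latin-1 mojibake (or already correct): leave it untouched
        return text


broken = "Rue du Gu\xc3\xa9, 78120 Sonchamp"
repaired = fix_mojibake(broken)         # 'Rue du Gué, 78120 Sonchamp'
unchanged = fix_mojibake("Rue du Gué")  # already correct, returned as-is
```

Fixing the data where it is first read is still preferable; a helper like this only papers over the wrong decode.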

Answer 2

This is what worked for me:

html = df.to_html()

with open("dataframe.html", "w", encoding="utf-8") as file:
    file.write('<meta charset="UTF-8">\n')
    file.write(html)

Answer 3

If you really need to keep the output as HTML, you can try cleaning the strings in a NumPy array before writing with to_html.

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [u'Rue du Gu\xc3\xa9, 78120 Sonchamp'], "b": [u"some other thing"]})

def clean_unicode(df):
    # Convert the DataFrame to a NumPy array (as_matrix() was removed in pandas 1.0)
    arr = df.values
    # Undo the latin-1 mis-decoding of every string and decode it as UTF-8
    for x in np.nditer(arr, flags=['refs_ok'], op_flags=['copy', 'readonly']):
        arr[arr == x] = str(x).encode("latin-1", "replace").decode("utf8")
    # Convert the NumPy array back to a DataFrame, keeping the original labels
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

df = clean_unicode(df)
df.to_html("Results.html")  # succeeds

Answer 4

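The problem actually lies in using df.to_html("mypage.html") to save the HTML directly to a file. If you write the file yourself instead, you can avoid this pandas encoding error.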

html = df.to_html()
with open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)

You may also need to specify the character set at the beginning of the HTML so that it displays correctly in some browsers (HTML5 defaults to UTF-8):

<meta charset="UTF-8">

This is the only method I have seen that worked for me.
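
The two steps above (writing the file yourself and declaring the character set) can be combined. The following is a minimal sketch rather than anything pandas provides: `write_html_document` and its `title` parameter are hypothetical, and it assumes the fragment is the string returned by df.to_html() when no file name is passed.

```python
import html


def write_html_document(fragment, path, title="DataFrame"):
    """Wrap an HTML fragment in a full UTF-8 document and write it to path."""
    document = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{fragment}\n"
        "</body>\n</html>\n"
    )
    # Open the file with an explicit encoding so non-ASCII text is never
    # pushed through the platform's default codec
    with open(path, "w", encoding="utf-8") as file:
        file.write(document)
```

With pandas it could be called as write_html_document(df.to_html(), "mypage.html").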

How to export a DataFrame to HTML with utf-8 encoding?
