I'm trying to read a CSV file in which some rows contain Unicode characters (”). Pandas can't handle these characters.
When the file is opened in MS Excel, the rows look like this:
Columns:
age;"job";"marital";"education";"default";"housing";"loan";"contact";"month";"day_of_week";"duration";"campaign";"pdays";"previous";"poutcome";"emp.var.rate";"cons.price.idx";"cons.conf.idx";"euribor3m";"nr.employed";"y"
Row:
41;"blue-collar";"divorcededâ€;â€basic.9y";"no";"yes";"no";"cellular";"may";"thu";102;1;999;0;"nonexistent";-1.8;92.893;-46.2;1.327;5099.1;"no"
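The "â€" that Excel shows is consistent with UTF-8 bytes being decoded as Windows-1252. A minimal sketch, assuming the stray character is U+201D (”, RIGHT DOUBLE QUOTATION MARK, which is what pandas displays below) and assuming Excel opened the file as cp1252, a common default on Western-locale Windows:

```python
# Assumption: the stray character is U+201D, stored in the file as UTF-8.
utf8_bytes = "”".encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x9d'

# Assumption: Excel decoded the file as Windows-1252 (cp1252).
# 0xE2 -> 'â', 0x80 -> '€', and 0x9D has no mapping in cp1252,
# so errors="ignore" drops it here (Excel may render it differently).
as_excel_sees_it = utf8_bytes.decode("cp1252", errors="ignore")
print(as_excel_sees_it)  # â€
```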
Pandas reads it as:
age 41
job blue-collar
marital divorceded”;”basic.9y
education no
default yes
housing no
loan cellular
contact may
month thu
day_of_week 102
duration 1
campaign 999
pdays 0
previous nonexistent
poutcome -1.8
emp.var.rate 92.893
cons.price.idx -46.2
cons.conf.idx 1.327
euribor3m 5099.1
nr.employed no
y NaN
Code:
import pandas as pd

df = pd.read_csv('Bank.csv',
                 sep=';',
                 skiprows=1,  # skip the header line; column names come from names=
                 names=["age", "job", "marital", "education", "default", "housing", "loan", "contact", "month", "day_of_week", "duration", "campaign", "pdays", "previous", "poutcome", "emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed", "y"],
                 encoding='utf-8-sig')  # UTF-8, stripping a leading byte-order mark if present
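A minimal sketch that reproduces the shifted columns with an in-memory copy of the row above and the same read_csv options, assuming the file is UTF-8 and the stray characters are U+201D (”), as the pandas output suggests. It shows the mechanism: the ASCII " before divorceded opens a quoted field that only closes at the ASCII " after basic.9y, so the ; between the curly quotes is read as data rather than as a separator. The row then yields 20 fields for 21 names, so values from education onward land one column to the left and y is left empty.

```python
import io

import pandas as pd

names = ["age", "job", "marital", "education", "default", "housing", "loan",
         "contact", "month", "day_of_week", "duration", "campaign", "pdays",
         "previous", "poutcome", "emp.var.rate", "cons.price.idx",
         "cons.conf.idx", "euribor3m", "nr.employed", "y"]

# Header line (skipped via skiprows=1) plus the row from the question.
# Assumption: the stray characters are U+201D ('”').
raw = (
    ";".join(names) + "\n"
    '41;"blue-collar";"divorceded”;”basic.9y";"no";"yes";"no";"cellular";'
    '"may";"thu";102;1;999;0;"nonexistent";-1.8;92.893;-46.2;1.327;5099.1;"no"\n'
)
buf = io.BytesIO(raw.encode("utf-8-sig"))  # UTF-8 bytes with a leading BOM

df = pd.read_csv(buf, sep=";", skiprows=1, names=names, encoding="utf-8-sig")
row = df.iloc[0]

# The curly quotes are ordinary characters inside one quoted field,
# so the embedded ';' does not split it and later values shift left.
print(repr(row["marital"]))
print(row["education"], row["loan"], row["y"])
```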
Is there any way around this?
Answer 0 (score: 0)
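Your problem is caused by specifying the wrong encoding. Without access to the original file, there is no way to know which encoding is correct.
I would first remove encoding='utf-8-sig' entirely and let Pandas use its default; it should cope. If that doesn't work, utf_16 would be my next try.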
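The trial-and-error approach described above can be sketched as a loop over candidate encodings. This is a hypothetical helper (read_csv_with_fallback is not part of pandas), assuming that passing encoding=None gives pandas' default (UTF-8) and that only decode errors should trigger a retry; an encoding that decodes without error but produces the wrong text is not detected.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=(None, "utf_16"), **kwargs):
    """Hypothetical helper: try each encoding in turn, return (df, encoding).

    None means pandas' default encoding (UTF-8). Only UnicodeError
    (including UnicodeDecodeError) moves on to the next candidate.
    """
    failures = []
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeError as exc:
            failures.append(f"{enc!r}: {exc}")
    raise ValueError("could not decode file with any of: " + "; ".join(failures))


# Demo: a small semicolon-separated file written as UTF-16 (with a BOM),
# which the default UTF-8 attempt cannot decode.
with tempfile.NamedTemporaryFile("w", encoding="utf-16", suffix=".csv",
                                 delete=False) as f:
    f.write("a;b\n1;2\n")
    path = f.name

df, used = read_csv_with_fallback(path, sep=";")
os.remove(path)
print(used)
print(df)
```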
See https://docs.python.org/3/library/codecs.html#standard-encodings for details of the encodings Pandas supports (Python's standard encodings).