I am trying to add column names to a DataFrame that has no header.
The data:
1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00
My attempt to add the column names:
col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
uci=pd.read_csv('UCI.csv', delimiter=',',header=None, names=col_names)
However, each entire row ends up in the first column (Id), and all the remaining columns are NaN.
Output:
Id RI Na Mg Al Si K Ca Ba Fe Glass Type
0 1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Answer 0 (score: 0)
I only get NaN in the last columns, because the list of names has more values than the data has columns:
from io import StringIO
import pandas as pd
temp=u"""1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO was removed from newer pandas versions, so io.StringIO is used here)
col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
df = pd.read_csv(StringIO(temp), names=col_names)
print(df)
Id RI Na Mg Al Si K Ca Ba Fe \
0 1.52101 13.64000 4.49 1.10 71.78 0.06 8.75 0.00 NaN NaN
1 2.00000 1.51761 13.89 3.60 1.36 72.73 0.48 7.83 0.0 NaN
2 3.00000 1.51618 13.53 3.55 1.54 72.99 0.39 7.78 0.0 NaN
Glass Type
0 NaN
1 NaN
2 NaN
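One way to avoid the trailing NaN columns is to pass only as many names as the data has fields. The following is a minimal sketch, assuming the data really has 9 comma-separated fields per line as in the sample (the question's output is truncated, so the real file may have more); the variable n_fields is illustrative and not part of the original code:

```python
from io import StringIO
import pandas as pd

# Sample data with the Id field present on every line (assumed from the question's output).
temp = """1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"""

col_names = ['Id', 'RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Glass Type']

# Count the fields on the first line and keep only that many names.
n_fields = len(temp.splitlines()[0].split(','))
df = pd.read_csv(StringIO(temp), header=None, names=col_names[:n_fields])
print(df)
```

With a names list of the same length as the number of fields, no extra all-NaN columns are created.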
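But your data seems to be different: every line is wrapped in double quotes, so the whole line is parsed as a single quoted field. You have to add the quoting parameter (quoting=3, which is csv.QUOTE_NONE):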
from io import StringIO
temp=u'''"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"
"2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00"
"3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"'''
# after testing, replace 'StringIO(temp)' with 'filename.csv'
col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
# quoting=3 (csv.QUOTE_NONE) treats the double quotes as ordinary characters
df = pd.read_csv(StringIO(temp), names=col_names, quoting=3)
print(df)
Id RI Na Mg Al Si K Ca Ba Fe Glass Type
0 "1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0.00" NaN NaN
1 "2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83 0.00" NaN NaN
2 "3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78 0.00" NaN NaN
# finally, manually remove the leftover leading and trailing "
df['Id'] = df['Id'].str.strip('"')
df['Ba'] = df['Ba'].str.strip('"').astype(float)
print (df)
Id RI Na Mg Al Si K Ca Ba Fe Glass Type
0 1 1.52101 13.64 4.49 1.10 71.78 0.06 8.75 0.00 NaN NaN
1 2 1.51761 13.89 3.60 1.36 72.73 0.48 7.83 0.00 NaN NaN
2 3 1.51618 13.53 3.55 1.54 72.99 0.39 7.78 0.00 NaN NaN
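As an alternative to quoting=3 followed by stripping individual columns, the wrapping quotes could be removed from each line before parsing. This is only a sketch, assuming every line is wrapped in exactly one pair of double quotes and no field contains quotes of its own; the names raw and cleaned are illustrative:

```python
from io import StringIO
import pandas as pd

raw = '''"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"
"2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00"
"3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00"'''

col_names = ['Id', 'RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Glass Type']

# Strip surrounding whitespace and the wrapping double quotes from every line.
cleaned = "\n".join(line.strip().strip('"') for line in raw.splitlines())

# The sample has only 9 fields per line, so 'Fe' and 'Glass Type' are filled with NaN.
df = pd.read_csv(StringIO(cleaned), header=None, names=col_names)
print(df)
```

For a real file, the same idea would apply to the lines read from the file (for example 'UCI.csv' from the question) instead of the raw string. Because the quotes are gone before parsing, Id and Ba come out as numeric columns without a separate str.strip step.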
Reproducing the problem (the same quoted data, read without the quoting parameter):
from io import StringIO
col_names=['Id','RI','Na','Mg','Al','Si','K','Ca','Ba','Fe','Glass Type']
print(pd.read_csv(StringIO(temp), names=col_names))
Id RI Na Mg Al Si K Ca \
0 1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00 NaN NaN NaN NaN NaN NaN NaN
1 2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00 NaN NaN NaN NaN NaN NaN NaN
2 3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00 NaN NaN NaN NaN NaN NaN NaN
Ba Fe Glass Type
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
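The effect of the quoting setting can also be seen with Python's standard csv module, whose QUOTE_* constants are what the quoting parameter of read_csv refers to. This is a small illustration rather than a look inside pandas' own parser; the variable names are illustrative:

```python
import csv

line = '"1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00"'

# Default quoting: everything between the double quotes is one field.
default_fields = next(csv.reader([line]))

# QUOTE_NONE: quotes are ordinary characters, so the commas split the line.
raw_fields = next(csv.reader([line], quoting=csv.QUOTE_NONE))

print(csv.QUOTE_NONE)       # 3
print(len(default_fields))  # 1
print(len(raw_fields))      # 9
print(raw_fields[0], raw_fields[-1])  # "1 0.00"
```

The first and last fields keep their stray quote characters under QUOTE_NONE, which is why the answer strips them from the Id and Ba columns afterwards.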