I am trying to merge the following two DataFrames on SICcode:
df.head(5)
SICcode Catcode Category SICname MultSIC
0 111 A1500 Wheat, corn, soybeans and cash grain Wheat X
1 112 A1600 Other commodities (incl rice, peanuts) Rice X
2 115 A1500 Wheat, corn, soybeans and cash grain Corn X
3 116 A1500 Wheat, corn, soybeans and cash grain Soybeans X
4 119 A1500 Wheat, corn, soybeans and cash grain Cash grains X
df.columns.tolist()
['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
merged.head()
2012 NAICS Code 2002to2007 NAICS SICcode
0 111110 111110 116
1 111120 111120 119
2 111130 111130 119
3 111140 111140 111
4 111150 111150 115
merged.columns.tolist()
['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']
When I try to merge them with the following code:
merged=pd.merge(merged,df, how='left', on='SICcode')
I get KeyError: 'SICcode'
I tried setting the dtype of one of the DataFrames, but when I do that I also get a KeyError.
If anyone has any insight into this or needs more information, please let me know.
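A minimal reproduction of the error, using small hypothetical frames whose column names mirror the lists above (the values are made up). The only difference between the two key columns is the invisible U+FEFF character at the start of one name, which is enough for the merge to fail:

```python
import pandas as pd

# Hypothetical lookup table: the first column name starts with U+FEFF,
# exactly as in the df.columns.tolist() output above.
df = pd.DataFrame({
    "\ufeffSICcode": [111, 112, 115],
    "Catcode": ["A1500", "A1600", "A1500"],
})

# Hypothetical NAICS-to-SIC table with a clean "SICcode" column.
merged = pd.DataFrame({
    "2012 NAICS Code": [111110, 111140, 111150],
    "SICcode": [116, 111, 115],
})

# The names print the same, but they are different strings.
print("SICcode" in df.columns)  # False
print([repr(c) for c in df.columns])  # repr() makes the hidden character visible

try:
    pd.merge(merged, df, how="left", on="SICcode")
except KeyError as err:
    # The right-hand frame has no column literally named "SICcode".
    print("KeyError:", err)
```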
Answer (score: 2)
Note the first column:
In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)
In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
In [29]: df['SICcode']
...
KeyError: 'SICcode'
In [30]: df['\ufeffSICcode'].head()
Out[30]:
0 111
1 112
2 115
3 116
4 119
Name: SICcode, dtype: int64
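The leading '\ufeff' comes from a UTF-8 byte order mark (BOM), the three bytes EF BB BF at the very start of the file, which plain UTF-8 decoding keeps as a character. The sketch below shows this on a hypothetical two-line CSV, and also shows a different fix from the one this answer recommends: stripping the character from the column names of a frame that is already in memory, which may help if re-reading the file is inconvenient:

```python
import pandas as pd

# Hypothetical CSV bytes beginning with a UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfSICcode,Catcode\n111,A1500\n"

# Plain UTF-8 decoding keeps the BOM as the character U+FEFF...
print(repr(raw.decode("utf-8").splitlines()[0]))
# ...while the utf-8-sig codec removes it.
print(repr(raw.decode("utf-8-sig").splitlines()[0]))

# Alternative repair for a frame already loaded with the bad header
# (hypothetical data): strip a leading U+FEFF from every column name.
df = pd.DataFrame({"\ufeffSICcode": [111, 112], "Catcode": ["A1500", "A1600"]})
df.columns = df.columns.str.lstrip("\ufeff")
print(df.columns.tolist())
```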
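The '\ufeff' is a UTF-8 byte order mark (BOM) that was decoded as part of the first column name, so df has no column literally named 'SICcode'. As @unutbu said in the comments, adding encoding='utf-8_sig' to your pd.read_csv() call should fix this: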
In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')
In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
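An end-to-end sketch of the fix, with in-memory bytes standing in for the df.test file (the CSV contents and the merged frame below are hypothetical). Reading with encoding='utf-8-sig' yields a clean 'SICcode' header, after which the left merge from the question goes through:

```python
import io

import pandas as pd

# Hypothetical CSV bytes with a UTF-8 BOM, standing in for df.test.
csv_bytes = (
    b"\xef\xbb\xbfSICcode,Catcode,SICname\n"
    b"111,A1500,Wheat\n"
    b"115,A1500,Corn\n"
    b"116,A1500,Soybeans\n"
)

# The utf-8-sig codec drops the BOM, so the header is plain "SICcode".
df = pd.read_csv(io.BytesIO(csv_bytes), encoding="utf-8-sig")

# Hypothetical NAICS-to-SIC table, as in the question.
merged = pd.DataFrame({
    "2012 NAICS Code": [111110, 111140, 111150],
    "SICcode": [116, 111, 115],
})

# Both frames now share the key column, so the merge succeeds.
# A left merge keeps the row order of the left frame (merged).
result = pd.merge(merged, df, how="left", on="SICcode")
print(result)
```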