pandas: Problem merging DataFrames

Date: 2016-04-22 18:51:29

Tags: python pandas dataframe merge

I am trying to merge the following two DataFrames with on='SICcode':

df.head(5)

    SICcode     Catcode     Category                            SICname     MultSIC
0   111         A1500   Wheat, corn, soybeans and cash grain    Wheat        X
1   112         A1600   Other commodities (incl rice, peanuts)  Rice         X
2   115         A1500   Wheat, corn, soybeans and cash grain    Corn         X
3   116         A1500   Wheat, corn, soybeans and cash grain    Soybeans     X
4   119         A1500   Wheat, corn, soybeans and cash grain    Cash grains  X

df.columns.tolist()

['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']  

merged.head()


    2012 NAICS Code     2002to2007 NAICS    SICcode
0   111110              111110              116
1   111120              111120              119
2   111130              111130              119
3   111140              111140              111
4   111150              111150              115

merged.columns.tolist()

['2012 NAICS Code', '2002to2007 NAICS', 'SICcode']

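When I try to merge them with the following code:

merged = pd.merge(merged, df, how='left', on='SICcode')

I get KeyError: 'SICcode'. I tried setting both DataFrames to the same dtype, but when I do that I get the same KeyError.

If anyone has any insight into this, or needs more information, please let me know.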

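1 Answer:

Answer 0: (score: 2)

Look closely at the first column name. It begins with '\ufeff', a byte order mark (BOM), so there is no column named exactly 'SICcode':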

In [27]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0)

In [28]: df.columns.tolist()
Out[28]: ['\ufeffSICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']

In [29]: df['SICcode']

...

KeyError: 'SICcode'

In [30]: df['\ufeffSICcode'].head()
Out[30]:
0    111
1    112
2    115
3    116
4    119
Name: SICcode, dtype: int64
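
To see where that character comes from and how to spot it, here is a minimal sketch in plain Python. It is not part of the original answer: the sample bytes are made up to mimic the start of a BOM-prefixed CSV header, and `hidden_chars` is a hypothetical helper name.

```python
# Illustrative bytes: a UTF-8 header line that starts with a BOM (EF BB BF).
raw = b'\xef\xbb\xbfSICcode,Catcode'

# Decoding as plain UTF-8 keeps the BOM as the character '\ufeff'.
plain = raw.decode('utf-8').split(',')

# The 'utf-8-sig' codec drops a leading BOM while decoding.
stripped = raw.decode('utf-8-sig').split(',')


def hidden_chars(name):
    """Return the non-printable characters in a column name (hypothetical helper)."""
    return [ch for ch in name if not ch.isprintable()]


print(plain)                   # the list repr makes the BOM visible
print(stripped)
print(hidden_chars(plain[0]))  # non-empty when a label hides a stray character
```

Printing a label on its own can hide the BOM, which is why `df.columns.tolist()` (which shows the repr of each label) is the more reliable check.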

As @unutbu said in the comments, adding encoding='utf-8_sig' to the pd.read_csv() call may help you solve this:

In [31]: df = pd.read_csv('https://github.com/108michael/ms_thesis/raw/master/df.test', index_col=0, encoding='utf-8_sig')

In [32]: df.columns.tolist()
Out[32]: ['SICcode', 'Catcode', 'Category', 'SICname', 'MultSIC']
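
If the frame is already loaded and re-reading the file is inconvenient, the stray character can also be stripped from the column labels before merging. The following is a minimal sketch rather than something from the original answer: the two small frames are hand-built stand-ins for the data shown in the question, and `result` is an illustrative name.

```python
import pandas as pd

# Stand-in for the lookup table, with the BOM still attached to the first label.
df = pd.DataFrame({
    '\ufeffSICcode': [111, 112, 115, 116, 119],
    'Catcode': ['A1500', 'A1600', 'A1500', 'A1500', 'A1500'],
})

# Stand-in for the frame that should receive the extra columns.
merged = pd.DataFrame({
    '2012 NAICS Code': [111110, 111120, 111130, 111140, 111150],
    'SICcode': [116, 119, 119, 111, 115],
})

# Remove a leading BOM from every column label; other labels are unchanged.
df = df.rename(columns=lambda c: c.lstrip('\ufeff'))

# With a clean key name on both sides, the left merge from the question works.
result = pd.merge(merged, df, how='left', on='SICcode')
print(result)
```

Fixing the encoding at read time is usually cleaner, since it removes the problem at its source, while renaming only patches the labels of a frame that is already in memory.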