Merging two DataFrames on columns of the same dtype raises ValueError

Time: 2018-09-11 21:11:37

Tags: python pandas

I want to merge two DataFrames, using the profile_ID column of the Income df and the index (CommodityClass) of the CompProfile df as the keys.

Income

          profile_ID   type          col1        col2
0         O-COMP-1006  Small_Off   4.1427e+07   4.0027e+07
1         O-COMP-1006  Small_Off   4.7915e+07   4.6515e+07
2         O-COMP-1006  Small_Off  6.10424e+07  5.96424e+07
3         O-COMP-1006  Small_Off  6.83726e+07  6.69726e+07
4         O-COMP-1008  Small_Off  7.28167e+07  7.14167e+07
5         O-COMP-1009  Small_Off   7.6147e+07   7.4747e+07
7         O-COMP-1006  Small_Off  8.02798e+07  7.88798e+07
8         O-COMP-1006  Small_Off  8.17172e+07  8.03172e+07
9         O-COMP-1006  Small_Off  8.42322e+07  8.28322e+07
10        O-COMP-1005  Small_Off  8.54957e+07  8.40747e+07
11        O-COMP-1006  Small_Off  8.67782e+07  8.53358e+07
12        O-COMP-1006  Small_Off  8.80798e+07  8.66159e+07
13        O-COMP-1007  Small_Off   8.9401e+07  8.79151e+07
14        O-COMP-1006  Small_Off   9.0742e+07  8.92338e+07

and CompProfile

                   col1     col2        col3
CommodityClass          
profile_ID         NaN      NaN          NaN
O-COMP-1001        0.0      0.0          0.0
O-COMP-1002        0.0      0.0          0.0
O-COMP-1003        0.0      0.0          0.0
O-COMP-1004        0.0      0.0          0.0
O-COMP-1005        0.0      0.0          0.0
O-COMP-1006        1.0      0.0          0.0
O-COMP-1007        0.0      0.0          1.0
O-COMP-1008        0.0      0.0          0.0
O-COMP-1009        0.0      1.0          0.0

I use

pd.merge( Income, CompProfile, how='left', \
    left_on = 'profile_ID', right_index=True, \
    suffixes = ("_USD","_frac") )

and get an error

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

I checked both the column and the index I am merging on, and both are of type object. I also tried join, but got the same error.

Income.join(CompProfile, on= 'profile_ID',lsuffix = "_USD",rsuffix = "_frac")

I also tried resetting the index of CompProfile and merging on the resulting column

CompProfile.reset_index()
pd.merge( Income, CompProfile, how='left', \
    left_on = 'profile_ID', right_on='CommodityClass', \
    suffixes = ("_USD","_frac") )

In this case I get

KeyError: 'CommodityClass'

I also tried removing the "profile_ID" row from CompProfile, but nothing changed.

CompProfile.head(10).to_dict()
{'col1': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 1.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}, 'col2': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 1.0}, 'col3': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 1.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}}

1 Answer:

Answer 0: (score: 0)

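Your first attempt works fine. Below is a complete working example. You need to take a step back and figure out why your real data differs from the example data supplied below.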

import pandas as pd
from numpy import nan

d1 = {'profile_ID': {0: 'O-COMP-1006', 1: 'O-COMP-1006', 2: 'O-COMP-1006', 3: 'O-COMP-1006', 4: 'O-COMP-1008', 5: 'O-COMP-1009', 7: 'O-COMP-1006', 8: 'O-COMP-1006', 9: 'O-COMP-1006', 10: 'O-COMP-1005'}, 'type': {0: 'Small_Off', 1: 'Small_Off', 2: 'Small_Off', 3: 'Small_Off', 4: 'Small_Off', 5: 'Small_Off', 7: 'Small_Off', 8: 'Small_Off', 9: 'Small_Off', 10: 'Small_Off'}, 'col1': {0: 41427000.0, 1: 47915000.0, 2: 61042400.0, 3: 68372600.0, 4: 72816700.0, 5: 76147000.0, 7: 80279800.0, 8: 81717200.0, 9: 84232200.0, 10: 85495700.0}, 'col2': {0: 40027000.0, 1: 46515000.0, 2: 59642400.0, 3: 66972600.0, 4: 71416700.0, 5: 74747000.0, 7: 78879800.0, 8: 80317200.0, 9: 82832200.0, 10: 84074700.0}}
d2 = {'col1': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 1.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}, 'col2': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 0.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 1.0}, 'col3': {'profile_ID': nan, 'O-COMP-1001': 0.0, 'O-COMP-1002': 0.0, 'O-COMP-1003': 0.0, 'O-COMP-1004': 0.0, 'O-COMP-1005': 0.0, 'O-COMP-1006': 0.0, 'O-COMP-1007': 1.0, 'O-COMP-1008': 0.0, 'O-COMP-1009': 0.0}}

Income = pd.DataFrame.from_dict(d1)
CompProfile = pd.DataFrame.from_dict(d2)

res = pd.merge(Income, CompProfile, how='left',
               left_on='profile_ID', right_index=True,
               suffixes=('_USD', '_frac'))

print(res)

     profile_ID       type    col1_USD    col2_USD  col1_frac  col2_frac  col3
0   O-COMP-1006  Small_Off  41427000.0  40027000.0        1.0        0.0   0.0
1   O-COMP-1006  Small_Off  47915000.0  46515000.0        1.0        0.0   0.0
2   O-COMP-1006  Small_Off  61042400.0  59642400.0        1.0        0.0   0.0
3   O-COMP-1006  Small_Off  68372600.0  66972600.0        1.0        0.0   0.0
4   O-COMP-1008  Small_Off  72816700.0  71416700.0        0.0        0.0   0.0
5   O-COMP-1009  Small_Off  76147000.0  74747000.0        0.0        1.0   0.0
7   O-COMP-1006  Small_Off  80279800.0  78879800.0        1.0        0.0   0.0
8   O-COMP-1006  Small_Off  81717200.0  80317200.0        1.0        0.0   0.0
9   O-COMP-1006  Small_Off  84232200.0  82832200.0        1.0        0.0   0.0
10  O-COMP-1005  Small_Off  85495700.0  84074700.0        0.0        0.0   0.0
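
As a starting point for working out where the real data differs, here is a minimal sketch of two checks. It uses small stand-in frames (the values are illustrative, not your full data), and it assumes the index should be called CommodityClass, which the sample dict does not carry, so the name is set explicitly with rename_axis. The first part prints what merge actually compares: the dtype of the key column and the dtype and level count of the index. The second part shows the column-based merge: reset_index() returns a new frame rather than modifying CompProfile in place, so its result has to be assigned, which is one likely reason for the KeyError in the question.

```python
import pandas as pd

# Small stand-ins for the frames in the question (illustrative values only).
Income = pd.DataFrame({
    'profile_ID': ['O-COMP-1006', 'O-COMP-1008', 'O-COMP-1009'],
    'col1': [41427000.0, 72816700.0, 76147000.0],
})
CompProfile = pd.DataFrame(
    {'col1': [1.0, 0.0, 0.0], 'col2': [0.0, 0.0, 1.0]},
    index=['O-COMP-1006', 'O-COMP-1008', 'O-COMP-1009'],
)

# 1. Inspect the two merge keys: the left column and the right index.
print(Income['profile_ID'].dtype)
print(CompProfile.index.dtype)
print(CompProfile.index.nlevels)  # a value above 1 would mean a MultiIndex
print(CompProfile.index.name)     # None here: the stand-in index is unnamed

# 2. reset_index() does not modify CompProfile; keep the returned frame.
#    rename_axis gives the index a name (assumed here to be CommodityClass),
#    so the new column is not called 'index'.
right = CompProfile.rename_axis('CommodityClass').reset_index()

res = pd.merge(Income, right, how='left',
               left_on='profile_ID', right_on='CommodityClass',
               suffixes=('_USD', '_frac'))
print(res)
```

If the printed dtypes or the number of index levels on your real frames differ from what you expect, that is the place to look before calling merge.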