pandas DataFrame merge returns an empty DataFrame

Asked: 2017-11-17 04:42:23

Tags: pandas merge pandas-join

I have been trying for a while to join/merge two DataFrames, df and df_QA.

The first DataFrame:

df_QA:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6878 entries, 0 to 6877
Data columns (total 14 columns):
PROPERTY_CODE    6878 non-null object
ACCOUNT_CODE     6878 non-null object
Jan              6878 non-null float64
Feb              6878 non-null float64
Mar              6878 non-null float64
Apr              6878 non-null float64
May              6878 non-null float64
Jun              6878 non-null float64
Jul              6878 non-null float64
Aug              6878 non-null float64
Sep              6878 non-null float64
Oct              6878 non-null float64
Nov              6878 non-null float64
Dec              6878 non-null float64
dtypes: float64(12), object(2)
memory usage: 752.4+ KB

The second DataFrame:

df:

df = pd.read_csv(fname, sep="^", usecols=[2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], converters={'Account': str, 'Entity ID': str}).dropna(subset=['Account'], how='any')

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2441 entries, 0 to 2440
Data columns (total 14 columns):
PROPERTY_CODE    2441 non-null object
ACCOUNT_CODE     2441 non-null object
Jan              2441 non-null float64
Feb              2441 non-null float64
Mar              2441 non-null float64
Apr              2441 non-null float64
May              2441 non-null float64
Jun              2441 non-null float64
Jul              2441 non-null float64
Aug              2441 non-null float64
Sep              2441 non-null float64
Oct              2441 non-null float64
Nov              2441 non-null int64
Dec              2441 non-null int64
dtypes: float64(10), int64(2), object(2)
memory usage: 286.1+ KB

I tried:

df_check = pd.merge(df, df_QA, how='inner', on=['PROPERTY_CODE','ACCOUNT_CODE'])

df_check = df.merge(df_QA, left_on=['PROPERTY_CODE', 'ACCOUNT_CODE'], right_on=['PROPERTY_CODE', 'ACCOUNT_CODE'], how='inner', sort=True)

Both return:

print (df_check)

Empty DataFrame
Columns: [PROPERTY_CODE, ACCOUNT_CODE, Jan_x, Feb_x, Mar_x, Apr_x, May_x, Jun_x, Jul_x, Aug_x, Sep_x, Oct_x, Nov_x, Dec_x, Jan_y, Feb_y, Mar_y, Apr_y, May_y, Jun_y, Jul_y, Aug_y, Sep_y, Oct_y, Nov_y, Dec_y]
Index: []
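
An empty inner merge means that no (PROPERTY_CODE, ACCOUNT_CODE) pair was judged equal on both sides. One way to see which side each key comes from is an outer merge with indicator=True. The sketch below uses small hypothetical frames (left, right) in place of df and df_QA, and the trailing space in ACCOUNT_CODE is only an assumed cause, not something confirmed by the data above.

```python
import pandas as pd

# Hypothetical stand-ins for df and df_QA. The trailing space in
# ACCOUNT_CODE is an assumption used to reproduce an empty inner merge.
left = pd.DataFrame({
    "PROPERTY_CODE": ["05099", "05099"],
    "ACCOUNT_CODE": ["MR01030000 ", "MR01060000 "],
    "Jan": [1.0, 2.0],
})
right = pd.DataFrame({
    "PROPERTY_CODE": ["05099", "05099"],
    "ACCOUNT_CODE": ["MR01030000", "MR01060000"],
    "Jan": [10.0, 20.0],
})

keys = ["PROPERTY_CODE", "ACCOUNT_CODE"]

# indicator=True adds a "_merge" column whose value is
# "left_only", "right_only", or "both" for every row.
diag = left.merge(right, on=keys, how="outer", indicator=True)

# If "both" never appears, the key values differ in some way
# that is not visible in the printed frames.
print(diag["_merge"].value_counts())
```

If every row is left_only or right_only, the problem lies in the key values themselves rather than in the merge call.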

We would like to get a DataFrame with the following columns:

PROPERTY_CODE, ACCOUNT_CODE, Jan_x, Feb_x, Mar_x, Apr_x, May_x, Jun_x, Jul_x, Aug_x, Sep_x, Oct_x, Nov_x, Dec_x, Jan_y, Feb_y, Mar_y, Apr_y, May_y, Jun_y, Jul_y, Aug_y, Sep_y, Oct_y, Nov_y, Dec_y

Any ideas? Thanks!

When I try an outer merge:

df_check = pd.merge(df, df_QA, how='outer', on=['PROPERTY_CODE','ACCOUNT_CODE'])



         PROPERTY_CODE  ACCOUNT_CODE  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  \
0            05099  MR01030000    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   
1            05099  MR01060000    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   
2            05099  MR01060005    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   
3            05099  MR01200000    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN 

It returns NaN. I checked PROPERTY_CODE and ACCOUNT_CODE, and they look exactly the same to me.

print (df_QA.loc[df_QA['PROPERTY_CODE'] == "05099"])

    PROPERTY_CODE ACCOUNT_CODE        Jan        Feb         Mar        
604         05099   MR01030000      -1000  -10000.75   -10000.09    
605         05099   MR01060000  100000.05     100.35 -1003128.17     

print (df.loc[df['PROPERTY_CODE'] == "05099"])

    PROPERTY_CODE  ACCOUNT_CODE           Jan        Feb         Mar
0            05099  MR01030000   -1.000000e+09  -100000.75  -100000.09   
1            05099  MR01060000    1.000000e+05     1100.35 -1000000.17 
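
Printed frames can hide differences such as leading or trailing whitespace, so keys that look identical may still compare unequal. A minimal sketch of a stricter check follows, again with hypothetical frames (left, right) in place of df and df_QA and an assumed trailing space in ACCOUNT_CODE. normalize_keys is a helper name made up for this example.

```python
import pandas as pd

# Hypothetical stand-ins for df and df_QA. The trailing space is an
# assumed cause of the mismatch, not a confirmed one.
left = pd.DataFrame({
    "PROPERTY_CODE": ["05099", "05099"],
    "ACCOUNT_CODE": ["MR01030000 ", "MR01060000 "],
    "Jan": [1.0, 2.0],
})
right = pd.DataFrame({
    "PROPERTY_CODE": ["05099", "05099"],
    "ACCOUNT_CODE": ["MR01030000", "MR01060000"],
    "Jan": [10.0, 20.0],
})

keys = ["PROPERTY_CODE", "ACCOUNT_CODE"]

# repr() shows quotes around each value, which exposes stray whitespace.
print(left["ACCOUNT_CODE"].map(repr).tolist())
print(right["ACCOUNT_CODE"].map(repr).tolist())

# Comparing string lengths is another quick way to spot hidden characters.
print(left["ACCOUNT_CODE"].str.len().unique())
print(right["ACCOUNT_CODE"].str.len().unique())


def normalize_keys(frame, columns):
    """Return a copy with the key columns cast to str and stripped.

    Note: astype(str) turns missing values into the string 'nan',
    so drop or handle them before calling this.
    """
    out = frame.copy()
    for col in columns:
        out[col] = out[col].astype(str).str.strip()
    return out


merged = normalize_keys(left, keys).merge(
    normalize_keys(right, keys), on=keys, how="inner"
)
print(merged)
```

If the lengths already agree and repr() shows nothing unusual, whitespace is not the cause and the keys need to be compared in some other way, for example by checking their element types with .map(type).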

0 answers:

No answers yet