I have been trying to join/merge two dataframes, df and df_QA, for some time now.
First dataframe:
df_QA:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6878 entries, 0 to 6877
Data columns (total 14 columns):
PROPERTY_CODE 6878 non-null object
ACCOUNT_CODE 6878 non-null object
Jan 6878 non-null float64
Feb 6878 non-null float64
Mar 6878 non-null float64
Apr 6878 non-null float64
May 6878 non-null float64
Jun 6878 non-null float64
Jul 6878 non-null float64
Aug 6878 non-null float64
Sep 6878 non-null float64
Oct 6878 non-null float64
Nov 6878 non-null float64
Dec 6878 non-null float64
dtypes: float64(12), object(2)
memory usage: 752.4+ KB
Second dataframe:
df:
df = pd.read_csv(fname, sep="^", usecols=[2,3,5,6,7,8,9,10,11,12,13,14,15,16], converters={'Account': str, 'Entity ID': str}).dropna(subset=['Account'], how='any')
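np.str was only an alias for the built-in str. It was deprecated in NumPy 1.20 and removed in 1.24, so a converter written with np.str fails on current versions. Below is a minimal sketch of reading the key columns as plain strings with dtype= instead of converters. It assumes a "^"-separated file whose header contains Account and Entity ID columns. The sample text is made up for illustration, and usecols is omitted.

```python
import io

import pandas as pd

# Made-up sample in a "^"-separated layout; the real file has more columns.
sample = "Entity ID^Account^Jan\n05099^MR01030000^-1000\n05099^^5\n"

df = pd.read_csv(
    io.StringIO(sample),
    sep="^",
    # Read the key columns as strings so leading zeros such as "05099" survive.
    dtype={"Account": str, "Entity ID": str},
).dropna(subset=["Account"], how="any")  # the row with an empty Account is dropped

print(df)
```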
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2441 entries, 0 to 2440
Data columns (total 14 columns):
PROPERTY_CODE 2441 non-null object
ACCOUNT_CODE 2441 non-null object
Jan 2441 non-null float64
Feb 2441 non-null float64
Mar 2441 non-null float64
Apr 2441 non-null float64
May 2441 non-null float64
Jun 2441 non-null float64
Jul 2441 non-null float64
Aug 2441 non-null float64
Sep 2441 non-null float64
Oct 2441 non-null float64
Nov 2441 non-null int64
Dec 2441 non-null int64
dtypes: float64(10), int64(2), object(2)
memory usage: 286.1+ KB
I tried:
df_check = pd.merge(df, df_QA, how='inner', on=['PROPERTY_CODE','ACCOUNT_CODE'])
or
df_check = df.merge(df_QA, left_on=['PROPERTY_CODE', 'ACCOUNT_CODE'], right_on=['PROPERTY_CODE', 'ACCOUNT_CODE'], how='inner', sort=True)
Both return:
print (df_check)
Empty DataFrame
Columns: [PROPERTY_CODE, ACCOUNT_CODE, Jan_x, Feb_x, Mar_x, Apr_x, May_x, Jun_x, Jul_x, Aug_x, Sep_x, Oct_x, Nov_x, Dec_x, Jan_y, Feb_y, Mar_y, Apr_y, May_y, Jun_y, Jul_y, Aug_y, Sep_y, Oct_y, Nov_y, Dec_y]
Index: []
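An empty inner merge means that no (PROPERTY_CODE, ACCOUNT_CODE) pair matches exactly across the two frames. One way to see which keys fail is an outer merge with indicator=True, which adds a _merge column marking each row as "both", "left_only" or "right_only". The sketch below uses toy frames. The trailing space in one ACCOUNT_CODE is an assumed example of a key that looks identical but is not.

```python
import pandas as pd

# Toy stand-ins for df and df_QA; the trailing space is a hypothetical mismatch.
df = pd.DataFrame({"PROPERTY_CODE": ["05099", "05099"],
                   "ACCOUNT_CODE": ["MR01030000 ", "MR01060000"],
                   "Jan": [1.0, 2.0]})
df_QA = pd.DataFrame({"PROPERTY_CODE": ["05099", "05099"],
                      "ACCOUNT_CODE": ["MR01030000", "MR01060000"],
                      "Jan": [3.0, 4.0]})

# indicator=True records which side each key combination came from.
check = pd.merge(df, df_QA, how="outer",
                 on=["PROPERTY_CODE", "ACCOUNT_CODE"], indicator=True)
print(check["_merge"].value_counts())

# Keys present on only one side are the ones that failed to match.
print(check.loc[check["_merge"] != "both", ["PROPERTY_CODE", "ACCOUNT_CODE", "_merge"]])
```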
We would like to get a dataframe in the following format:
PROPERTY_CODE, ACCOUNT_CODE, Jan_x, Feb_x, Mar_x, Apr_x, May_x, Jun_x, Jul_x, Aug_x, Sep_x, Oct_x, Nov_x, Dec_x, Jan_y, Feb_y, Mar_y, Apr_y, May_y, Jun_y, Jul_y, Aug_y, Sep_y, Oct_y, Nov_y, Dec_y
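This layout is what pd.merge produces by default once the keys actually match. Overlapping non-key columns receive the default suffixes "_x" (left frame) and "_y" (right frame). Here is a small sketch with only two months, using values from the rows printed further down for PROPERTY_CODE 05099.

```python
import pandas as pd

# Two-month toy version of df and df_QA with a matching key.
df = pd.DataFrame({"PROPERTY_CODE": ["05099"], "ACCOUNT_CODE": ["MR01030000"],
                   "Jan": [-1.0e9], "Feb": [-100000.75]})
df_QA = pd.DataFrame({"PROPERTY_CODE": ["05099"], "ACCOUNT_CODE": ["MR01030000"],
                      "Jan": [-1000.0], "Feb": [-10000.75]})

# Key columns appear once; the left frame's columns come first, then the right's.
out = pd.merge(df, df_QA, how="inner", on=["PROPERTY_CODE", "ACCOUNT_CODE"])
print(out.columns.tolist())
```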
Any ideas? Thanks!
When I try an outer merge:
df_check = pd.merge(df, df_QA, how='outer', on=['PROPERTY_CODE','ACCOUNT_CODE'])
PROPERTY_CODE ACCOUNT_CODE Jan Feb Mar Apr May Jun Jul Aug Sep \
0 05099 MR01030000 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 05099 MR01060000 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 05099 MR01060005 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 05099 MR01200000 NaN NaN NaN NaN NaN NaN NaN NaN
It returns NaN. I checked PROPERTY_CODE and ACCOUNT_CODE, but they look exactly the same to me:
print (df_QA.loc[df_QA['PROPERTY_CODE'] == "05099"])
PROPERTY_CODE ACCOUNT_CODE Jan Feb Mar
604 05099 MR01030000 -1000 -10000.75 -10000.09
605 05099 MR01060000 100000.05 100.35 -1003128.17
print (df.loc[df['PROPERTY_CODE'] == "05099"])
PROPERTY_CODE ACCOUNT_CODE Jan Feb Mar
0 05099 MR01030000 -1.000000e+09 -100000.75 -100000.09
1 05099 MR01060000 1.000000e+05 1100.35 -1000000.17
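Keys can print identically and still differ, for example through trailing spaces or tabs. Since filtering on PROPERTY_CODE == "05099" works in both frames, ACCOUNT_CODE is a plausible suspect, although that is an assumption. The sketch below shows how to inspect the raw values with repr() and string lengths, then normalise both key columns before merging. The helper normalize_keys and the trailing tab in the toy data are hypothetical.

```python
import pandas as pd


def normalize_keys(frame, cols=("PROPERTY_CODE", "ACCOUNT_CODE")):
    """Hypothetical helper: copy the frame, cast key columns to str, strip whitespace."""
    frame = frame.copy()
    for col in cols:
        frame[col] = frame[col].astype(str).str.strip()
    return frame


# Toy data; the trailing tab is an assumed example of an invisible difference.
df = pd.DataFrame({"PROPERTY_CODE": ["05099"], "ACCOUNT_CODE": ["MR01030000\t"], "Jan": [1.0]})
df_QA = pd.DataFrame({"PROPERTY_CODE": ["05099"], "ACCOUNT_CODE": ["MR01030000"], "Jan": [2.0]})

# repr() exposes characters that a plain print() hides.
print([repr(v) for v in df["ACCOUNT_CODE"].unique()])
# Differing lengths for "equal-looking" codes also point to hidden characters.
print(df["ACCOUNT_CODE"].str.len().max(), df_QA["ACCOUNT_CODE"].str.len().max())

merged = pd.merge(normalize_keys(df), normalize_keys(df_QA), how="inner",
                  on=["PROPERTY_CODE", "ACCOUNT_CODE"])
print(merged)
```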