Merging DataFrames with pandas on LEFT or RIGHT

Asked: 2018-08-01 15:29:38

Tags: sql python-3.x pandas dataframe merge

import pandas as pd

entity_data = {'STATE_CD_E': ['NY', 'NY', 'NY'],
           'INTERMEDIATE_NUMBER_E': ['1111', '2222', '3333'],
           'DISTRICT_NUMBER_E': ['123456789012', '123412341234',
                                 '121212121212'],
           'FINANCE_NUMBER_E': ['123456', '123412', '121212']}

df_entity = pd.DataFrame(entity_data, index = ['School_1', 'School_2', 
            'School_3'])

finance_data = {'STATE_CD_F': ['NY', 'NY', 'NY'],
           'INTERMEDIATE_NUMBER_F': ['1111', '2222', '3333'],
           'DISTRICT_NUMBER_F': ['123456', '123412', '121212']  }

df_finance = pd.DataFrame(finance_data, index = ['School_1', 'School_2', 
             'School_3'])

print("\n")
print(df_entity)
print("\n")
print(df_finance)
print("\n")
print("\n")
print("\n")



df_merge = pd.merge(df_entity,
                    df_finance[['INTERMEDIATE_NUMBER_F', 'DISTRICT_NUMBER_F']],
                    left_on=['FINANCE_NUMBER_E'],
                    right_on=['DISTRICT_NUMBER_F'],
                    how='left')

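The above is the code I'm using. I'm trying to merge two DataFrames with pandas. However, I want to join DISTRICT_NUMBER_F against the leftmost 6 characters of DISTRICT_NUMBER_E. Is there a way to do this? If not, can I create a new column in the entity DataFrame that holds the leftmost 6 digits of DISTRICT_NUMBER_E and then match on that column?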

2 Answers:

Answer 0 (score: 3)

You can merge on the first six digits like this, passing the sliced Series directly as the left key:

df_entity.merge(df_finance, left_on=df_entity.DISTRICT_NUMBER_E.str[:6],
                right_on='DISTRICT_NUMBER_F')

  DISTRICT_NUMBER_E FINANCE_NUMBER_E INTERMEDIATE_NUMBER_E STATE_CD_E  \
0      123456789012           123456                  1111         NY   
1      123412341234           123412                  2222         NY   
2      121212121212           121212                  3333         NY   

  DISTRICT_NUMBER_F INTERMEDIATE_NUMBER_F STATE_CD_F  
0            123456                  1111         NY  
1            123412                  2222         NY  
2            121212                  3333         NY  
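
For anyone who wants to try the approach above in isolation, here is a minimal self-contained sketch. It trims the question's data to the columns that matter, and the variable name `prefix` is only illustrative. It assumes the district numbers are stored as strings; if they were integers, `.str` would not be available and you would need `.astype(str)` first.

```python
import pandas as pd

# Trimmed-down copies of the question's frames (only the columns used here).
df_entity = pd.DataFrame(
    {"DISTRICT_NUMBER_E": ["123456789012", "123412341234", "121212121212"]},
    index=["School_1", "School_2", "School_3"],
)
df_finance = pd.DataFrame(
    {
        "DISTRICT_NUMBER_F": ["123456", "123412", "121212"],
        "INTERMEDIATE_NUMBER_F": ["1111", "2222", "3333"],
    },
    index=["School_1", "School_2", "School_3"],
)

# .str[:6] slices every string in the column to its first six characters.
prefix = df_entity["DISTRICT_NUMBER_E"].str[:6]

# left_on accepts an array-like of the same length as the left frame,
# so the sliced Series can act as the join key without adding a column.
merged = df_entity.merge(df_finance, left_on=prefix, right_on="DISTRICT_NUMBER_F")

print(merged)
```

The merge defaults to an inner join; pass `how='left'` if rows of `df_entity` without a match should be kept.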

Answer 1 (score: 0)

# create a key that satisfies the join condition: the first 6 characters of DISTRICT_NUMBER_E
df_entity['key'] = df_entity['DISTRICT_NUMBER_E'].str[:6]

# join both DataFrames on the new key into one merged DataFrame
# optionally pass how='left'/'right'/'outer' for a specific join type
merged_df = pd.merge(df_entity, df_finance, left_on='key', right_on='DISTRICT_NUMBER_F')

# optional: drop the key if it is no longer needed
merged_df.drop('key', axis=1, inplace=True)
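
A variation on the helper-column idea, sketched under a few assumptions: the fourth entity row (`School_4`) is made up here to show what an unmatched row looks like, and the column name `key` is arbitrary. It uses `assign` so the original `df_entity` is never modified, `how='left'` as in the question, and `indicator=True` to flag which rows found a match.

```python
import pandas as pd

# Question's data, trimmed; School_4 is a hypothetical row with no finance match.
df_entity = pd.DataFrame(
    {"DISTRICT_NUMBER_E": ["123456789012", "123412341234", "999999000000"]},
    index=["School_1", "School_2", "School_4"],
)
df_finance = pd.DataFrame(
    {
        "DISTRICT_NUMBER_F": ["123456", "123412", "121212"],
        "INTERMEDIATE_NUMBER_F": ["1111", "2222", "3333"],
    }
)

merged_df = (
    df_entity
    # Temporary key column on a copy; df_entity itself is left untouched.
    .assign(key=lambda d: d["DISTRICT_NUMBER_E"].str[:6])
    # Left join keeps every entity row; indicator adds a "_merge" column.
    .merge(df_finance, left_on="key", right_on="DISTRICT_NUMBER_F",
           how="left", indicator=True)
    # The helper column is no longer needed after the join.
    .drop(columns="key")
)

print(merged_df)
```

Rows marked `left_only` in `_merge` are entities whose six-character prefix has no counterpart in `df_finance`. Note that merging on columns returns a fresh integer index, so the `School_n` labels are not carried over unless you call `reset_index()` beforehand.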