How to compare two columns across two related DataFrames in pandas

Time: 2019-01-27 18:40:32

Tags: python pandas

I have a DataFrame named limits_df with the following schema:

"County Name"   "State"   "One-Unit Limit"

It looks like this:

import pandas as pd
data1 = {'County Name': ["A", "B", "C", "D"], 'State': ['AA', 'AB', 'AA', 'AC'], 'One-Unit Limit': [100, 200, 150, 300]}
limits_df = pd.DataFrame.from_dict(data1)

I also have a DataFrame named loans_df:

county  state   price   

It looks like this:

data2 = {'county': ["B", "C", "A", "E"], 'state': ['AB', 'AC', 'AA', 'AF'], 'price': [300, 200, 150, 300]}
loans_df = pd.DataFrame.from_dict(data2)

I want to create a new column, loans_df["jumbo"], that is True when a loan's price is greater than the limit for its county. This is what I tried:

county_limit = limits_df.loc[ (limits_df["County Name"] == str(loans_df["county"])) & (limits_df["State"] == str(loans_df["state"])) ]["One-Unit Limit"].item()
loan_price = loans_df["price"].item()
if(loan_price > county_limit):
   loans_df["jumbo"] = True
else:
   loans_df["jumbo"] = False

Doing this inside iterrows takes a long time, because I have to create loans_df["jumbo"] and then modify what should be immutable data. Isn't there a simpler way using apply() or map()?

3 Answers:

Answer 0: (score: 1)

IIUC, you can use

df2 = loans_df.merge(limits_df[['State', 'County Name', 'One-Unit Limit']], how='left',
                     left_on=['state', 'county'], right_on=['State', 'County Name'])
df2['jumbo'] = df2['price'] > df2['One-Unit Limit']

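Here pd.merge with a left join matches each loan to its limit by state and county. You can then run the boolean comparison directly to determine whether jumbo is True or False.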

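Note that when no limit is found for a state/county pair, jumbo comes out as False, because a comparison against the missing (NaN) limit evaluates to False.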
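
To make the unmatched case visible, one option is the `indicator=True` flag of merge, which adds a `_merge` column marking loans with no matching limit as `left_only`. The sketch below reuses the sample data from the question. The `has_limit` column name is an illustrative assumption and is not part of the original answer.

```python
import pandas as pd

limits_df = pd.DataFrame({
    'County Name': ["A", "B", "C", "D"],
    'State': ['AA', 'AB', 'AA', 'AC'],
    'One-Unit Limit': [100, 200, 150, 300],
})
loans_df = pd.DataFrame({
    'county': ["B", "C", "A", "E"],
    'state': ['AB', 'AC', 'AA', 'AF'],
    'price': [300, 200, 150, 300],
})

# Left join keeps every loan; indicator=True records whether a limit was found.
df2 = loans_df.merge(
    limits_df[['State', 'County Name', 'One-Unit Limit']],
    how='left',
    left_on=['state', 'county'],
    right_on=['State', 'County Name'],
    indicator=True,
)

# A NaN limit compares as False, so unmatched loans are never flagged as jumbo.
df2['jumbo'] = df2['price'] > df2['One-Unit Limit']

# Hypothetical helper column: True only where a limit row actually matched.
df2['has_limit'] = df2['_merge'] == 'both'

print(df2[['county', 'state', 'price', 'One-Unit Limit', 'jumbo', 'has_limit']])
```

Rows where `has_limit` is False can then be reviewed separately instead of being silently treated as non-jumbo.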

Answer 1: (score: 1)

This assumes that all counties and states in limits_df are found in loans_df

loans_df['jumbo'] = pd.merge(limits_df, loans_df, 
                             left_on=['County Name', 'State'],
                             right_on=['county', 'state'], how='left') \
                        .apply(lambda x: x['price'] > x['One-Unit Limit'], axis=1)
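
One thing to watch in the snippet above: the merge starts from limits_df, so the merged rows follow the order of limits_df, while assigning the result to loans_df['jumbo'] aligns on the index rather than on county and state. A minimal sketch of a variant that starts from loans_df instead is shown here. It assumes each (County Name, State) pair occurs at most once in limits_df, which `validate='many_to_one'` checks, so the merged rows line up one-to-one with the loans. The `merged` variable name is illustrative.

```python
import pandas as pd

limits_df = pd.DataFrame({
    'County Name': ["A", "B", "C", "D"],
    'State': ['AA', 'AB', 'AA', 'AC'],
    'One-Unit Limit': [100, 200, 150, 300],
})
loans_df = pd.DataFrame({
    'county': ["B", "C", "A", "E"],
    'state': ['AB', 'AC', 'AA', 'AF'],
    'price': [300, 200, 150, 300],
})

# Start from loans_df so the merged frame has one row per loan, in loan order.
# validate='many_to_one' raises if a (County Name, State) pair is duplicated.
merged = pd.merge(loans_df, limits_df,
                  left_on=['county', 'state'],
                  right_on=['County Name', 'State'],
                  how='left', validate='many_to_one')

# Vectorized comparison; loans with no matching limit (NaN) come out False.
# to_numpy() assigns by position, so the index of loans_df does not matter.
loans_df['jumbo'] = (merged['price'] > merged['One-Unit Limit']).to_numpy()
print(loans_df)
```

The direct column comparison also avoids the row-wise apply, which tends to be slower on large frames.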

Answer 2: (score: 1)

m=limits_df.merge(loans_df,left_on=['County Name','State'],right_on=['county','state'])
loans_df["jumbo"]=loans_df['county'].isin(m.loc[m['price']>m['One-Unit Limit'],'County Name'])
print(loans_df)

  county state  price  jumbo
0      B    AB    300   True
1      C    AC    200  False
2      A    AA    150   True
3      E    AF    300  False
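
Note that the isin check above matches on county name alone. If the same county name can occur in more than one state, a sketch along these lines compares (county, state) pairs instead. The extra loan for county "A" in state "ZZ" is made-up data added only to show the difference, and the names `over`, `loan_keys`, and `over_pairs` are illustrative.

```python
import pandas as pd

limits_df = pd.DataFrame({
    'County Name': ["A", "B", "C", "D"],
    'State': ['AA', 'AB', 'AA', 'AC'],
    'One-Unit Limit': [100, 200, 150, 300],
})
# Sample loans from the question plus one hypothetical row:
# county "A" again, but in a different state with no limit defined.
loans_df = pd.DataFrame({
    'county': ["B", "C", "A", "E", "A"],
    'state': ['AB', 'AC', 'AA', 'AF', 'ZZ'],
    'price': [300, 200, 150, 300, 500],
})

# Inner join keeps only loans that have a matching limit.
m = limits_df.merge(loans_df, left_on=['County Name', 'State'],
                    right_on=['county', 'state'])
over = m.loc[m['price'] > m['One-Unit Limit'], ['county', 'state']]

# Compare (county, state) pairs rather than county names alone.
loan_keys = pd.MultiIndex.from_arrays([loans_df['county'], loans_df['state']])
over_pairs = list(zip(over['county'], over['state']))
loans_df['jumbo'] = loan_keys.isin(over_pairs)
print(loans_df)
```

With a county-only isin, the hypothetical "A"/"ZZ" loan would be flagged as jumbo even though no limit exists for that pair.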