Multiply a column based on a two-column condition from a different DataFrame?

Time: 2019-10-04 16:47:38

Tags: python python-3.x pandas numpy

I have two DataFrames, as shown below:

dfA = 
Country      City           Pop
US           Washington     1000
US           Texas          5000
CH           Geneva         500
CH           Zurich         500


dfB = 
Country      City           Density (pop/km2)
US           Washington     10
US           Texas          50
CH           Geneva         5
CH           Zurich         5

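What I want is to compare the columns Country and City across the two DataFrames and, where they match, do a calculation. For example:

US Washington appears in both DataFrames, so take its Pop value and divide it by Density, storing the result of the division in a new area column in dfB. Example result for the first row: dfB['area km2'] = 100

I tried using np.where(), but it did not work. Any hints on how to achieve this?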

3 Answers:

Answer 0: (score: 2)

Use index alignment and div

match_on = ['Country', 'City']
dfA = dfA.set_index(match_on)
# Series division aligns on the shared (Country, City) index
ratio = dfA['Pop'].div(dfB.set_index(match_on)['Density (pop/km2)'])
dfA = dfA.assign(ratio=ratio)
ratio

Country  City      
US       Washington    100.0
         Texas         100.0
CH       Geneva        100.0
         Zurich        100.0
dtype: float64
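
Since the question asks for the new column to end up in dfB, the same index-alignment idea can be pointed the other way. The following is only a minimal sketch, assuming the (Country, City) pairs are unique in both frames; the frames are rebuilt from the question's sample data and the column name `area km2` is taken from the question's example.

```python
import pandas as pd

# Sample data copied from the question
dfA = pd.DataFrame({
    "Country": ["US", "US", "CH", "CH"],
    "City": ["Washington", "Texas", "Geneva", "Zurich"],
    "Pop": [1000, 5000, 500, 500],
})
dfB = pd.DataFrame({
    "Country": ["US", "US", "CH", "CH"],
    "City": ["Washington", "Texas", "Geneva", "Zurich"],
    "Density (pop/km2)": [10, 50, 5, 5],
})

keys = ["Country", "City"]

# Index both sides by the key columns so the division matches rows
# by (Country, City) rather than by row position
pop = dfA.set_index(keys)["Pop"]
dfB = dfB.set_index(keys)
dfB["area km2"] = pop.div(dfB["Density (pop/km2)"])

# Turn the keys back into ordinary columns
dfB = dfB.reset_index()
print(dfB)
```

Because the match is made on the index labels, the row order of the two frames does not need to agree.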

Answer 1: (score: 0)

You can also use merge, as below

dfB["Area"] = dfB.merge(dfA, on=["Country", "City"], how="left")["Pop"] / dfB["Density (pop/km2)"]
dfB
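
One thing to watch with the one-liner: the merged frame gets a fresh 0..n-1 index, so the division and the assignment only line up if dfB also has the default index. Below is a hedged variant, assuming the key pairs are unique in both frames; the non-default index `[10, 11, 12, 13]` is made up purely to illustrate the point, and `validate="one_to_one"` is an optional safety check rather than part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
dfA = pd.DataFrame({
    "Country": ["US", "US", "CH", "CH"],
    "City": ["Washington", "Texas", "Geneva", "Zurich"],
    "Pop": [1000, 5000, 500, 500],
})
dfB = pd.DataFrame(
    {
        "Country": ["US", "US", "CH", "CH"],
        "City": ["Washington", "Texas", "Geneva", "Zurich"],
        "Density (pop/km2)": [10, 50, 5, 5],
    },
    index=[10, 11, 12, 13],  # hypothetical non-default index
)

keys = ["Country", "City"]

# A left merge keeps one row per dfB row, in dfB's order;
# validate raises if a key pair is duplicated on either side
merged = dfB.merge(dfA, on=keys, how="left", validate="one_to_one")

# Do the division inside the merged frame, then assign by position
# so dfB's own index labels do not matter
dfB["Area"] = (merged["Pop"] / merged["Density (pop/km2)"]).to_numpy()
print(dfB)
```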

Answer 2: (score: 0)

You can also use merge to combine the two DataFrames and then divide as usual:

dfMerge = dfA.merge(dfB, on=['Country', 'City'])
dfMerge['area'] = dfMerge['Pop'].div(dfMerge['Density (pop/km2)'])
print(dfMerge)

Output:

  Country        City   Pop  Density (pop/km2)   area
0      US  Washington  1000                 10  100.0
1      US       Texas  5000                 50  100.0
2      CH      Geneva   500                  5  100.0
3      CH      Zurich   500                  5  100.0
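
Note that merge defaults to an inner join, so a city present in only one frame would silently drop out of the result. If that matters, a left join with `indicator=True` keeps every dfB row and marks the ones without a match. This is just a sketch: the Bern row and its density are invented for illustration and are not part of the question's data.

```python
import pandas as pd

# Sample data copied from the question
dfA = pd.DataFrame({
    "Country": ["US", "US", "CH", "CH"],
    "City": ["Washington", "Texas", "Geneva", "Zurich"],
    "Pop": [1000, 5000, 500, 500],
})
# Same as the question's dfB plus one made-up row with no match in dfA
dfB = pd.DataFrame({
    "Country": ["US", "US", "CH", "CH", "CH"],
    "City": ["Washington", "Texas", "Geneva", "Zurich", "Bern"],
    "Density (pop/km2)": [10, 50, 5, 5, 8],
})

keys = ["Country", "City"]

# how="left" keeps all dfB rows; indicator=True adds a "_merge" column
# whose value is "both" or "left_only" for each row
merged = dfB.merge(dfA, on=keys, how="left", indicator=True)
merged["area"] = merged["Pop"].div(merged["Density (pop/km2)"])

# Rows that had no population figure end up with NaN in "area"
unmatched = merged.loc[merged["_merge"] == "left_only", keys]
print(merged)
print(unmatched)
```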