Python - Pandas: how to divide by the value for a specific key

Time: 2017-02-02 17:44:12

Tags: python pandas

I want to compute a column from the values in other rows of a pandas DataFrame.

For example, given the DataFrame below, I want to compare each city's rent with the rent in NYC for the same year and number of rooms.

df = pd.DataFrame({
    "year" : ['2017', '2017', '2017', '2017', '2017','2017', '2017', '2017', '2017'],
    "rooms" : ['1', '2', '3', '1', '2', '3', '1', '2', '3'],
    "city" : ['tokyo', 'tokyo', 'toyko', 'nyc','nyc', 'nyc', 'paris', 'paris', 'paris'],
    "rent" : [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})

print(df)

    city  rent rooms  year
0  tokyo  1000     1  2017
1  tokyo  1500     2  2017
2  toyko  2000     3  2017
3    nyc  1200     1  2017
4    nyc  1600     2  2017
5    nyc  1900     3  2017
6  paris   900     1  2017
7  paris  1500     2  2017
8  paris  2200     3  2017

How can I add a column like vs_nyc, computed per year and number of rooms? The ideal result looks like this:

    city  rent rooms  year    vs_nyc
0  tokyo  1000     1  2017  0.833333
1  tokyo  1500     2  2017  0.9375
2  toyko  2000     3  2017  1.052631
3    nyc  1200     1  2017  1.0
4    nyc  1600     2  2017  1.0
5    nyc  1900     3  2017  1.0
6  paris   900     1  2017  0.75
7  paris  1500     2  2017  0.9375
8  paris  2200     3  2017  1.157894

I tried a few things, but none of them worked.

1 answer:

Answer 0: (score: 2)

To illustrate:

set_index + unstack

d1 = df.set_index(['city', 'year', 'rooms']).rent.unstack('city')

d1

city           nyc   paris   tokyo   toyko
year rooms                                
2017 1      1200.0   900.0  1000.0     NaN
     2      1600.0  1500.0  1500.0     NaN
     3      1900.0  2200.0     NaN  2000.0

Then we can divide every city column by the nyc column, row by row:

d1.div(d1.nyc, 0)

city        nyc     paris     tokyo     toyko
year rooms                                   
2017 1      1.0  0.750000  0.833333       NaN
     2      1.0  0.937500  0.937500       NaN
     3      1.0  1.157895       NaN  1.052632
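
The second argument `0` in `d1.div(d1.nyc, 0)` is the axis: `d1.nyc` is a Series indexed by (year, rooms), so it has to be aligned with the rows of `d1` rather than with its columns. A minimal self-contained sketch of that step, using a trimmed copy of the question's data (the names `wide` and `ratio` are illustrative only):

```python
import pandas as pd

# Trimmed version of the question's data: two cities, two room counts
df = pd.DataFrame({
    "year": ["2017"] * 4,
    "rooms": ["1", "2", "1", "2"],
    "city": ["tokyo", "tokyo", "nyc", "nyc"],
    "rent": [1000, 1500, 1200, 1600],
})

# One row per (year, rooms), one column per city
wide = df.set_index(["city", "year", "rooms"])["rent"].unstack("city")

# axis=0 aligns the NYC Series with the row index (year, rooms),
# so each column is divided row by row by the NYC rent
ratio = wide.div(wide["nyc"], axis=0)
print(ratio)
```

With the default `axis='columns'`, pandas would instead try to match the Series index against the city column labels, which is not what is wanted here.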

Solution

d1 = df.set_index(['city', 'year', 'rooms']).rent.unstack('city')
df.join(d1.div(d1.nyc, 0).stack().rename('vs_nyc'), on=['year', 'rooms', 'city'])

    city  rent rooms  year    vs_nyc
0  tokyo  1000     1  2017  0.833333
1  tokyo  1500     2  2017  0.937500
2  toyko  2000     3  2017  1.052632
3    nyc  1200     1  2017  1.000000
4    nyc  1600     2  2017  1.000000
5    nyc  1900     3  2017  1.000000
6  paris   900     1  2017  0.750000
7  paris  1500     2  2017  0.937500
8  paris  2200     3  2017  1.157895

A slightly cleaner version

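cols = ['city', 'year', 'rooms']
# NYC rent indexed by (year, rooms); selecting 'nyc' drops the city level
ny_rent = df.set_index(cols).rent.loc['nyc'].rename('ny_rent')
# Join the NYC rent onto every row by (year, rooms), then divide
df.assign(vs_nyc=df.rent / df.join(ny_rent, on=list(ny_rent.index.names)).ny_rent)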
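
For comparison, the same column can probably also be built with a plain merge instead of index alignment. This is an alternative sketch rather than part of the answer above; it assumes NYC has at most one row per (year, rooms) pair, and the names `nyc`, `merged`, and `result` are illustrative:

```python
import pandas as pd

# Data copied from the question, including the 'toyko' spelling
df = pd.DataFrame({
    "year": ["2017"] * 9,
    "rooms": ["1", "2", "3"] * 3,
    "city": ["tokyo", "tokyo", "toyko", "nyc", "nyc", "nyc",
             "paris", "paris", "paris"],
    "rent": [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})

# Baseline table: NYC rent for each (year, rooms) pair
nyc = (
    df.loc[df["city"] == "nyc", ["year", "rooms", "rent"]]
    .rename(columns={"rent": "ny_rent"})
)

# Left merge keeps every row of df in its original order;
# validate raises if NYC has duplicate (year, rooms) keys
merged = df.merge(nyc, on=["year", "rooms"], how="left", validate="many_to_one")

# Use the raw values so the ratio is assigned positionally
result = df.assign(vs_nyc=(merged["rent"] / merged["ny_rent"]).to_numpy())
print(result)
```

Because the merge keys are only year and rooms, the row spelled 'toyko' gets its ratio like any other row.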