I want to calculate a column of a pandas DataFrame using values from another row.
For example, take the following dataframe.
I'd like to compare each city's rent with the rent in other cities for the same year and number of rooms.
Here is the data:
import pandas as pd
df = pd.DataFrame({
"year" : ['2017', '2017', '2017', '2017', '2017','2017', '2017', '2017', '2017'],
"rooms" : ['1', '2', '3', '1', '2', '3', '1', '2', '3'],
"city" : ['tokyo', 'tokyo', 'toyko', 'nyc','nyc', 'nyc', 'paris', 'paris', 'paris'],
"rent" : [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})
print(df)
city rent rooms year
0 tokyo 1000 1 2017
1 tokyo 1500 2 2017
2 toyko 2000 3 2017
3 nyc 1200 1 2017
4 nyc 1600 2 2017
5 nyc 1900 3 2017
6 paris 900 1 2017
7 paris 1500 2 2017
8 paris 2200 3 2017
The ideal result looks like this:
city rent rooms year vs_nyc
0 tokyo 1000 1 2017 0.833333
1 tokyo 1500 2 2017 0.9375
2 toyko 2000 3 2017 1.052631
3 nyc 1200 1 2017 1.0
4 nyc 1600 2 2017 1.0
5 nyc 1900 3 2017 1.0
6 paris 900 1 2017 0.75
7 paris 1500 2 2017 0.9375
8 paris 2200 3 2017 1.157894
How can I add a column like vs_nyc based on year and rooms? I tried a few things, but none of them worked.
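To pin down what vs_nyc means, here is a deliberately naive sketch (not from the question or the answer): build a plain dict of NYC rent keyed by (year, rooms), then divide every row's rent by the matching entry. It reuses the question's data as-is; the names nyc and nyc_rent are illustrative only. It is not vectorized, but it makes a handy reference to check the pandas approaches against.

```python
import pandas as pd

# Same data as in the question (including the 'toyko' spelling).
df = pd.DataFrame({
    "year": ["2017"] * 9,
    "rooms": ["1", "2", "3"] * 3,
    "city": ["tokyo", "tokyo", "toyko", "nyc", "nyc", "nyc", "paris", "paris", "paris"],
    "rent": [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})

# Lookup table: (year, rooms) -> NYC rent.
nyc = df[df["city"] == "nyc"]
nyc_rent = {(y, r): rent for y, r, rent in zip(nyc["year"], nyc["rooms"], nyc["rent"])}

# Each row's rent divided by the NYC rent for the same (year, rooms).
# A KeyError here would mean some (year, rooms) pair has no NYC row.
df["vs_nyc"] = [
    rent / nyc_rent[(y, r)]
    for y, r, rent in zip(df["year"], df["rooms"], df["rent"])
]
print(df)
```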
Answer (score: 2)
To illustrate, first reshape with set_index + unstack so that each city becomes a column:
d1 = df.set_index(['city', 'year', 'rooms']).rent.unstack('city')
d1
city          nyc   paris   tokyo   toyko
year rooms
2017 1     1200.0   900.0  1000.0     NaN
     2     1600.0  1500.0  1500.0     NaN
     3     1900.0  2200.0     NaN  2000.0
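The same wide table can also be built with DataFrame.pivot, which is a swapped-in equivalent rather than what the answer uses. This sketch assumes pandas 1.1 or newer, where pivot accepts a list for index; d2 is an illustrative name. Missing (year, rooms, city) combinations, such as tokyo with 3 rooms, show up as NaN in both versions.

```python
import pandas as pd

df = pd.DataFrame({
    "year": ["2017"] * 9,
    "rooms": ["1", "2", "3"] * 3,
    "city": ["tokyo", "tokyo", "toyko", "nyc", "nyc", "nyc", "paris", "paris", "paris"],
    "rent": [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})

# The answer's reshaping: move the keys into the index, then spread 'city' into columns.
d1 = df.set_index(["city", "year", "rooms"])["rent"].unstack("city")

# Equivalent reshaping with pivot: rows = (year, rooms), columns = city, values = rent.
d2 = df.pivot(index=["year", "rooms"], columns="city", values="rent")

print(d2)
print(d1.equals(d2))  # both hold the same values on the same axes
```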
Then we can divide every column by the nyc column, row by row:
d1.div(d1.nyc, axis=0)
city        nyc     paris     tokyo     toyko
year rooms
2017 1      1.0  0.750000  0.833333       NaN
     2      1.0  0.937500  0.937500       NaN
     3      1.0  1.157895       NaN  1.052632
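The axis=0 argument is what makes the division row-wise: the divisor Series d1["nyc"] is aligned with the row index (year, rooms), so each row is divided by its own NYC rent. The sketch below, with illustrative names ratios and manual, checks that against a plain NumPy broadcast that divides each row by a column vector.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": ["2017"] * 9,
    "rooms": ["1", "2", "3"] * 3,
    "city": ["tokyo", "tokyo", "toyko", "nyc", "nyc", "nyc", "paris", "paris", "paris"],
    "rent": [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})
d1 = df.set_index(["city", "year", "rooms"])["rent"].unstack("city")

# axis=0: align the Series with the rows, then divide every column by it.
ratios = d1.div(d1["nyc"], axis=0)

# Same thing by hand: a (3, 4) array divided by a (3, 1) column vector.
# NaN cells stay NaN because NaN / x is NaN.
manual = d1.to_numpy() / d1["nyc"].to_numpy()[:, None]

print(ratios.round(6))
print(np.allclose(manual, ratios.to_numpy(), equal_nan=True))
```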
Solution
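d1 = df.set_index(['city', 'year', 'rooms']).rent.unstack('city')
# divide row-wise by NYC, stack back to a Series indexed by (year, rooms, city), and join it onto df
df.join(d1.div(d1.nyc, axis=0).stack().rename('vs_nyc'), on=['year', 'rooms', 'city'])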
city rent rooms year vs_nyc
0 tokyo 1000 1 2017 0.833333
1 tokyo 1500 2 2017 0.937500
2 toyko 2000 3 2017 1.052632
3 nyc 1200 1 2017 1.000000
4 nyc 1600 2 2017 1.000000
5 nyc 1900 3 2017 1.000000
6 paris 900 1 2017 0.750000
7 paris 1500 2 2017 0.937500
8 paris 2200 3 2017 1.157895
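For reference, here is the solution above as a self-contained script using the question's data; vs_nyc and result are illustrative names. The key detail is that join(on=[...]) matches df's year, rooms and city columns, in that order, against the three index levels of the stacked Series, so the key order must follow the stacked index (year, rooms, city).

```python
import pandas as pd

df = pd.DataFrame({
    "year": ["2017"] * 9,
    "rooms": ["1", "2", "3"] * 3,
    "city": ["tokyo", "tokyo", "toyko", "nyc", "nyc", "nyc", "paris", "paris", "paris"],
    "rent": [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})

# Wide table: rows = (year, rooms), columns = city, values = rent.
d1 = df.set_index(["city", "year", "rooms"])["rent"].unstack("city")

# Ratio to NYC per row, stacked back into a long Series indexed by (year, rooms, city).
vs_nyc = d1.div(d1["nyc"], axis=0).stack().rename("vs_nyc")

# Match df's key columns, in order, against vs_nyc's index levels.
result = df.join(vs_nyc, on=["year", "rooms", "city"])
print(result)
```

With current pandas the DataFrame keeps the dict's insertion order (year, rooms, city, rent), so the printed column order differs from the output shown above; the values are the same.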
A slightly cleaner version, without the wide table:
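cols = ['city', 'year', 'rooms']
# NYC rent as a Series indexed by (year, rooms)
ny_rent = df.set_index(cols).rent.loc['nyc'].rename('ny_rent')
# look up each row's NYC rent by (year, rooms), then divide
df.assign(vs_nyc=df.rent / df.join(ny_rent, on=ny_rent.index.names).ny_rent)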
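Another option, which is a different technique from the answer (groupby + transform, no reshaping or join): blank out every non-NYC rent, then broadcast the remaining NYC value to all rows of the same (year, rooms) group. This is a sketch under the assumption that each (year, rooms) group has at most one NYC row; nyc_rent is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "year": ["2017"] * 9,
    "rooms": ["1", "2", "3"] * 3,
    "city": ["tokyo", "tokyo", "toyko", "nyc", "nyc", "nyc", "paris", "paris", "paris"],
    "rent": [1000, 1500, 2000, 1200, 1600, 1900, 900, 1500, 2200],
})

# Keep only NYC rents (other rows become NaN), then give every row of the
# same (year, rooms) group that group's first non-NaN value, i.e. the NYC rent.
nyc_rent = (
    df["rent"].where(df["city"] == "nyc")
    .groupby([df["year"], df["rooms"]])
    .transform("first")
)

df["vs_nyc"] = df["rent"] / nyc_rent
print(df)
```

Because transform returns a Series aligned with df's index, the division needs no join, and a (year, rooms) group with no NYC row simply gets NaN rather than raising.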