I have a CSV that looks like this (with more years):
year,title_field,value
2009,Total Housing Units,39499
2009,Vacant Housing Units,3583
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Vacant Housing Units,4483
2008,Occupied Housing Units,36711
2009,Owner Occupied,18057
2009,Renter Occupied,17859
2008,Owner Occupied,17340
2008,Renter Occupied,19371
2009,Median Gross Rent,769
2008,Median Gross Rent,768
I need to find the maximum value among all the Vacant Housing Units rows.
So far I have this:
import pandas as pd
df = pd.read_csv("denton_housing.csv", names=("year", "title_field", "value"))
inds = df.groupby(['title_field'])['value'].transform(max) == df['value']
df = df[inds]
df.reset_index(drop=True, inplace=True)
print(df)
That code gives me this:
year title_field value
0 year title_field value
1 2014 Total Housing Units 49109
2 2014 Occupied Housing Units 46295
3 2008 Vacant Housing Units 4483
4 2014 Owner Occupied 21427
5 2014 Renter Occupied 24868
6 2014 Median Gross Rent 905
I only need it to output:
2008 Vacant Housing Units 4483
Answer 0 (score: 1)
I think you need idxmax:
df.loc[[df.groupby(['title_field'])['value'].idxmax().loc['Vacant Housing Units']]]
Out[92]:
year title_field value
4 2008 Vacant Housing Units 4483
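For anyone who wants to try the idxmax approach without the original file, here is a minimal, self-contained sketch. It assumes the first six sample rows from the question, loaded from an in-memory string (the `csv_text` variable is only for illustration). It also assumes the file is read without the `names=` argument: passing `names=` to a file that already has a header line makes pandas treat that line as data, which is why a `year title_field value` row shows up in the question's output.

```python
import io

import pandas as pd

# Sample rows copied from the question; the real file has more years.
csv_text = """year,title_field,value
2009,Total Housing Units,39499
2009,Vacant Housing Units,3583
2009,Occupied Housing Units,35916
2008,Total Housing Units,41194
2008,Vacant Housing Units,4483
2008,Occupied Housing Units,36711
"""

# No names= argument, so the first line becomes the header and
# the "value" column is parsed as integers.
df = pd.read_csv(io.StringIO(csv_text))

# For each title_field group, the index label of the row with the largest value.
idx_by_title = df.groupby("title_field")["value"].idxmax()

# Look up the label for the one group of interest. The double brackets
# make .loc return a one-row DataFrame instead of a Series.
result = df.loc[[idx_by_title.loc["Vacant Housing Units"]]]
print(result)
```

Computing `idxmax` for every group is convenient if you later need the maximum row of other categories as well; for a single category, filtering first does less work.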
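Answer 1 (score: 0)
You can first filter the Vacant Housing Units records, sort them by value, and take the last row, which holds the maximum: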
df.loc[df.title_field.eq('Vacant Housing Units')].sort_values(by='value').tail(1)
Out[96]:
year title_field value
4 2008 Vacant Housing Units 4483
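The filter-then-sort idea can be wrapped in a small helper so it works for any category. This is only a sketch: `max_row_for` is a hypothetical name, not a pandas function, and the data is a few sample rows from the question loaded from a string rather than from `denton_housing.csv`.

```python
import io

import pandas as pd


def max_row_for(df, label):
    """Return the row (as a one-row DataFrame) with the largest value for one title_field.

    Hypothetical helper for illustration; it assumes the columns are
    named "title_field" and "value" as in the question.
    """
    subset = df.loc[df["title_field"].eq(label)]
    # Ascending sort puts the maximum last, so tail(1) picks it.
    return subset.sort_values(by="value").tail(1)


# A few sample rows from the question; the real file has more years.
csv_text = """year,title_field,value
2009,Vacant Housing Units,3583
2008,Vacant Housing Units,4483
2009,Median Gross Rent,769
2008,Median Gross Rent,768
"""

df = pd.read_csv(io.StringIO(csv_text))
row = max_row_for(df, "Vacant Housing Units")
print(row.to_string(index=False, header=False))
```

If several years tie for the maximum, `tail(1)` returns only one of them; comparing the filtered values against their maximum would keep all tied rows.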