As a simple example, consider the following pandas DataFrame:
import pandas as pd
headers = ["city", "year", "births", "deaths", "immigrations", "emigrations"]
data = [
["Gotham", 2016, 1616, 1020, 1541, 1893],
["Gotham", 2015, 1785, 1708, 1604, 1776],
["Gotham", 2014, 1279, 1946, 1991, 1169],
["Gotham", 2013, 1442, 1932, 1960, 1580],
["Metropolis", 2016, 6405, 6393, 5390, 6797],
["Metropolis", 2015, 6017, 5492, 5647, 6994],
["Metropolis", 2014, 6644, 6893, 6759, 5149],
["Metropolis", 2013, 6902, 6160, 5294, 5112],
["Smallville", 2016, 43, 10, 29, 48],
["Smallville", 2015, 16, 21, 17, 19],
["Smallville", 2014, 20, 31, 28, 43],
["Smallville", 2013, 46, 11, 25, 25],
]
df = pd.DataFrame(data, columns=headers)
df.set_index(["city", "year"], inplace=True)
Printed to the console, it looks like this:
                 births  deaths  immigrations  emigrations
city       year
Gotham     2016    1616    1020          1541         1893
           2015    1785    1708          1604         1776
           2014    1279    1946          1991         1169
           2013    1442    1932          1960         1580
Metropolis 2016    6405    6393          5390         6797
           2015    6017    5492          5647         6994
           2014    6644    6893          6759         5149
           2013    6902    6160          5294         5112
Smallville 2016      43      10            29           48
           2015      16      21            17           19
           2014      20      31            28           43
           2013      46      11            25           25
For each data column, I want to know the minimum value per city and the year in which it occurred. In other words, I am trying to get a result DataFrame like this:
            births        deaths        immigrations        emigrations
               min  year     min  year           min  year          min  year
city
Gotham        1279  2014    1020  2016          1541  2016         1169  2014
Metropolis    6017  2015    5492  2015          5294  2013         5112  2013
Smallville      16  2015      10  2016            17  2015           19  2015
I can get the minimum value per city like this:
df.groupby(level="city").min()
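But after that I am stuck. I cannot find a way to get the year that corresponds to each minimum. Does anyone here have a good idea how to solve this?
Answer 0 (score: 5)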
In [180]: df.reset_index(level=0).groupby('city').agg(['min','idxmin','max','idxmax'])
Out[180]:
            births                      deaths                      immigrations \
             min  idxmin   max  idxmax   min  idxmin   max  idxmax   min
city
Gotham      1279    2014  1785    2015  1020    2016  1946    2014  1541
Metropolis  6017    2015  6902    2013  5492    2015  6893    2014  5294
Smallville    16    2015    46    2013    10    2016    31    2014    17
                                  emigrations
            idxmin   max  idxmax   min  idxmin   max  idxmax
city
Gotham        2016  1991    2014  1169    2014  1893    2016
Metropolis    2013  6759    2014  5112    2013  6994    2015
Smallville    2015    29    2016    19    2015    48    2016
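The one-liner above works because, once "city" is moved out of the index, "year" is the only index level left, so idxmin returns the year label of each group's minimum. If only the layout from the question is needed, a minimal sketch could keep just min and idxmin and relabel the second column level. The rename step and the name result are my own additions here, not part of the original answer, and the behavior assumes a reasonably recent pandas version:

```python
import pandas as pd

headers = ["city", "year", "births", "deaths", "immigrations", "emigrations"]
data = [
    ["Gotham", 2016, 1616, 1020, 1541, 1893],
    ["Gotham", 2015, 1785, 1708, 1604, 1776],
    ["Gotham", 2014, 1279, 1946, 1991, 1169],
    ["Gotham", 2013, 1442, 1932, 1960, 1580],
    ["Metropolis", 2016, 6405, 6393, 5390, 6797],
    ["Metropolis", 2015, 6017, 5492, 5647, 6994],
    ["Metropolis", 2014, 6644, 6893, 6759, 5149],
    ["Metropolis", 2013, 6902, 6160, 5294, 5112],
    ["Smallville", 2016, 43, 10, 29, 48],
    ["Smallville", 2015, 16, 21, 17, 19],
    ["Smallville", 2014, 20, 31, 28, 43],
    ["Smallville", 2013, 46, 11, 25, 25],
]
df = pd.DataFrame(data, columns=headers).set_index(["city", "year"])

# Move "city" back to a column so that "year" is the only index level;
# idxmin then yields the year at which each group's minimum occurs.
result = (
    df.reset_index(level="city")
      .groupby("city")
      .agg(["min", "idxmin"])
      # Relabel the second column level to match the layout in the question
      # (illustrative choice, not part of the original answer).
      .rename(columns={"idxmin": "year"}, level=1)
)
print(result)
```

Note that if a minimum occurs in more than one year for the same city, idxmin reports only the first occurrence.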