pandas: get group minima and the corresponding index values

Time: 2017-03-29 12:47:26

Tags: python pandas dataframe

Situation

As a simple example, consider the following pandas DataFrame:

import pandas as pd

headers = ["city", "year", "births", "deaths", "immigrations", "emigrations"]
data = [
    ["Gotham", 2016, 1616, 1020, 1541, 1893],
    ["Gotham", 2015, 1785, 1708, 1604, 1776],
    ["Gotham", 2014, 1279, 1946, 1991, 1169],
    ["Gotham", 2013, 1442, 1932, 1960, 1580],
    ["Metropolis", 2016, 6405, 6393, 5390, 6797],
    ["Metropolis", 2015, 6017, 5492, 5647, 6994],
    ["Metropolis", 2014, 6644, 6893, 6759, 5149],
    ["Metropolis", 2013, 6902, 6160, 5294, 5112],
    ["Smallville", 2016, 43, 10, 29, 48],
    ["Smallville", 2015, 16, 21, 17, 19],
    ["Smallville", 2014, 20, 31, 28, 43],
    ["Smallville", 2013, 46, 11, 25, 25],
]

df = pd.DataFrame(data, columns=headers)
df.set_index(["city", "year"], inplace=True)

Printed to the console, it looks like this:

                 births  deaths  immigrations  emigrations
city       year
Gotham     2016    1616    1020          1541         1893
           2015    1785    1708          1604         1776
           2014    1279    1946          1991         1169
           2013    1442    1932          1960         1580
Metropolis 2016    6405    6393          5390         6797
           2015    6017    5492          5647         6994
           2014    6644    6893          6759         5149
           2013    6902    6160          5294         5112
Smallville 2016      43      10            29           48
           2015      16      21            17           19
           2014      20      31            28           43
           2013      46      11            25           25

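Problem

For each data column, I want to know the minimum value per city and the year in which it occurred. In other words, I am trying to get a result DataFrame that looks like this: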

            births       deaths       immigrations       emigrations
               min  year    min  year          min  year         min  year
city
Gotham        1279  2014   1020  2016         1541  2016        1169  2014
Metropolis    6017  2015   5492  2015         5294  2013        5112  2013
Smallville      16  2015     10  2016           17  2015          19  2015

What I have tried so far

I can get the minimum value for each city like this:

df.groupby(level="city").min()

After that, however, I am stuck. I cannot find a way to get the year that corresponds to each minimum. Does anyone have a good idea for solving this?
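For context, here is a minimal sketch of why the year is awkward to reach directly. With the (city, year) MultiIndex in place, `idxmin` on the grouped frame returns whole index labels, that is (city, year) tuples, so the year still has to be picked out of each tuple. The sketch uses a two-city, two-column subset of the data above; the names `subset`, `labels` and `years` are only illustrative.

```python
import pandas as pd

# Subset of the example data (two cities, two columns) to keep the sketch short.
subset = pd.DataFrame(
    [
        ["Gotham", 2016, 1616, 1020],
        ["Gotham", 2015, 1785, 1708],
        ["Gotham", 2014, 1279, 1946],
        ["Gotham", 2013, 1442, 1932],
        ["Smallville", 2016, 43, 10],
        ["Smallville", 2015, 16, 21],
        ["Smallville", 2014, 20, 31],
        ["Smallville", 2013, 46, 11],
    ],
    columns=["city", "year", "births", "deaths"],
).set_index(["city", "year"])

# idxmin returns the full index label of each minimum: a (city, year) tuple.
labels = subset.groupby(level="city").idxmin()

# Keep only the second element of each tuple, i.e. the year.
years = labels.apply(lambda col: col.map(lambda label: label[1]))

print(labels)
print(years)
```

This works, but it needs a second pass over the result and still leaves the minima and the years in two separate frames.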

1 answer:

Answer 0 (score: 5)

In [180]: df.reset_index(level=0).groupby('city').agg(['min','idxmin','max','idxmax'])
Out[180]:
           births                     deaths                     immigrations  \
              min idxmin   max idxmax    min idxmin   max idxmax          min
city
Gotham       1279   2014  1785   2015   1020   2016  1946   2014         1541
Metropolis   6017   2015  6902   2013   5492   2015  6893   2014         5294
Smallville     16   2015    46   2013     10   2016    31   2014           17

                               emigrations
           idxmin   max idxmax         min idxmin   max idxmax
city
Gotham       2016  1991   2014        1169   2014  1893   2016
Metropolis   2013  6759   2014        5112   2013  6994   2015
Smallville   2015    29   2016          19   2015    48   2016
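The one-liner above works because `reset_index(level=0)` moves `city` back into a column and leaves `year` as the only index level, so `idxmin` and `idxmax` return plain years rather than (city, year) tuples. As a sketch of how it could be trimmed to the layout asked for in the question, the variant below keeps only `min` and `idxmin` and relabels `idxmin` as `year`. The `rename` step and the name `result` are additions for illustration, not part of the original answer.

```python
import pandas as pd

headers = ["city", "year", "births", "deaths", "immigrations", "emigrations"]
data = [
    ["Gotham", 2016, 1616, 1020, 1541, 1893],
    ["Gotham", 2015, 1785, 1708, 1604, 1776],
    ["Gotham", 2014, 1279, 1946, 1991, 1169],
    ["Gotham", 2013, 1442, 1932, 1960, 1580],
    ["Metropolis", 2016, 6405, 6393, 5390, 6797],
    ["Metropolis", 2015, 6017, 5492, 5647, 6994],
    ["Metropolis", 2014, 6644, 6893, 6759, 5149],
    ["Metropolis", 2013, 6902, 6160, 5294, 5112],
    ["Smallville", 2016, 43, 10, 29, 48],
    ["Smallville", 2015, 16, 21, 17, 19],
    ["Smallville", 2014, 20, 31, 28, 43],
    ["Smallville", 2013, 46, 11, 25, 25],
]

df = pd.DataFrame(data, columns=headers).set_index(["city", "year"])

result = (
    df.reset_index(level="city")   # leave "year" as the only index level
      .groupby("city")
      .agg(["min", "idxmin"])      # idxmin now yields the year of each minimum
      .rename(columns={"idxmin": "year"}, level=1)  # relabel the inner column level
)

print(result)
```

Passing `level="city"` instead of `level=0` is equivalent here and only makes the intent more explicit.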