Question

I have a DataFrame that looks like this:

                    Population2010
State     County

AL        Baldwin     90332
          Douglas     92082
          Rolling     52000
CA        Orange      3879602
          San Diego   4364594
          Los Angeles 12123562
CO        Boulder     161818
          Denver      737728
          Jefferson   222368
AZ        Maricopa    2239378
          Pinal       448888
          Pima        1000564

I want to sort the data by population in descending order, while still keeping the rows grouped by state:

                    Population2010
State     County

AL        Douglas     92082
          Baldwin     90332
          Rolling     52000
CA        Los Angeles 12123562
          San Diego   4364594
          Orange      3879602
CO        Denver      737728
          Jefferson   222368
          Boulder     161818              
AZ        Maricopa    2239378
          Pima        1000564
          Pinal       448888

Then I want to sum the top two population entries for each state and return the two states with the highest sums:

'CA', 'AZ'

Answer 1

Question 1:

df.sort_values(['Population2010'], ascending=False)\
  .reindex(sorted(df.index.get_level_values(0).unique()), level=0)

or

df.sort_values('Population2010', ascending=False)\
  .sort_index(level=0, ascending=[True])

Output:

                  Population2010
State County                     
AL    Douglas               92082
      Baldwin               90332
      Rolling               52000
AZ    Maricopa            2239378
      Pima                1000564
      Pinal                448888
CA    Los Angeles        12123562
      San Diego           4364594
      Orange              3879602
CO    Denver               737728
      Jefferson            222368
      Boulder              161818

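First, sort the whole DataFrame by value in descending order. Then take the values of index level 0, sort them, and reindex on level=0 so that the rows are ordered by state while the descending population order inside each state is preserved.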
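For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the question's data by hand (the `rows` list and the name `by_state` are mine, not from the question) and uses a different route from the two snippets above: it passes the index level name `State` together with the column name to `sort_values`, which assumes a pandas version that accepts index level names in `by` (0.23 or later, as far as I know).

```python
import pandas as pd

# Rebuild the question's data with a (State, County) MultiIndex.
rows = [
    ("AL", "Baldwin", 90332), ("AL", "Douglas", 92082), ("AL", "Rolling", 52000),
    ("CA", "Orange", 3879602), ("CA", "San Diego", 4364594), ("CA", "Los Angeles", 12123562),
    ("CO", "Boulder", 161818), ("CO", "Denver", 737728), ("CO", "Jefferson", 222368),
    ("AZ", "Maricopa", 2239378), ("AZ", "Pinal", 448888), ("AZ", "Pima", 1000564),
]
df = pd.DataFrame(rows, columns=["State", "County", "Population2010"])
df = df.set_index(["State", "County"])

# Sort by the State index level (ascending), then by population (descending).
by_state = df.sort_values(["State", "Population2010"], ascending=[True, False])
print(by_state)
```

Like the output shown above, this orders the states alphabetically (AL, AZ, CA, CO) rather than in the order they appear in the question.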

Question 2 is a calculation that is somewhat unrelated to the first question:

df.groupby('State')['Population2010']\
  .apply(lambda x: x.nlargest(2).sum())\
  .nlargest(2).index.tolist()

Output:

['CA', 'AZ']

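Use nlargest to find the two largest values within each state group and sum them, then use nlargest again to find the two states with the largest of those sums.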
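To make the two steps easier to follow, here is a small runnable sketch that splits the chain into an intermediate result. The data is retyped from the question, and the names `rows`, `top2_sum` and `top_states` are only illustrative.

```python
import pandas as pd

# Rebuild the question's data with a (State, County) MultiIndex.
rows = [
    ("AL", "Baldwin", 90332), ("AL", "Douglas", 92082), ("AL", "Rolling", 52000),
    ("CA", "Orange", 3879602), ("CA", "San Diego", 4364594), ("CA", "Los Angeles", 12123562),
    ("CO", "Boulder", 161818), ("CO", "Denver", 737728), ("CO", "Jefferson", 222368),
    ("AZ", "Maricopa", 2239378), ("AZ", "Pinal", 448888), ("AZ", "Pima", 1000564),
]
df = pd.DataFrame(rows, columns=["State", "County", "Population2010"])
df = df.set_index(["State", "County"])

# Step 1: for each state, sum its two largest county populations.
top2_sum = (
    df.groupby(level="State")["Population2010"]
      .apply(lambda s: s.nlargest(2).sum())
)

# Step 2: keep the two states whose top-two sums are largest.
top_states = top2_sum.nlargest(2).index.tolist()
print(top_states)  # ['CA', 'AZ']
```

With this data the per-state sums are 16488156 for CA, 3239942 for AZ, 960096 for CO and 182414 for AL, which is why CA and AZ come out on top.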

Ordering a sub-index in a DataFrame and summing the top "n" entries
