Index disappears after grouping a pandas DataFrame

Time: 2020-05-15 17:26:38

Tags: python pandas dataframe indexing group-by

I have the following pandas Series:

Reducedset['% Renewable']

which gives me:

Asia           China                 19.7549
               Japan                 10.2328
               India                 14.9691
               South Korea           2.27935
               Iran                  5.70772
North America  United States          11.571
               Canada                61.9454
Europe         United Kingdom        10.6005
               Russian Federation    17.2887
               Germany               17.9015
               France                17.0203
               Italy                 33.6672
               Spain                 37.9686
Australia      Australia             11.8108
South America  Brazil                 69.648
Name: % Renewable, dtype: object

I then cut this Series into 5 bins:

binning = pd.cut(Top15['% Renewable'],5)

which gives me:

Asia           China                 (15.753, 29.227]
               Japan                  (2.212, 15.753]
               India                  (2.212, 15.753]
               South Korea            (2.212, 15.753]
               Iran                   (2.212, 15.753]
North America  United States          (2.212, 15.753]
               Canada                (56.174, 69.648]
Europe         United Kingdom         (2.212, 15.753]
               Russian Federation    (15.753, 29.227]
               Germany               (15.753, 29.227]
               France                (15.753, 29.227]
               Italy                 (29.227, 42.701]
               Spain                 (29.227, 42.701]
Australia      Australia              (2.212, 15.753]
South America  Brazil                (56.174, 69.648]
Name: % Renewable, dtype: category
Categories (5, interval[float64]): [(2.212, 15.753] < (15.753, 29.227] < (29.227, 42.701] <
                                    (42.701, 56.174] < (56.174, 69.648]]

I then group the data by these bins so that I can count the number of countries in each bin:

Reduced = Reducedset.groupby(binning)['% Renewable'].agg(['count'])

which gives me:

% Renewable
(2.212, 15.753]     7
(15.753, 29.227]    4
(29.227, 42.701]    2
(42.701, 56.174]    0
(56.174, 69.648]    2
Name: count, dtype: int64

However, the original index is gone, and I would still like to keep the continent level (the outer level of the index).

So, to the left of the [% Renewable] column, it should show:

Asia
North America 
Europe
Australia
South America 

When I try to do it this way:

print(Reducedset['% Renewable'].groupby([Reducedset['% Renewable'].index.get_level_values(0),pd.cut(Reducedset['% Renewable'],5)]).count())

it works!

Problem solved!
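For reference, here is a minimal, self-contained sketch of why the first attempt loses the continent level and the second keeps it. The values are copied from the output above; the index level names `Continent` and `Country` and the variable names are assumptions made for illustration, and `observed=False` is passed explicitly because the default for categorical groupers has changed between pandas versions.

```python
import pandas as pd

# Values copied from the question's output.
# The level names "Continent" and "Country" are assumed for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("Asia", "China"), ("Asia", "Japan"), ("Asia", "India"),
        ("Asia", "South Korea"), ("Asia", "Iran"),
        ("North America", "United States"), ("North America", "Canada"),
        ("Europe", "United Kingdom"), ("Europe", "Russian Federation"),
        ("Europe", "Germany"), ("Europe", "France"),
        ("Europe", "Italy"), ("Europe", "Spain"),
        ("Australia", "Australia"), ("South America", "Brazil"),
    ],
    names=["Continent", "Country"],
)
renewable = pd.Series(
    [19.7549, 10.2328, 14.9691, 2.27935, 5.70772, 11.571, 61.9454,
     10.6005, 17.2887, 17.9015, 17.0203, 33.6672, 37.9686, 11.8108, 69.648],
    index=index,
    name="% Renewable",
)

bins = pd.cut(renewable, 5)

# Grouping by the bins alone: the result is indexed by the group keys,
# so the only index level left is the bin.
by_bin = renewable.groupby(bins, observed=False).count()

# Grouping by the continent level and the bins: both become index levels.
by_continent_and_bin = renewable.groupby(
    [renewable.index.get_level_values("Continent"), bins],
    observed=False,
).count()

print(by_bin)
print(by_continent_and_bin)
```

The result of a groupby is indexed by whatever keys were grouped on, so the continent has to be one of the keys for it to appear in the output.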

1 Answer:

Answer 0: (score: 1)

Let's assume the following data:

import numpy as np
import pandas as pd

np.random.seed(1)
s = pd.Series(np.random.randint(0,10, 16), 
              index=pd.MultiIndex.from_arrays([list('aaaabbccdddddeee'), 
                                               list('abcdefghijklmnop')]))

Then, IIUC, what you are looking for is:

print(s.groupby([s.index.get_level_values(0), # outer index level (the continent, in your case)
                 pd.cut(s, 5)]) # the bins you created
       .count())
a  (-0.009, 1.8]    0
   (1.8, 3.6]       0
   (3.6, 5.4]       2
   (5.4, 7.2]       0
   (7.2, 9.0]       2
b  (-0.009, 1.8]    2
   (1.8, 3.6]       0
   (3.6, 5.4]       0
   (5.4, 7.2]       0
   (7.2, 9.0]       0
c  (-0.009, 1.8]    1
   (1.8, 3.6]       0
   (3.6, 5.4]       0
   (5.4, 7.2]       1
   (7.2, 9.0]       0
d  (-0.009, 1.8]    0
   (1.8, 3.6]       1
   (3.6, 5.4]       2
   (5.4, 7.2]       1
   (7.2, 9.0]       1
e  (-0.009, 1.8]    0
   (1.8, 3.6]       2
   (3.6, 5.4]       1
   (5.4, 7.2]       0
   (7.2, 9.0]       0
dtype: int64
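If a table with one row per outer-level key and one column per bin is easier to read, the result above can be pivoted with `unstack()`. This is only a sketch built on the same sample data; the names `counts` and `table` are made up for illustration, and `observed=False` is passed explicitly so that empty bins are kept regardless of the pandas version's default.

```python
import numpy as np
import pandas as pd

# Same sample data as above.
np.random.seed(1)
s = pd.Series(
    np.random.randint(0, 10, 16),
    index=pd.MultiIndex.from_arrays(
        [list("aaaabbccdddddeee"), list("abcdefghijklmnop")]
    ),
)

# Count per (outer index level, bin), keeping empty bins.
counts = s.groupby(
    [s.index.get_level_values(0), pd.cut(s, 5)],
    observed=False,
).count()

# Move the bin level from the row index into the columns.
table = counts.unstack()
print(table)
```

Each row of `table` then sums to the number of entries under that outer-level key.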