Grouping a pandas Series into bins

Date: 2020-05-15 16:55:23

Tags: python pandas dataframe group-by bin

I have the following pandas Series:

Asia           China                 19.7549
               Japan                 10.2328
               India                 14.9691
               South Korea           2.27935
               Iran                  5.70772
North America  United States          11.571
               Canada                61.9454
Europe         United Kingdom        10.6005
               Russian Federation    17.2887
               Germany               17.9015
               France                17.0203
               Italy                 33.6672
               Spain                 37.9686
Australia      Australia             11.8108
South America  Brazil                 69.648
Name: % Renewable, dtype: object

I have binned this data into 5 bins:

binning = pd.cut(Reducedset['% Renewable'],5)

Next, I want to count the number of countries that fall into each of these bins:

df.groupby(binning)['% Renewable'].agg(['count'])

So the final DataFrame should have only "continents" as its index, not the countries.

However, this expression does not work.

My current output looks like this:

                     count
binning                
(2.212, 15.753]       7
(15.753, 29.227]      4
(29.227, 42.701]      2
(56.174, 69.648]      2

I would like the "continents" index to be shown here...

Can anyone help me?

2 Answers:

Answer 0 (score: 2)

Make sure you are not making a simple mistake, such as using the wrong name for the DataFrame:

Reducedset.groupby(binning)['% Renewable'].agg(['count'])

Answer 1 (score: 1)

As I understand it, you have:

  • a DataFrame (not a Series) named Reducedset,
  • with a column named % Renewable,
  • and a two-level MultiIndex (continents, countries).
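For readers who want to follow along, here is a minimal sketch that rebuilds a frame of this shape. The values are copied from the question and the level names come from the printed output in this answer; storing the values as floats is an assumption (the question's Series reports dtype object).

```python
import pandas as pd

# Values copied from the Series in the question. The level names
# "continents" and "countries" match the printed output in this answer.
data = {
    ("Asia", "China"): 19.7549,
    ("Asia", "Japan"): 10.2328,
    ("Asia", "India"): 14.9691,
    ("Asia", "South Korea"): 2.27935,
    ("Asia", "Iran"): 5.70772,
    ("North America", "United States"): 11.571,
    ("North America", "Canada"): 61.9454,
    ("Europe", "United Kingdom"): 10.6005,
    ("Europe", "Russian Federation"): 17.2887,
    ("Europe", "Germany"): 17.9015,
    ("Europe", "France"): 17.0203,
    ("Europe", "Italy"): 33.6672,
    ("Europe", "Spain"): 37.9686,
    ("Australia", "Australia"): 11.8108,
    ("South America", "Brazil"): 69.648,
}

# Two-level MultiIndex: (continent, country).
index = pd.MultiIndex.from_tuples(list(data.keys()), names=["continents", "countries"])

# Assumption: values are stored as floats so that pd.cut can bin them.
Reducedset = pd.DataFrame({"% Renewable": list(data.values())}, index=index)

print(Reducedset.index.nlevels)  # number of index levels
print(Reducedset.dtypes)         # dtype of each column
```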

Because you will need the bin of each individual row later, even after the index has been changed, it is better to save binning as another column:

Reducedset['binning'] = pd.cut(Reducedset['% Renewable'], 5)

The result is:

                                  % Renewable           binning
continents    countries                                        
Asia          China                  19.75490  (15.753, 29.227]
              Japan                  10.23280   (2.212, 15.753]
              India                  14.96910   (2.212, 15.753]
              South Korea             2.27935   (2.212, 15.753]
              Iran                    5.70772   (2.212, 15.753]
North America United States          11.57100   (2.212, 15.753]
              Canada                 61.94540  (56.174, 69.648]
Europe        United Kingdom         10.60050   (2.212, 15.753]
              Russian Federation     17.28870  (15.753, 29.227]
              Germany                17.90150  (15.753, 29.227]
              France                 17.02030  (15.753, 29.227]
              Italy                  33.66720  (29.227, 42.701]
              Spain                  37.96860  (29.227, 42.701]
Australia     Australia              11.81080   (2.212, 15.753]
South America Brazil                 69.64800  (56.174, 69.648]
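To see where interval labels such as (2.212, 15.753] come from, the following sketch may help. Given an integer, pd.cut splits the range of the data into equal-width bins and extends the range slightly so that the minimum value falls inside the first interval. The variable names `values` and `edges` are illustrative only and do not appear in the question.

```python
import pandas as pd

# The 15 values from the question, in their original order.
values = pd.Series(
    [19.7549, 10.2328, 14.9691, 2.27935, 5.70772, 11.571, 61.9454,
     10.6005, 17.2887, 17.9015, 17.0203, 33.6672, 37.9686, 11.8108, 69.648],
    name="% Renewable",
)

# retbins=True additionally returns the bin edges that were computed.
binning, edges = pd.cut(values, 5, retbins=True)

print(edges)                 # 6 edges delimit the 5 equal-width bins
print(binning.cat.categories)  # interval labels, rounded for display
print(binning.cat.codes.tolist())  # position of each row's bin (0 to 4)
```

The bin with code 3 receives no rows, which is consistent with only four intervals having countries in the question's output.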

If you want only the continents in the index, you can run:

Reducedset.reset_index('countries', inplace=True)

You can print it sorted by binning; the result is:

                        countries  % Renewable           binning
continents                                                      
Asia                        Japan     10.23280   (2.212, 15.753]
Asia                        India     14.96910   (2.212, 15.753]
Asia                  South Korea      2.27935   (2.212, 15.753]
Asia                         Iran      5.70772   (2.212, 15.753]
North America       United States     11.57100   (2.212, 15.753]
Europe             United Kingdom     10.60050   (2.212, 15.753]
Australia               Australia     11.81080   (2.212, 15.753]
Asia                        China     19.75490  (15.753, 29.227]
Europe         Russian Federation     17.28870  (15.753, 29.227]
Europe                    Germany     17.90150  (15.753, 29.227]
Europe                     France     17.02030  (15.753, 29.227]
Europe                      Italy     33.66720  (29.227, 42.701]
Europe                      Spain     37.96860  (29.227, 42.701]
North America              Canada     61.94540  (56.174, 69.648]
South America              Brazil     69.64800  (56.174, 69.648]

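As you can see, the (2.212, 15.753] bin contains countries from 4 continents, so the country information is still needed (although you can keep it as a "regular" column).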
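If a per-continent view is what you are after, one possible approach (a sketch, not necessarily what the asker intended) is to group by both the continents index level and the binning column. The names `continents_per_bin` and `counts` are illustrative, and `observed=False` is passed explicitly so that empty bins are kept regardless of the pandas default.

```python
import pandas as pd

# Same data as in the question, already in the shape reached above:
# continents as the index, countries as a regular column.
rows = [
    ("Asia", "China", 19.7549), ("Asia", "Japan", 10.2328),
    ("Asia", "India", 14.9691), ("Asia", "South Korea", 2.27935),
    ("Asia", "Iran", 5.70772), ("North America", "United States", 11.571),
    ("North America", "Canada", 61.9454), ("Europe", "United Kingdom", 10.6005),
    ("Europe", "Russian Federation", 17.2887), ("Europe", "Germany", 17.9015),
    ("Europe", "France", 17.0203), ("Europe", "Italy", 33.6672),
    ("Europe", "Spain", 37.9686), ("Australia", "Australia", 11.8108),
    ("South America", "Brazil", 69.648),
]
Reducedset = pd.DataFrame(
    rows, columns=["continents", "countries", "% Renewable"]
).set_index("continents")
Reducedset["binning"] = pd.cut(Reducedset["% Renewable"], 5)

# Number of distinct continents represented in each bin.
continents_per_bin = (
    Reducedset.reset_index()
    .groupby("binning", observed=False)["continents"]
    .nunique()
)

# Number of countries per (continent, bin): continents as rows, bins as columns.
counts = (
    Reducedset.groupby(["continents", "binning"], observed=False)
    .size()
    .unstack("binning", fill_value=0)
)

print(continents_per_bin)
print(counts)
```

Here `groupby` accepts a mix of index level names and column names, which is why "continents" can be used even though it lives in the index.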

Now you can also perform the aggregation, but with a slight change:

Reducedset.groupby('binning')['% Renewable'].agg(['count'])

(Note Reducedset instead of df, and the quotes around binning, since within the DataFrame it is now a