I'm new to Python and can't figure out how to proceed. After binning and editing my DataFrame, I was able to come up with this:
Continents % Renewable Country
0 Asia (15.753, 29.227] China
1 North America (2.212, 15.753] United States
2 Asia (2.212, 15.753] Japan
3 Europe (2.212, 15.753] United Kingdom
4 Europe (15.753, 29.227] Russian Federation
5 North America (56.174, 69.648] Canada
6 Europe (15.753, 29.227] Germany
7 Asia (2.212, 15.753] India
8 Europe (15.753, 29.227] France
9 Asia (2.212, 15.753] South Korea
10 Europe (29.227, 42.701] Italy
11 Europe (29.227, 42.701] Spain
12 Asia (2.212, 15.753] Iran
13 Australia (2.212, 15.753] Australia
14 South America (56.174, 69.648] Brazil
Now, when I set Continents and % Renewable as a MultiIndex using the following,
Top15 = Top15.groupby(by=['Continents', '% Renewable']).sum()
I get the following:
Country
Continents % Renewable
Asia (15.753, 29.227] China
(2.212, 15.753] JapanIndiaSouth KoreaIran
Australia (2.212, 15.753] Australia
Europe (15.753, 29.227] Russian FederationGermanyFrance
(2.212, 15.753] United Kingdom
(29.227, 42.701] ItalySpain
North America (2.212, 15.753] United States
(56.174, 69.648] Canada
South America (56.174, 69.648] Brazil
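Now I would like a column that gives the number of countries in each index entry, i.e.:
in the first row, China = 1,
and in the second row, JapanIndiaSouth KoreaIran would be 4.
So in the end I want something like this: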
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
I just don't know how to get there.
Also, the counts need to be sorted in descending order while still keeping the index grouping.
Answer 0 (score: 2)
Top15.groupby(['Continents', '% Renewable']).Country.count()
Continents % Renewable
Asia (15.753, 29.227] 1
(2.212, 15.753] 4
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(2.212, 15.753] 1
(29.227, 42.701] 2
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
Name: Country, dtype: int64
To sort in the order you want:
Top15_count = Top15.groupby(['Continents', '% Renewable']).Country.count()
Top15_count.reset_index() \
.sort_values(
['Continents', 'Country'],
ascending=[True, False]
).set_index(['Continents', '% Renewable']).Country
Continents % Renewable
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(29.227, 42.701] 2
(2.212, 15.753] 1
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
Name: Country, dtype: int64
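For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the question's table by hand, and as an assumption it stores the bins as plain strings rather than the interval categories that a binning step would normally produce, so grouping on real categorical bins may behave slightly differently.

```python
import pandas as pd

# Hand-built copy of the question's table. Assumption: bins are plain strings
# here, not the interval categories produced by an actual binning step.
Top15 = pd.DataFrame({
    "Continents": ["Asia", "North America", "Asia", "Europe", "Europe",
                   "North America", "Europe", "Asia", "Europe", "Asia",
                   "Europe", "Europe", "Asia", "Australia", "South America"],
    "% Renewable": ["(15.753, 29.227]", "(2.212, 15.753]", "(2.212, 15.753]",
                    "(2.212, 15.753]", "(15.753, 29.227]", "(56.174, 69.648]",
                    "(15.753, 29.227]", "(2.212, 15.753]", "(15.753, 29.227]",
                    "(2.212, 15.753]", "(29.227, 42.701]", "(29.227, 42.701]",
                    "(2.212, 15.753]", "(2.212, 15.753]", "(56.174, 69.648]"],
    "Country": ["China", "United States", "Japan", "United Kingdom",
                "Russian Federation", "Canada", "Germany", "India", "France",
                "South Korea", "Italy", "Spain", "Iran", "Australia", "Brazil"],
})

# Count the countries in each (continent, bin) pair.
counts = Top15.groupby(["Continents", "% Renewable"])["Country"].count()

# Flatten the index, sort by continent (ascending) and then by count
# (descending), and rebuild the MultiIndex.
ordered = (
    counts.reset_index()
    .sort_values(["Continents", "Country"], ascending=[True, False])
    .set_index(["Continents", "% Renewable"])["Country"]
)
print(ordered)
```

The parenthesized chain is equivalent to the backslash-continued version; it is only a stylistic choice.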
Answer 1 (score: 2)
A solution using size (see also: What is the difference between size and count in pandas?):
print (Top15.groupby(['Continents', '% Renewable']).size())
Continents % Renewable
Asia (15.753, 29.227] 1
(2.212, 15.753] 4
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(2.212, 15.753] 1
(29.227, 42.701] 2
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
dtype: int64
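On the question's data size and count give the same numbers because no Country value is missing. As a small illustration of where they differ, the sketch below uses a made-up three-row frame (not the question's data) with one missing country: size counts every row in a group, while count skips missing values.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only; one Country value is missing.
df = pd.DataFrame({
    "Continents": ["Asia", "Asia", "Europe"],
    "Country": ["China", np.nan, "Italy"],
})

grouped = df.groupby("Continents")

sizes = grouped.size()               # rows per group, NaN included
counts = grouped["Country"].count()  # non-missing Country values per group

print(sizes)
print(counts)
```

Here Asia has a size of 2 but a count of 1, so pick size when every row should be counted and count when missing values should be ignored.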
If you need to change the order, add reset_index followed by sort_values, and if you need the MultiIndex back, add set_index:
print (Top15.groupby(['Continents', '% Renewable']) \
.size() \
.reset_index(name='COUNT') \
.sort_values(['Continents', 'COUNT'], ascending=[True, False]) \
.set_index(['Continents','% Renewable']).COUNT)
Continents % Renewable
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
Australia (2.212, 15.753] 1
Europe (15.753, 29.227] 3
(29.227, 42.701] 2
(2.212, 15.753] 1
North America (2.212, 15.753] 1
(56.174, 69.648] 1
South America (56.174, 69.648] 1
Name: COUNT, dtype: int64