I'm trying to group a pandas DataFrame by one of its columns!
Code:
import pandas as pd
stats_reader = pd.read_csv('C:/Users/Name/PycharmProjects/Corona Stats/TimeSeries/03-20-2020.csv')
stats_clean = stats_reader.drop(['Province/State', 'Last Update', 'Latitude', 'Longitude'], axis=1)
stats_clean.reset_index(drop=True, inplace=True)
stats_clean.groupby(['Country/Region'])
stats_clean.to_csv('Clean Corona Stats.csv')
Result:
,Country/Region,Confirmed,Deaths,Recovered
0,China,67800,3133,58382
1,Italy,47021,4032,4440
2,Spain,20410,1043,1588
3,Germany,19848,67,180
4,Iran,19644,1433,6745
5,France,12612,450,12
6,"Korea, South",8652,94,1540
7,US,8310,42,0
8,Switzerland,5294,54,15
9,United Kingdom,3983,177,65
10,Netherlands,2994,106,2
11,Austria,2388,6,9
12,Belgium,2257,37,1
13,Norway,1914,7,1
14,Sweden,1639,16,16
15,US,1524,83,0
...
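The ideal result would be the rows grouped by country/region. I assumed it would simply put all rows with the same value together, but this code leaves the DataFrame unchanged.
I also tried: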
stats_clean.groupby(['Country/Region'])['Confirmed'].sum()
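This doesn't produce any change in the original DataFrame either. What am I missing here? I feel like this should at least do something, but apart from dropping the columns, nothing I do changes anything. I ran everything in Jupyter just to make sure PyCharm wasn't broken, but I got the same result.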
Answer 0 (score: 0)
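I'm not sure what your problem is; my exact copy of your code (with your sample lightly modified so it can be read) does exactly what groupby() is meant to do.
Sample for copy/paste (the only thing I did here was remove the quotes and the comma from "Korea, South"):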
,Country/Region,Confirmed,Deaths,Recovered
0,China,67800,3133,58382
1,Italy,47021,4032,4440
2,Spain,20410,1043,1588
3,Germany,19848,67,180
4,Iran,19644,1433,6745
5,France,12612,450,12
6,Korea South,8652,94,1540
7,US,8310,42,0
8,Switzerland,5294,54,15
9,United Kingdom,3983,177,65
10,Netherlands,2994,106,2
11,Austria,2388,6,9
12,Belgium,2257,37,1
13,Norway,1914,7,1
14,Sweden,1639,16,16
15,US,1524,83,0
import pandas as pd
# copy the sample above to the clipboard first
df = pd.read_clipboard(sep=',', index_col=0)
df1 = df.groupby(['Country/Region'])['Confirmed'].sum()
print(df1)
Country/Region
Austria 2388
Belgium 2257
China 67800
France 12612
Germany 19848
Iran 19644
Italy 47021
Korea South 8652
Netherlands 2994
Norway 1914
Spain 20410
Sweden 1639
Switzerland 5294
US 9834
United Kingdom 3983
Name: Confirmed, dtype: int64
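Since US is the only item that appears twice in this sample, its Confirmed values are aggregated with .sum(); the remaining groups (the other Country/Region values) stay unchanged.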
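One likely reason the original code appears to do nothing: groupby() does not modify the DataFrame in place. It returns a new object, and the question's code discards that object and then writes the untouched stats_clean to CSV. Below is a minimal sketch of assigning the aggregated result and writing that instead. The inlined rows are a subset of the sample above, and stats_grouped is a name made up for illustration; summing every numeric column is an assumption about what output is wanted.

```python
import io
import pandas as pd

# A few rows from the sample above, inlined so the sketch runs without the CSV file.
sample = io.StringIO(
    "Country/Region,Confirmed,Deaths,Recovered\n"
    "China,67800,3133,58382\n"
    "US,8310,42,0\n"
    "Norway,1914,7,1\n"
    "US,1524,83,0\n"
)
stats_clean = pd.read_csv(sample)

# groupby() on its own only builds a GroupBy object; stats_clean is not changed.
stats_clean.groupby("Country/Region")

# Aggregate and assign the result to a new name (stats_grouped is hypothetical)
# to get one row per country, with the numeric columns summed.
stats_grouped = stats_clean.groupby("Country/Region", as_index=False).sum()

# Export the grouped frame, not the original one. With no path, to_csv()
# returns the CSV text; pass a file name such as 'Clean Corona Stats.csv'
# to write it to disk instead.
csv_text = stats_grouped.to_csv(index=False)
print(csv_text)
```

Here the two US rows collapse into a single row, while stats_clean still holds all four original rows. If only the Confirmed totals are needed, selecting ['Confirmed'] before .sum(), as in the answer, works the same way as long as the result is assigned to a variable.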