sum and groupby not working for me with pandas

Asked: 2017-09-04 10:23:58

Tags: python pandas csv

I have the following dataset:

[image]

I want to use pandas to sum the cantidad (count) column grouped by nombre (name), so I tried: [image]

Since "Ana" is a common name, the first row surprised me, so I checked: [image]

OK, so... the sum for "Ana" is 434, not 1. What is happening? What am I doing wrong?

1 Answer:

Answer 0 (score: 2)

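You need skipinitialspace=True, which drops the spaces that follow each delimiter, because the values in column nombre are padded with whitespace, so 'Ana', ' Ana', ' Ana ' ... are grouped separately: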

import pandas as pd

# skipinitialspace=True skips the spaces that follow each delimiter
historical_names = pd.read_csv('nombres-1920-1924.csv', skipinitialspace=True)
print(historical_names.head())

resume = historical_names.groupby('nombre')['cantidad'].sum()
print(resume['Ana'])
437

# Select the individual 'Ana' rows to see what goes into the total
a = historical_names.loc[historical_names['nombre'] == 'Ana', 'cantidad']
print(a)
5        113
10340    138
18776      1
23114    183
26523      2
Name: cantidad, dtype: int64

a = historical_names.loc[historical_names['nombre'] == 'Ana', 'cantidad'].sum()
print(a)
437
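
Another option is to strip the whitespace from the column after reading the file:

historical_names = pd.read_csv('nombres-1920-1924.csv')
print(historical_names.head())

# Remove leading and trailing whitespace from every name before grouping
historical_names['nombre'] = historical_names['nombre'].str.strip()
resume = historical_names.groupby('nombre')['cantidad'].sum()
print(resume['Ana'])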
437
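
The two fixes are not fully interchangeable: skipinitialspace only affects spaces that follow a delimiter, while str.strip() removes whitespace on both sides of the value. The following is a minimal sketch of that difference. It uses a small in-memory CSV with made-up rows (not the original nombres-1920-1924.csv), and the helper ana_total and the column anio are hypothetical names introduced only for this illustration.

```python
import io

import pandas as pd

# Hypothetical sample data (NOT the original file): the second 'Ana' has a
# space after the delimiter, the third has a space before the next delimiter.
csv_text = "anio,nombre,cantidad\n1920,Ana,113\n1921, Ana,1\n1922,Ana ,2\n"

# 1) No fix: ' Ana' and 'Ana ' stay distinct from 'Ana'.
raw = pd.read_csv(io.StringIO(csv_text))

# 2) skipinitialspace=True: only the space after the delimiter is dropped.
skipped = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# 3) str.strip(): leading and trailing whitespace are both removed.
stripped = raw.copy()
stripped["nombre"] = stripped["nombre"].str.strip()


def ana_total(df):
    """Sum cantidad over the group whose nombre is exactly 'Ana'."""
    return int(df.groupby("nombre")["cantidad"].sum()["Ana"])


print(ana_total(raw))       # 113
print(ana_total(skipped))   # 114
print(ana_total(stripped))  # 116
```

If the file might contain trailing spaces as well, str.strip() is the safer of the two. In the question's data both approaches give 437, which suggests the padding there was only after the delimiter.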

For comparison, the original code without either fix leaves the padded rows out of the 'Ana' group:

historical_names = pd.read_csv('nombres-1920-1924.csv')
print(historical_names.head())

resume = historical_names.groupby('nombre')['cantidad'].sum()
print(resume['Ana'])
434

# Only the exact 'Ana' rows match; the padded ones are missing
a = historical_names.loc[historical_names['nombre'] == 'Ana', 'cantidad']
print(a)
5        113
10340    138
23114    183
Name: cantidad, dtype: int64

a = historical_names.loc[historical_names['nombre'] == 'Ana', 'cantidad'].sum()
print(a)
434
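
To check whether a column has this problem before grouping, one possible approach is to compare each value with its stripped version. This is only a sketch on made-up data: the sample rows and the names padded and padded_names are assumptions for illustration, not part of the original file or code.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = "nombre,cantidad\nAna,113\n Ana,1\nAna ,2\nLuis,7\n"
df = pd.read_csv(io.StringIO(csv_text))

# A value that changes when stripped has leading or trailing whitespace.
padded = df[df["nombre"] != df["nombre"].str.strip()]

# repr() wraps each value in quotes, which makes the stray spaces visible.
padded_names = [repr(name) for name in padded["nombre"]]
print(padded_names)
print(len(padded), "of", len(df), "rows are padded")
```

An empty padded frame would mean whitespace is not the cause and the mismatch has to come from somewhere else, for example differences in letter case or accents.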