Pandas DataFrame groupby sum

Time: 2014-05-22 14:24:26

Tags: python pandas group-by aggregate-functions dataframe

Input:

       Date letters numbers mixed         new
0  1/2/2014       a       6    z1  1/2/2014 a
1  1/2/2014       a       3    z1  1/2/2014 a
2  1/3/2014       c       1    x3  1/3/2014 c

I want to group by new and sum numbers, so that the output is:

       Date letters numbers mixed         new
0  1/2/2014       a       9    z1  1/2/2014 a
1  1/3/2014       c       1    x3  1/3/2014 c

I looked here: http://pandas.pydata.org/pandas-docs/stable/groupby.html but had no luck.

Here is my code:

import pandas
a=[['Date', 'letters', 'numbers', 'mixed'], ['1/2/2014', 'a', '6', 'z1'], ['1/2/2014', 'a', '3', 'z1'], ['1/3/2014', 'c', '1', 'x3']]
df = pandas.DataFrame.from_records(a[1:],columns=a[0])
f=[]
for i in range(0,len(df)):
    f.append(df['Date'][i] + ' ' + df['letters'][i])
df['new']=f

Also, any tips for concatenating Date and letters without a loop would be helpful.

1 Answer:

Answer 0 (score: 1)

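You have to convert the numbers column to int, because the values in your list are strings. The new column can also be built without a loop by adding the Date and letters columns directly: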

import pandas as pd
a=[['Date', 'letters', 'numbers', 'mixed'], ['1/2/2014', 'a', '6', 'z1'], ['1/2/2014', 'a', '3', 'z1'], ['1/3/2014', 'c', '1', 'x3']]
df = pd.DataFrame.from_records(a[1:],columns=a[0])
df['new'] = df.Date + " " + df.letters
df.numbers = df.numbers.astype(int)

print(df)

       Date letters  numbers mixed         new
0  1/2/2014       a        6    z1  1/2/2014 a
1  1/2/2014       a        3    z1  1/2/2014 a
2  1/3/2014       c        1    x3  1/3/2014 c
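To illustrate why the conversion matters, here is a minimal sketch. It assumes the values arrive as strings, as in the question; pd.to_numeric is used here as an alternative to astype(int) and is not part of the original answer.

```python
import pandas as pd

# In plain Python, "+" on strings concatenates instead of adding.
print('6' + '3')

# The numbers column from the question holds strings, not integers.
numbers = pd.Series(['6', '3', '1'], name='numbers')

# Convert to a numeric dtype before summing.
as_int = numbers.astype(int)
as_numeric = pd.to_numeric(numbers)  # alternative to astype(int)

print(as_int.sum())
print(as_numeric.sum())
```

Either conversion should give a column that sums arithmetically.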

Next, build the data frame to merge with. It contains every column except numbers, with duplicate rows dropped:

df_to_merge = df[df.columns[~df.columns.isin(['numbers'])]].drop_duplicates()
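The column selection above can probably be written more directly. The sketch below assumes a pandas version where DataFrame.drop accepts the columns keyword, and it rebuilds the data frame from the question so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['1/2/2014', '1/2/2014', '1/3/2014'],
    'letters': ['a', 'a', 'c'],
    'numbers': [6, 3, 1],
    'mixed': ['z1', 'z1', 'x3'],
    'new': ['1/2/2014 a', '1/2/2014 a', '1/3/2014 c'],
})

# Expression from the answer: keep every column except numbers.
original = df[df.columns[~df.columns.isin(['numbers'])]].drop_duplicates()

# Shorter form: drop the column by name, then drop duplicate rows.
simpler = df.drop(columns=['numbers']).drop_duplicates()

print(simpler.equals(original))
```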

Then you can do the groupby, summing numbers for each value of new:

df_grouped = pd.DataFrame(df.groupby('new').numbers.sum()).reset_index()

To get the result you posted, merge the two data frames:

df_result = df_to_merge.merge(df_grouped)
print(df_result)

       Date letters mixed         new  numbers
0  1/2/2014       a    z1  1/2/2014 a        9
1  1/3/2014       c    x3  1/3/2014 c        1
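As an alternative to the drop_duplicates and merge steps, the same result can likely be produced in a single groupby. This is a sketch rather than the approach from the answer. It assumes that Date, letters and mixed are constant within each new group, so taking the first value of each is safe.

```python
import pandas as pd

rows = [
    ['1/2/2014', 'a', '6', 'z1'],
    ['1/2/2014', 'a', '3', 'z1'],
    ['1/3/2014', 'c', '1', 'x3'],
]
df = pd.DataFrame(rows, columns=['Date', 'letters', 'numbers', 'mixed'])

# Build the key column without a loop and convert numbers to int.
df['new'] = df['Date'] + ' ' + df['letters']
df['numbers'] = df['numbers'].astype(int)

# Sum numbers per group; keep the first value of the other columns
# (assumption: they do not vary within a group).
result = df.groupby('new', as_index=False).agg(
    {'Date': 'first', 'letters': 'first', 'numbers': 'sum', 'mixed': 'first'}
)

# Restore the column order used in the question.
result = result[['Date', 'letters', 'numbers', 'mixed', 'new']]
print(result)
```

If the other columns can differ within a group, 'first' would silently discard values, so the merge-based approach above may be the safer choice in that case.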