Filter a DataFrame based on groupby sum()

Time: 2019-09-10 12:46:22

Tags: python-3.x pandas-groupby

I want to filter a DataFrame based on a groupby sum(). I am looking for the rows whose amounts sum to zero for a particular date.

I have already solved this with a for loop. I suspect it will hurt performance if the DataFrame is large.

It also seems clumsy.

import pandas as pd

newdf = pd.DataFrame()
newdf['name'] = ('leon','eurika','monica','wian')
newdf['surname'] = ('swart','swart','swart','swart')
newdf['birthdate'] = ('14051981','198001','20081012','20100621')
newdf['tdate'] = ('13/05/2015','14/05/2015','15/05/2015', '13/05/2015')
newdf['tamount'] = (100.10, 111.11, 123.45, -100.10)

df = newdf.groupby(['tdate'])[['tamount']].sum().reset_index()
df2 = df.loc[df["tamount"] == 0, "tdate"]
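# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
parts = []
for i in df2:
    parts.append(newdf.loc[newdf["tdate"] == i])
df3 = pd.concat(parts) if parts else pd.DataFrame()

print(df3)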

The code above outputs the two rows whose tamount values sum to zero when combined:

   name surname  birthdate       tdate  tamount
0  leon   swart 1981-05-14  13/05/2015    100.1
3  wian   swart 2010-06-21  13/05/2015   -100.1

1 Answer:

Answer 0: (score: 0)

Just use basic numpy :)

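import numpy as np

# Sum tamount per date
df = newdf.groupby(['tdate'])[['tamount']].sum().reset_index()

# Dates whose total is exactly zero (positional lookup works because of reset_index)
dates = df['tdate'][np.where(df['tamount'] == 0)[0]]

# Keep only the rows of newdf that fall on those dates
newdf[np.isin(newdf['tdate'], dates)]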
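
As a possible alternative, the same filter can probably be done in one pass with groupby(...).transform("sum"), which broadcasts each date's total back onto every row, so no intermediate list of dates is needed. The sketch below rebuilds the sample data from the question (birthdate is left out for brevity) and uses np.isclose instead of == 0. The tolerance is my own assumption, added because floating-point sums do not always cancel exactly.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (birthdate omitted for brevity)
newdf = pd.DataFrame({
    "name": ["leon", "eurika", "monica", "wian"],
    "surname": ["swart", "swart", "swart", "swart"],
    "tdate": ["13/05/2015", "14/05/2015", "15/05/2015", "13/05/2015"],
    "tamount": [100.10, 111.11, 123.45, -100.10],
})

# transform("sum") returns each group's total aligned to the original rows
group_total = newdf.groupby("tdate")["tamount"].transform("sum")

# np.isclose tolerates floating-point residue in the totals
# (default tolerances are an assumption; adjust atol for your data)
result = newdf[np.isclose(group_total.to_numpy(), 0.0)]
print(result)
```

This keeps the original row index, so the result can still be matched back to newdf.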

Hope this helps; let me know if you have any questions.