Keep DataFrame rows based on a summed column

Time: 2020-06-18 17:57:57

Tags: python pandas dataframe filter sum

This is a follow-up to a question I was previously helped with.

Here is the problem. Suppose there is a DataFrame:

import pandas as pd

dic = {'firstname':['John','John','John','John','John','Susan','Susan',
                    'Susan','Susan','Susan','Mike','Mike','Mike','Mike',
                    'Mike'],
       'lastname':['Smith','Smith','Smith','Smith','Smith','Wilson',
                   'Wilson','Wilson','Wilson','Wilson','Jones','Jones',
                   'Jones','Jones','Jones'],
       'company':['KFC','BK','KFC','KFC','KFC','BK','BK','WND','WND',
                  'WND','TB','CHP','TB','CHP','TB'],
       'paid':[200,300,250,100,900,650,430,218,946,789,305,750,140,860,310],
       'overtime':[205,554,840,100,203,640,978,451,356,779,650,950,230,250,980]}
df = pd.DataFrame(dic)
print(df)

which outputs:

   firstname lastname company  paid  overtime
0       John    Smith     KFC   200       205
1       John    Smith      BK   300       554
2       John    Smith     KFC   250       840
3       John    Smith     KFC   100       100
4       John    Smith     KFC   900       203
5      Susan   Wilson      BK   650       640
6      Susan   Wilson      BK   430       978
7      Susan   Wilson     WND   218       451
8      Susan   Wilson     WND   946       356
9      Susan   Wilson     WND   789       779
10      Mike    Jones      TB   305       650
11      Mike    Jones     CHP   750       950
12      Mike    Jones      TB   140       230
13      Mike    Jones     CHP   860       250
14      Mike    Jones      TB   310       980

Originally, I wanted to sum the 'paid' column per person and company and show only the totals above 1300. That was solved this way:

# Sum 'paid' for each (lastname, firstname, company) group
df = df.groupby(['lastname', 'firstname','company'], as_index=False).agg({'paid':'sum'})
# Keep only the groups whose total exceeds 1300
df = df.loc[df['paid'] > 1300]
# Sort by total, largest first, and renumber the rows
df = df.sort_values(by=['paid'], ascending=False).reset_index(drop=True)
print(df)

which outputs:

  lastname firstname company  paid
0   Wilson     Susan     WND  1953
1    Jones      Mike     CHP  1610
2    Smith      John     KFC  1450

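What I want to do now is fairly similar, but I no longer want to sum the values. I just want to keep the original rows of every group whose 'paid' total exceeds 1300.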

Desired output:

   firstname lastname company  paid  overtime
0       John    Smith     KFC   200       205
1       John    Smith     KFC   250       840
2       John    Smith     KFC   100       100
3       John    Smith     KFC   900       203
4      Susan   Wilson     WND   218       451
5      Susan   Wilson     WND   946       356
6      Susan   Wilson     WND   789       779
7       Mike    Jones     CHP   750       950
8       Mike    Jones     CHP   860       250

1 answer:

Answer 0: (score: 0)

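This is a very simple one-line change. Instead of agg, use transform, which returns each group's sum aligned to the original rows rather than one row per group. Starting from the original df (before the aggregation shown in the question):

totals = df.groupby(['lastname', 'firstname','company'])['paid'].transform('sum')
And then use it as a boolean mask,
df[totals > 1300]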
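
A minimal self-contained sketch of the transform approach, assuming the sample data from the question and the 1300 threshold stated there. The names keys, group_total and result are illustrative, and the reset_index(drop=True) call is an addition so the row numbering matches the desired output:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789,
             305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779,
                 650, 950, 230, 250, 980],
})

keys = ['lastname', 'firstname', 'company']

# transform('sum') returns one value per original row:
# the 'paid' total of the group that row belongs to
group_total = df.groupby(keys)['paid'].transform('sum')

# Keep the original rows of groups whose total exceeds 1300,
# then renumber the rows from 0
result = df[group_total > 1300].reset_index(drop=True)
print(result)
```

Because the mask has the same index as df, the original row order and the 'overtime' column are preserved without any merge step.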

Edit: Thanks to Datanovice for pointing out that I should make this a complete answer and write out the last line.
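
As an alternative sketch, not part of the answer above, pandas also offers groupby(...).filter, which keeps or drops whole groups based on a function that returns True or False per group. It again assumes the question's sample data and the 1300 threshold. The name kept is illustrative:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789,
             305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779,
                 650, 950, 230, 250, 980],
})

# filter() calls the lambda once per group and keeps every row
# of the groups for which it returns True
kept = (
    df.groupby(['lastname', 'firstname', 'company'])
      .filter(lambda g: g['paid'].sum() > 1300)
      .reset_index(drop=True)
)
print(kept)
```

The transform version is usually the faster of the two on large frames, since filter runs a Python function per group, but filter can read more directly when the condition involves several columns.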