This is a follow-up to a question I previously helped someone with.
Here is the problem. Suppose we have a DataFrame:
import pandas as pd

dic = {'firstname': ['John', 'John', 'John', 'John', 'John',
                     'Susan', 'Susan', 'Susan', 'Susan', 'Susan',
                     'Mike', 'Mike', 'Mike', 'Mike', 'Mike'],
       'lastname': ['Smith', 'Smith', 'Smith', 'Smith', 'Smith',
                    'Wilson', 'Wilson', 'Wilson', 'Wilson', 'Wilson',
                    'Jones', 'Jones', 'Jones', 'Jones', 'Jones'],
       'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                   'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
       'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
       'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980]}
df = pd.DataFrame(dic)
print(df)
Output:
firstname lastname company paid overtime
0 John Smith KFC 200 205
1 John Smith BK 300 554
2 John Smith KFC 250 840
3 John Smith KFC 100 100
4 John Smith KFC 900 203
5 Susan Wilson BK 650 640
6 Susan Wilson BK 430 978
7 Susan Wilson WND 218 451
8 Susan Wilson WND 946 356
9 Susan Wilson WND 789 779
10 Mike Jones TB 305 650
11 Mike Jones CHP 750 950
12 Mike Jones TB 140 230
13 Mike Jones CHP 860 250
14 Mike Jones TB 310 980
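Originally, I wanted to sum the 'paid' column for each (lastname, firstname, company) group and show only the totals above 1300. That was solved like this: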
df = df.groupby(['lastname', 'firstname','company'], as_index=False).agg({'paid':'sum'})
s = df['paid']>1300
df['limit']=s
df = df.loc[df['limit']==True]
del df['limit']
df = df.sort_values(by=['paid'],ascending=False).reset_index()
del df['index']
print(df)
Output:
lastname firstname company paid
0 Wilson Susan WND 1953
1 Jones Mike CHP 1610
2 Smith John KFC 1450
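For reference, the same aggregation can be written more compactly. This is a minimal sketch assuming the sample `dic` from the question; the name `totals` is introduced here for illustration. It stores the aggregated result under a new name instead of overwriting `df`, so the original rows stay available for the follow-up below.

```python
import pandas as pd

# Same sample data as in the question
dic = {'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
       'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
       'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                   'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
       'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
       'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980]}
df = pd.DataFrame(dic)

# Sum 'paid' per (lastname, firstname, company); as_index=False keeps the keys as columns
totals = df.groupby(['lastname', 'firstname', 'company'], as_index=False)['paid'].sum()

# Keep totals above 1300, sort descending, and renumber the index from 0
totals = (totals[totals['paid'] > 1300]
          .sort_values('paid', ascending=False)
          .reset_index(drop=True))
print(totals)
```

`reset_index(drop=True)` replaces the `reset_index()` followed by `del df['index']` pair, and the boolean mask replaces the temporary `limit` column.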
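What I want to do now is similar, but I no longer want to sum the values. I only want to keep the original rows of every (lastname, firstname, company) group whose 'paid' total exceeds 1300.
Desired output: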
firstname lastname company paid overtime
0 John Smith KFC 200 205
1 John Smith KFC 250 840
2 John Smith KFC 100 100
3 John Smith KFC 900 203
4 Susan Wilson WND 218 451
5 Susan Wilson WND 946 356
6 Susan Wilson WND 789 779
7 Mike Jones CHP 750 950
8 Mike Jones CHP 860 250
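One alternative that produces exactly this output is pandas' `GroupBy.filter`, which keeps every original row of each group for which a function returns True. This is a minimal sketch assuming the sample `dic` from the question, shown as a different technique from the `transform` approach in the answer; the name `result` is illustrative.

```python
import pandas as pd

# Same sample data as in the question
dic = {'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
       'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
       'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                   'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
       'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
       'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980]}
df = pd.DataFrame(dic)

# filter() calls the lambda once per group and keeps all rows of the groups
# whose 'paid' total is above 1300; rows come back in their original order
result = (df.groupby(['lastname', 'firstname', 'company'])
            .filter(lambda g: g['paid'].sum() > 1300)
            .reset_index(drop=True))
print(result)
```

Because the lambda runs in Python once per group, this is easy to read but is usually slower than a vectorized `transform('sum')` mask on large frames.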
Answer (score: 0)
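This is a very simple one-line change. Instead of agg, use transform: transform('sum') returns each group's 'paid' total aligned with the original rows, so it can be used directly as a boolean mask on the original, un-aggregated df (not on the grouped result from the earlier code):
df[df.groupby(['lastname', 'firstname','company'])['paid'].transform('sum') > 1300]
Append .reset_index(drop=True) if you also want the index renumbered from 0, as in the desired output.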
Edit: thanks to Datanovice for pointing out that I should give a complete answer and write out the last line.
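A self-contained sketch of the transform approach, assuming the sample `dic` from the question; the names `group_total` and `result` are introduced here for illustration. It makes the intermediate Series visible: every row carries the total of its own group, which is why the comparison yields a row-level mask.

```python
import pandas as pd

# Same sample data as in the question
dic = {'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
       'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
       'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                   'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
       'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
       'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980]}
df = pd.DataFrame(dic)

# transform('sum') returns a Series with the same index as df:
# each row holds the 'paid' total of its (lastname, firstname, company) group
group_total = df.groupby(['lastname', 'firstname', 'company'])['paid'].transform('sum')

# Compare row-wise against the threshold and keep the matching original rows
result = df[group_total > 1300].reset_index(drop=True)
print(result)
```

With this data the group totals are 1953, 1610, 1450, 1080, 755 and 300, so a threshold of 1300 or 1350 selects the same three groups.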