My dataframe:
display_name security_type1 currency_str state
A GOVT USD Done
B CORP NZD Passed
B CORP USD Done
C CORP EUR Done
C CORP EUR Traded Away
C CORP GBP Done
C CORP GBP Done
C CORP USD Done
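My desired result is:
a. Group by display_name, security_type1 and currency_str.
b. Count the rows whose state column contains Done and store the count in column Done_RFQ.
c. Show the total number of rows for each display_name, security_type1 and currency_str combination in column Total_RFQ.
Done_Pct = Done_RFQ / Total_RFQ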
display_name security_type1 currency_str Done_RFQ Total_RFQ Done_Pct
A GOVT USD 1 1 100%
B CORP USD 1 2 50%
C CORP EUR 1 5 20%
C CORP GBP 2 5 40%
C CORP USD 1 5 20%
Apart from Total_RFQ and Done_Pct, my code works:
d = [('Done_RFQ', 'size')]
df_Done_Client = df[
df['state'].str.contains('Done')
][['display_name','security_type1','currency_str','state']].copy()
df_Done_Client = df_Done_Client.groupby(['display_name','security_type1','currency_str'])['state'].agg(d).reset_index()
# Sum of all Done RFQ's per display_name
Sum_of_Done_For_Month = df_Done_Client.groupby('display_name')['Done_RFQ'].transform('sum')
df_Done_Client['Total_Done_RFQ'] = Sum_of_Done_For_Month
df_Done_Client['Done_Pct'] = df_Done_Client['Done_RFQ_For_Month'].div(Sum_of_Done_For_Month).round(5)
display(df_Done_Client)
I am not sure how to calculate the total, because it has to come from another dataframe: the same fields, but without the "Done" condition.
df_All_Client = df[['display_name','security_type1','currency_str','state']].copy()
Answer 0: (score: 1)
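I think you need a Total_RFQ column computed with size (the total row count), and a Done_RFQ column computed by counting a boolean mask: compare state with Done using eq, then sum the True values. If you need to match a substring rather than the exact value, use x.str.contains('Done') in place of x.eq('Done'):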
d = [('Total_RFQ', 'size'), ('Done_RFQ', lambda x: x.eq('Done').sum())]
df=df.groupby(['display_name','security_type1','currency_str'])['state'].agg(d).reset_index()
df['Done_Pct'] = df['Done_RFQ'] / df['Total_RFQ'] * 100
print (df)
display_name security_type1 currency_str Total_RFQ Done_RFQ Done_Pct
0 A GOVT USD 1 1 100.0
1 B CORP NZD 1 0 0.0
2 B CORP USD 1 1 100.0
3 C CORP EUR 2 1 50.0
4 C CORP GBP 2 2 100.0
5 C CORP USD 1 1 100.0
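For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data from the question and swaps the lambda for a boolean helper column plus named aggregation, which is a variation on the code above rather than the exact same call. The names done_summary, is_done and the substring parameter are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'display_name':   ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'security_type1': ['GOVT', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP'],
    'currency_str':   ['USD', 'NZD', 'USD', 'EUR', 'EUR', 'GBP', 'GBP', 'USD'],
    'state':          ['Done', 'Passed', 'Done', 'Done', 'Traded Away', 'Done', 'Done', 'Done'],
})

group_cols = ['display_name', 'security_type1', 'currency_str']


def done_summary(frame, substring=False):
    """Hypothetical helper: per-combination totals and Done counts."""
    if substring:
        # Literal substring match; missing values count as not Done
        mask = frame['state'].str.contains('Done', na=False, regex=False)
    else:
        # Exact match
        mask = frame['state'].eq('Done')
    out = (
        frame.assign(is_done=mask)
             .groupby(group_cols)
             .agg(Total_RFQ=('state', 'size'),   # every row in the group
                  Done_RFQ=('is_done', 'sum'))   # True counts as 1
             .reset_index()
    )
    out['Done_Pct'] = out['Done_RFQ'] / out['Total_RFQ'] * 100
    return out


out = done_summary(df)
print(out)
```

On this sample data both the exact and the substring variant give the same counts, since no state merely contains Done as part of a longer string.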
Answer 1: (score: 1)
Here is one way. It is similar to @jezrael's solution, but keeps your logic of checking for the substring Done and filtering for Done_RFQ > 0.
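Also, I think you need 2 groupby calculations to get your desired result, because Total_RFQ in your expected output is counted per display_name rather than per combination.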
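# function to calculate Done_RFQ, as a list of (name, func) pairs
# (newer pandas versions reject a renaming dict in SeriesGroupBy.agg)
d = [('Done_RFQ', lambda x: x.str.contains('Done', na=False, regex=False).sum())]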
# apply 2 groupby calculations
df['Total_RFQ'] = df.groupby('display_name')['display_name'].transform('size')
group_cols = ['display_name', 'security_type1', 'currency_str', 'Total_RFQ']
res = df.groupby(group_cols)['state'].agg(d).reset_index()
# calculate Done_Pct
res['Done_Pct'] = res['Done_RFQ'] / res['Total_RFQ']
# filter for Done_RFQ > 0
res = res[res['Done_RFQ'] > 0]
print(res)
display_name security_type1 currency_str Total_RFQ Done_RFQ Done_Pct
0 A GOVT USD 1 1 1.0
2 B CORP USD 2 1 0.5
3 C CORP EUR 5 1 0.2
4 C CORP GBP 5 2 0.4
5 C CORP USD 5 1 0.2
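If you also want Done_Pct displayed as in the question (100%, 50%, ...), the following is a rough self-contained sketch under a few assumptions. It counts Done per combination with a boolean helper column, maps the per-display_name row counts onto the result instead of using transform, and adds a formatted string column. The names is_done, totals and Done_Pct_str are illustrative and not part of the original code.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'display_name':   ['A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'security_type1': ['GOVT', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP', 'CORP'],
    'currency_str':   ['USD', 'NZD', 'USD', 'EUR', 'EUR', 'GBP', 'GBP', 'USD'],
    'state':          ['Done', 'Passed', 'Done', 'Done', 'Traded Away', 'Done', 'Done', 'Done'],
})

group_cols = ['display_name', 'security_type1', 'currency_str']

# First groupby: Done count per combination (literal substring match)
is_done = df['state'].str.contains('Done', na=False, regex=False)
res = (
    df.assign(is_done=is_done)
      .groupby(group_cols)
      .agg(Done_RFQ=('is_done', 'sum'))
      .reset_index()
)

# Second groupby: total rows per display_name, looked up for each result row
totals = df.groupby('display_name').size()
res['Total_RFQ'] = res['display_name'].map(totals)

# Ratio plus a display-only percent string
res['Done_Pct'] = res['Done_RFQ'] / res['Total_RFQ']
res['Done_Pct_str'] = res['Done_Pct'].map('{:.0%}'.format)

# Keep only combinations with at least one Done row
res = res[res['Done_RFQ'] > 0].reset_index(drop=True)
print(res)
```

Keeping the numeric Done_Pct next to the string column means you can still sort or aggregate on it later.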