Suppose I have 10k rows of customers per employee name, along with the emails sent to them. Here is a basic example of the dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['Sara', 'CustomerA', 4], ['John', 'CustomerA', 0], ['Silvia', 'CustomerA', 0], ['Sara', 'CustomerB', 0], ['John', 'CustomerB', 1], ['Silvia', 'CustomerB', 1]]),
                  columns=['Employee', 'Customer', 'Opened Emails'])
df
  Employee   Customer Opened Emails
0     Sara  CustomerA             4
1     John  CustomerA             0
2   Silvia  CustomerA             0
3     Sara  CustomerB             0
4     John  CustomerB             1
5   Silvia  CustomerB             1
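Now I want to build a report for each customer to see whether they have opened emails from different employees, and to notify the employees who have 0 opened emails.
What I have done so far:
1- Group the dataframe by customer and employee, then use describe() and take top: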
total = df.groupby(['Customer', 'Employee'])['Opened Emails'].describe()[['top']]
                    top
Customer  Employee
CustomerA John        0
          Sara        4
          Silvia      0
CustomerB John        1
          Sara        0
          Silvia      1
2- Convert top to integers:
total = total.astype({"top": int})
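3- I want to iterate over each customer and check whether the sum() of top is > 0, then tell the other employees who have 0 that this customer is receiving emails.
My problem is that I can't find the best way to access the values and apply the condition. This is what I tried, but it doesn't look like a good approach and it has a lot of errors: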
for index, column in df.iterrows():
    if column['top'].sum() != 0:
        if column['top'] == 0:
            print(column['Employee'])
Thanks
Answer 0 (score: 3)
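We can compute the number of emails each customer opened (cs) and the number each customer opened from each employee (ecs), then find the records where cs is greater than 0 (the customer opened at least some emails) but ecs equals 0 (the customer opened no emails from that particular employee):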
# convert opened emails to integer
df['Opened Emails'] = df['Opened Emails'].astype(int)
# opened emails by customer
cs = df.groupby('Customer')['Opened Emails'].transform('sum')
# opened emails by customer and employee
ecs = df.groupby(['Customer', 'Employee'])['Opened Emails'].transform('sum')
# employee-customer pairs with 0 opened emails
# while overall customer opened emails is greater than 0
df.loc[(cs.gt(0)) & (ecs.eq(0))]
Output:
  Employee   Customer  Opened Emails
1     John  CustomerA              0
2   Silvia  CustomerA              0
3     Sara  CustomerB              0
So John and Silvia can be notified that CustomerA is opening emails, and Sara can be notified that CustomerB is opening emails.
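If a per-customer report is easier to act on than the filtered rows, the same mask can be grouped into a dictionary. This is only a minimal sketch: the names to_notify and report are made up for illustration, and the sample frame is rebuilt with integer counts so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the question, with the counts already stored as integers.
df = pd.DataFrame(
    [
        ["Sara", "CustomerA", 4],
        ["John", "CustomerA", 0],
        ["Silvia", "CustomerA", 0],
        ["Sara", "CustomerB", 0],
        ["John", "CustomerB", 1],
        ["Silvia", "CustomerB", 1],
    ],
    columns=["Employee", "Customer", "Opened Emails"],
)

# Total opened emails per customer, broadcast back to every row.
cs = df.groupby("Customer")["Opened Emails"].transform("sum")
# Opened emails per customer-employee pair, broadcast back to every row.
ecs = df.groupby(["Customer", "Employee"])["Opened Emails"].transform("sum")

# Rows where the customer opens emails overall but none from this employee.
to_notify = df.loc[cs.gt(0) & ecs.eq(0)]

# Hypothetical report shape: customer -> list of employees to notify.
report = to_notify.groupby("Customer")["Employee"].apply(list).to_dict()
print(report)
```

From here, looping over report.items() gives one customer and its list of employees per iteration, which avoids iterrows() entirely.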
Answer 1 (score: 0)
print(df[df["Customer"] == "CustomerA"]["Opened Emails"].astype(int).sum())
print(df[df["Customer"] == "CustomerB"]["Opened Emails"].astype(int).sum())
Output:
4
2
Not a very ideal solution, but it works ;)
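With 10k rows it may not be practical to write one line per customer. A possible generalization, sketched here on the assumption that the counts are stored as strings as in the question's frame, is to group by customer once. The names opened and totals are only illustrative.

```python
import pandas as pd

# Sample data with the counts stored as strings, mimicking the np.array-built frame.
df = pd.DataFrame(
    [
        ["Sara", "CustomerA", "4"],
        ["John", "CustomerA", "0"],
        ["Silvia", "CustomerA", "0"],
        ["Sara", "CustomerB", "0"],
        ["John", "CustomerB", "1"],
        ["Silvia", "CustomerB", "1"],
    ],
    columns=["Employee", "Customer", "Opened Emails"],
)

# Convert the counts to integers, then sum them per customer in one pass.
opened = df["Opened Emails"].astype(int)
totals = opened.groupby(df["Customer"]).sum()
print(totals)
```

This gives the same totals as the two print statements above, but for every customer in the frame without naming each one.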