Iterate and sum values based on a condition in pandas

Time: 2021-05-29 18:48:51

Tags: python python-3.x pandas dataframe

Suppose I have 10k rows of customers and the emails sent to them by each employee. Here is a basic example of the data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['Sara', 'CustomerA', 4], ['John', 'CustomerA', 0], ['Silvia', 'CustomerA', 0],
                            ['Sara', 'CustomerB', 0], ['John', 'CustomerB', 1], ['Silvia', 'CustomerB', 1]]),
                  columns=['Employee', 'Customer', 'Opened Emails'])

df

    Employee    Customer    Opened Emails
0   Sara          CustomerA     4
1   John          CustomerA     0
2   Silvia        CustomerA     0
3   Sara          CustomerB     0
4   John          CustomerB     1
5   Silvia        CustomerB     1

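Now I want to build a report for each customer to see whether they have opened emails from different employees, and to notify the employees who have 0 opened emails.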

What I have done so far:

1- Group the data frame by customer and employee, and take the top column from describe()

total = df.groupby(['Customer', 'Employee'])['Opened Emails'].describe()[['top']]

                       top
Customer    Employee    
CustomerA   John        0
            Sara        4
            Silvia      0
CustomerB   John        1
            Sara        0
            Silvia      1

2- Convert the top column to int

total = total.astype({"top": int})

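3- I want to iterate over each customer, check whether the sum() of top is > 0, and then notify the other employees who have 0 that the customer is receiving emails.

My problem is that I can't find the best way to access the values and apply the condition. This is what I tried, but it doesn't seem like a good approach and it has many errors: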

for index, column in df.iterrows():
    if column['top'].sum() != 0:
        if column['top'] == 0:
            print(column['Employee'])

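Thanks

2 Answers:

Answer 0: (score: 3)

We can compute the emails opened per customer, cs, and the emails opened per customer from each employee, ecs, then find the records where cs is greater than 0 (the customer opened at least some emails) but ecs equals 0 (the customer did not open any emails from that particular employee):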

# convert opened emails to integer
df['Opened Emails'] = df['Opened Emails'].astype(int)

# opened emails by customer
cs = df.groupby('Customer')['Opened Emails'].transform('sum')

# opened emails by customer and employee
ecs = df.groupby(['Customer', 'Employee'])['Opened Emails'].transform('sum')

# employee-customer pairs with 0 opened emails
# while overall customer opened emails is greater than 0
df.loc[(cs.gt(0)) & (ecs.eq(0))]

Output:

  Employee   Customer  Opened Emails
1     John  CustomerA              0
2   Silvia  CustomerA              0
3     Sara  CustomerB              0

So John and Silvia can be notified that CustomerA is receiving emails, and Sara can be notified that CustomerB is receiving emails.
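If a per-customer list is easier to act on than the filtered rows, the result can be grouped once more. This is only a sketch built on the filter above; the names to_notify and report are made up for illustration, and the sample frame is rebuilt here with integer values so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, with 'Opened Emails' already stored as integers
df = pd.DataFrame(
    [['Sara', 'CustomerA', 4], ['John', 'CustomerA', 0], ['Silvia', 'CustomerA', 0],
     ['Sara', 'CustomerB', 0], ['John', 'CustomerB', 1], ['Silvia', 'CustomerB', 1]],
    columns=['Employee', 'Customer', 'Opened Emails'],
)

# Same two per-row totals as in the answer
cs = df.groupby('Customer')['Opened Emails'].transform('sum')
ecs = df.groupby(['Customer', 'Employee'])['Opened Emails'].transform('sum')

# Rows where the customer opened something, but nothing from this employee
to_notify = df.loc[cs.gt(0) & ecs.eq(0)]

# Hypothetical report shape: customer -> list of employees to notify
report = to_notify.groupby('Customer')['Employee'].agg(list).to_dict()
print(report)
```

Each key in report is a customer that opened at least one email, and each value lists the employees whose emails that customer has not opened.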

Answer 1: (score: 0)

print(df[df["Customer"] == "CustomerA"]["Opened Emails"].astype(int).sum())
print(df[df["Customer"] == "CustomerB"]["Opened Emails"].astype(int).sum())

Output

4
2

Not a very ideal solution, but it works ;)
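With 10k rows, writing one line per customer will not scale. A possible generalization, sketched here rather than taken from the answer, is to let groupby compute the same sums for every customer at once. The variable name totals is an assumption, and the sample values are kept as strings to mirror the np.array-based frame in the question.

```python
import pandas as pd

# Sample data from the question; values are strings, as np.array would produce
df = pd.DataFrame(
    [['Sara', 'CustomerA', '4'], ['John', 'CustomerA', '0'], ['Silvia', 'CustomerA', '0'],
     ['Sara', 'CustomerB', '0'], ['John', 'CustomerB', '1'], ['Silvia', 'CustomerB', '1']],
    columns=['Employee', 'Customer', 'Opened Emails'],
)

# Convert to int, then sum opened emails per customer in one pass
totals = df['Opened Emails'].astype(int).groupby(df['Customer']).sum()
print(totals)
```

This yields the same per-customer totals as the two print statements, indexed by customer name.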