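I have a pandas DataFrame containing information about rejected emails. Some background on the problem: a sender can send the same email multiple times, but it only needs to be resolved once. I still want a new column that flags every email with the same sender and message as one that is "Resolved".
The starting DataFrame looks like this: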
import pandas as pd

data = [['Sent from automated email', 'jim@yahoo.com', 'Resolved','2020-01-13 07:06:34'],
['Sent from automated email', 'jim@yahoo.com', 'Rejected','2020-01-13 07:06:39'],
['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:39'],
['Hello I would like for you to make an update please','new101@cnn.com', 'Rejected', '2020-02-14 09:06:41'],
['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:59'],
['Take one newspaper','notneeded@gmail.com', 'Resolved', '2020-02-17 09:05:39'],
['Hey hows it going','jamie@gmail.com', 'Rejected', '2020-03-12 09:03:42'],
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Message', 'Email','Resolution','Time Sent'])
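I want to pick out all emails that have the same sender and the same message but different resolutions, and mark them as "Resolved" if any earlier email in that set was resolved. My desired output is: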
data = [['Sent from automated email', 'jim@yahoo.com', 'Resolved','2020-01-13 07:06:34','Resolved' ],
['Sent from automated email', 'jim@yahoo.com', 'Rejected','2020-01-13 07:06:39','Resolved'],
['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:39','Resolved'],
['Hello I would like for you to make an update please','new101@cnn.com', 'Rejected', '2020-02-14 09:06:41','Resolved'],
['Hello I would like for you to make an update please','new101@cnn.com', 'Resolved', '2020-02-14 09:06:59','Resolved'],
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Message', 'Email','Resolution','Time Sent','Real Resolution'])
I tried writing a function like this:
def a(df):
    if df[df['message'].duplicated()] & df[(df['resolution'] == 'Rejected') | (df['resolution'] == 'Resolved') ] & df[df['Email'].duplicated()]:
        df['Real Resolution'] = 'Resolved'
df['Real Resolution'] = df.apply(a)
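I don't think this is correct, because I'm not only looking at duplicate emails that were resolved and then rejected. Any tips? Thanks!
Answer 0: (score: 1)
IIUC, you can try the following: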
c = df[['Message','Email']].duplicated(keep=False) #check duplicate in Message+Email
c1 = df[['Message','Email','Resolution']].duplicated(keep=False) #check resolution too
#condition is if c is True and c1 is False then check if email group has any True
df.loc[(c & ~c1).groupby(df['Email']).transform('any'),'Real Resolution'] = 'Resolved'
out = df.dropna(subset=['Real Resolution']).copy()
print(out)
                                             Message           Email  \
0                          Sent from automated email   jim@yahoo.com
1                          Sent from automated email   jim@yahoo.com
2  Hello I would like for you to make an update p...  new101@cnn.com
3  Hello I would like for you to make an update p...  new101@cnn.com
4  Hello I would like for you to make an update p...  new101@cnn.com

   Resolution            Time Sent  Real Resolution
0    Resolved  2020-01-13 07:06:34         Resolved
1    Rejected  2020-01-13 07:06:39         Resolved
2    Resolved  2020-02-14 09:06:39         Resolved
3    Rejected  2020-02-14 09:06:41         Resolved
4    Resolved  2020-02-14 09:06:59         Resolved
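A possible variation, offered as a sketch rather than as part of the original answer: the snippet above groups by `Email` only, and it relies on the resolutions inside a repeated message not all being duplicated. If the intent is "flag every repeated (Message, Email) pair that contains at least one Resolved row", that rule can be written directly. The function name `mark_resolved` is hypothetical, and the sketch assumes the column names from the question and ignores the ordering of `Time Sent`.

```python
import pandas as pd

data = [
    ['Sent from automated email', 'jim@yahoo.com', 'Resolved', '2020-01-13 07:06:34'],
    ['Sent from automated email', 'jim@yahoo.com', 'Rejected', '2020-01-13 07:06:39'],
    ['Hello I would like for you to make an update please', 'new101@cnn.com', 'Resolved', '2020-02-14 09:06:39'],
    ['Hello I would like for you to make an update please', 'new101@cnn.com', 'Rejected', '2020-02-14 09:06:41'],
    ['Hello I would like for you to make an update please', 'new101@cnn.com', 'Resolved', '2020-02-14 09:06:59'],
    ['Take one newspaper', 'notneeded@gmail.com', 'Resolved', '2020-02-17 09:05:39'],
    ['Hey hows it going', 'jamie@gmail.com', 'Rejected', '2020-03-12 09:03:42'],
]
df = pd.DataFrame(data, columns=['Message', 'Email', 'Resolution', 'Time Sent'])


def mark_resolved(frame):
    """Hypothetical helper: add 'Real Resolution' for repeated, resolved emails."""
    frame = frame.copy()
    keys = ['Message', 'Email']

    # True for rows whose (Message, Email) pair occurs more than once
    repeated = frame.duplicated(subset=keys, keep=False)

    # True for every row in a (Message, Email) group with at least one Resolved row
    is_resolved = frame['Resolution'].eq('Resolved')
    any_resolved = is_resolved.groupby(
        [frame['Message'], frame['Email']]
    ).transform('any')

    # Rows that do not match keep NaN in the new column
    frame.loc[repeated & any_resolved, 'Real Resolution'] = 'Resolved'
    return frame


marked = mark_resolved(df)
out = marked.dropna(subset=['Real Resolution'])
print(out)
```

On the sample data this keeps rows 0 through 4, the same rows as the output above. Unlike the masks `c` and `c1`, it also flags a group in which both the Resolved and the Rejected rows are repeated.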