Implementing DataFrame logic in pandas

Time: 2018-04-14 14:23:09

Tags: python-3.x pandas

I have a dataset with the following columns:

 `subscribe_date`   `package_id`    `subscription_name` `user_id`   `subscription_status`

subscription_status takes the values cancelled, active, lapsed, expired, revoked, and restarted.

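Based on the subscription_status value, I have to create a column named churn. A user counts as churned if their subscription_status is "cancelled" or "expired".

Some users appear multiple times with different status values. If a user has had "cancelled" or "expired" as their subscription_status at any time, that user is considered churned.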

Here is my code:

# Set a default churn value of 'no' for every row
subscriber_data['churn'] = 'no'

# Set churn to 'yes' for rows whose status is cancelled or expired
# (.loc assigns in place and avoids chained indexing)
churned = subscriber_data['subscription_status'].isin(['cancelled', 'expired'])
subscriber_data.loc[churned, 'churn'] = 'yes'

Now each user is marked "yes", "no", or both. How do I take this further so that a user who has both "yes" and "no" rows is marked "yes" on all of their rows?

Sample data:

subscribe_date   package_id   subscription_name  user_id   subscription_status  churn
10/28/2015 23:29  0903a465-28f7-45b3-9860-12be9deed4ca   14 Day  0002b38f-ec0a-4ee5-8710-9cf54691bb55    cancelled   yes
6/21/2016 21:39  f3a5a639-f4df-4ebd-885d-abea26b37027    30-DayPass  00068201-1d40-4a84-b9bf-f4592aef9ba3    active  no
6/29/2016 19:30  f3a5a639-f4df-4ebd-885d-abea26b37027    30-DayPass  00068201-1d40-4a84-b9bf-f4592aef9ba3    cancelled   yes

1 answer:

Answer 0 (score: 1)

You can group the rows by user_id, check whether any row in the group has churn equal to "yes", and then set every row of that group accordingly:

import numpy as np

# True on every row of a user who has at least one 'yes' row
has_churned = df.groupby('user_id')['churn'].transform(
    lambda x: (x == 'yes').any())
df['churn'] = np.where(has_churned, 'yes', df['churn'])
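
For anyone who wants to try this end to end, here is a minimal sketch that skips the intermediate row-level "yes"/"no" step and derives churn per user directly from subscription_status. The DataFrame is a hypothetical stand-in for the sample data: the user IDs are shortened, and the last user (0009aaaa) is made up to show a "no" case. It uses the built-in "any" aggregation instead of a lambda, which is an alternative to the code above rather than the same call.

```python
import pandas as pd

# Hypothetical sample mirroring the question's rows (IDs shortened;
# the last user is invented to illustrate a non-churned user)
df = pd.DataFrame({
    "user_id": ["0002b38f", "00068201", "00068201", "0009aaaa"],
    "subscription_status": ["cancelled", "active", "cancelled", "active"],
})

# Row-level flag: True where the status counts as churn
is_churn_status = df["subscription_status"].isin(["cancelled", "expired"])

# Broadcast per user: True on every row of a user with at least one churn row
user_churned = is_churn_status.groupby(df["user_id"]).transform("any")

# Convert the boolean flag to the 'yes'/'no' labels used in the question
df["churn"] = user_churned.map({True: "yes", False: "no"})
print(df)
```

Working on a boolean Series and converting to labels only at the end keeps the grouping step simple. If a boolean churn column is acceptable, the final map call can be dropped.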