Pandas: how to conditionally assign a single column based on multiple other columns?

Asked: 2019-12-24 12:25:59

Tags: pandas

I'm confused about conditional assignment in pandas.

I have this DataFrame:

import pandas as pd

df = pd.DataFrame([
   { 'stripe_subscription_id': 1, 'status': 'past_due' },
   { 'stripe_subscription_id': 2, 'status': 'active' },
   { 'stripe_subscription_id': None, 'status': 'active' },
   { 'stripe_subscription_id': None, 'status': 'active' },
])

I'm trying to add a new column whose value depends on the other columns:

def get_cancellation_type(row):
    # A missing id is stored as NaN, and NaN is truthy, so test it with pd.notna
    if pd.notna(row.stripe_subscription_id):
        if row.status == 'past_due':
            return 'failed_to_pay'
        elif row.status == 'active':
            return 'cancelled_by_us'
    else:
        return 'cancelled_by_user'

df['cancellation_type'] = df.apply(get_cancellation_type, axis=1)

This is fairly readable, but is it the standard way to do this?

I've been looking at df.assign, but I'm not sure whether I should use it.

2 Answers:

Answer 0: (score: 0)

This should work; you can change or add conditions as needed.

# NaN never compares equal to anything (not even itself), so use notna() instead of != np.nan
has_sub = df['stripe_subscription_id'].notna()
df.loc[has_sub & (df['status'] == 'past_due'), 'cancellation_type'] = 'failed_to_pay'
df.loc[has_sub & (df['status'] == 'active'), 'cancellation_type'] = 'cancelled_by_us'
df.loc[~has_sub, 'cancellation_type'] = 'cancelled_by_user'
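
One thing worth checking with this approach is how missing values compare. NaN is not equal to anything, itself included, so an element-wise `== np.nan` never matches and `!= np.nan` always matches; `isna()` and `notna()` are the reliable tests. A minimal sketch, assuming the DataFrame from the question (rebuilt here so the snippet runs on its own; the names `eq_nan`, `ne_nan`, and `missing` are only illustrative):

```python
import numpy as np
import pandas as pd

# Same data as in the question; the None ids end up as missing values
df = pd.DataFrame([
    {'stripe_subscription_id': 1, 'status': 'past_due'},
    {'stripe_subscription_id': 2, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
])

# Element-wise comparison against NaN: never equal, always unequal
eq_nan = df['stripe_subscription_id'] == np.nan
ne_nan = df['stripe_subscription_id'] != np.nan

# isna() flags the rows whose id is actually missing
missing = df['stripe_subscription_id'].isna()

print(eq_nan.tolist())
print(ne_nan.tolist())
print(missing.tolist())
```

The first mask is False for every row and the second is True for every row, so neither can tell the two groups apart; only `missing` singles out the last two rows.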

Answer 1: (score: 0)

You want to use np.select:

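import pandas as pd
import numpy as np

# A missing stripe_subscription_id is stored as NaN, so test it with notna()
has_sub = df["stripe_subscription_id"].notna()

# Conditions are evaluated in order; the first one that matches wins
condList = [has_sub & (df["status"] == "past_due"),
            has_sub & (df["status"] == "active"),
            ~has_sub]

choiceList = ["failed_to_pay", "cancelled_by_us", "cancelled_by_user"]

# default=None covers rows that match no condition, like the original function
df['cancellation_type'] = np.select(condList, choiceList, default=None)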
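
Since the question mentions assign, here is a hedged sketch of how np.select could be combined with `DataFrame.assign` and checked against the row-wise function. It rebuilds the question's DataFrame so it runs on its own. The names `has_sub`, `conditions`, `choices`, `vectorized`, and `row_wise` are illustrative, and the missing-id test uses `notna()` / `pd.notna()` on the assumption that a `None` id is stored as a missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'stripe_subscription_id': 1, 'status': 'past_due'},
    {'stripe_subscription_id': 2, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
])

def get_cancellation_type(row):
    # Row-wise reference version; pd.notna is used because NaN is truthy
    if pd.notna(row.stripe_subscription_id):
        if row.status == 'past_due':
            return 'failed_to_pay'
        elif row.status == 'active':
            return 'cancelled_by_us'
        return None
    return 'cancelled_by_user'

# Vectorized version: one boolean mask per outcome, checked in order
has_sub = df['stripe_subscription_id'].notna()
conditions = [
    has_sub & (df['status'] == 'past_due'),
    has_sub & (df['status'] == 'active'),
    ~has_sub,
]
choices = ['failed_to_pay', 'cancelled_by_us', 'cancelled_by_user']

# assign returns a new DataFrame and leaves df itself unchanged
vectorized = df.assign(
    cancellation_type=np.select(conditions, choices, default=None)
)

# Row-wise result, kept as a separate Series for comparison
row_wise = df.apply(get_cancellation_type, axis=1)

print(vectorized['cancellation_type'].tolist())
print(row_wise.tolist())
```

Both approaches give the same labels on this data. The practical difference is that `assign` returns a copy that can be chained with other methods, while `df['cancellation_type'] = ...` modifies `df` in place; either is a reasonable choice.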