I'm confused about conditional assignment in pandas.
I have this DataFrame:
import pandas as pd

df = pd.DataFrame([
    {'stripe_subscription_id': 1, 'status': 'past_due'},
    {'stripe_subscription_id': 2, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
])
I'm trying to add a new column whose value depends on the values in the other columns:
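def get_cancellation_type(row):
    # None is stored as NaN in this float column, and NaN is truthy,
    # so test for a missing ID with pd.notna rather than plain truthiness
    if pd.notna(row.stripe_subscription_id):
        if row.status == 'past_due':
            return 'failed_to_pay'
        elif row.status == 'active':
            return 'cancelled_by_us'
    else:
        return 'cancelled_by_user'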
df['cancellation_type'] = df.apply(get_cancellation_type, axis=1)
This is fairly readable, but is it the standard way to do this?
I've been looking at DataFrame.assign, but I'm not sure whether I should be using it.
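For comparison, DataFrame.assign does the same job as df['cancellation_type'] = ..., except that it returns a new DataFrame and leaves the original unchanged, which is mainly useful in method chains; it does not change how the values are computed. A minimal sketch reusing the question's row-wise function, assuming missing IDs are tested with pd.notna (None becomes NaN in this column, and NaN is truthy):

```python
import pandas as pd

df = pd.DataFrame([
    {'stripe_subscription_id': 1, 'status': 'past_due'},
    {'stripe_subscription_id': 2, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
    {'stripe_subscription_id': None, 'status': 'active'},
])

def get_cancellation_type(row):
    # Missing IDs are NaN, which is truthy, so test with pd.notna
    if pd.notna(row.stripe_subscription_id):
        if row.status == 'past_due':
            return 'failed_to_pay'
        elif row.status == 'active':
            return 'cancelled_by_us'
    else:
        return 'cancelled_by_user'

# A callable passed to assign receives the DataFrame and returns the new column;
# assign returns a new DataFrame instead of modifying df
result = df.assign(
    cancellation_type=lambda d: d.apply(get_cancellation_type, axis=1)
)

print(result['cancellation_type'].tolist())
# ['failed_to_pay', 'cancelled_by_us', 'cancelled_by_user', 'cancelled_by_user']
print('cancellation_type' in df.columns)  # False: df itself is unchanged
```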
Answer 0 (score: 0)
This should work; you can change or add conditions as needed.
# NaN compares unequal to everything, so use notna()/isna() instead of != np.nan / == np.nan
df.loc[df['stripe_subscription_id'].notna() & (df['status'] == 'past_due'), 'cancellation_type'] = 'failed_to_pay'
df.loc[df['stripe_subscription_id'].notna() & (df['status'] == 'active'), 'cancellation_type'] = 'cancelled_by_us'
df.loc[df['stripe_subscription_id'].isna(), 'cancellation_type'] = 'cancelled_by_user'
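The reason to avoid != np.nan and == np.nan in masks like these is that NaN compares unequal to every value, itself included, so such comparisons never detect missing values. A quick check on a Series shaped like the stripe_subscription_id column:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, None, None])  # None is stored as NaN (float64)

# NaN compares unequal to everything, itself included
print(np.nan == np.nan)  # False

# So these masks cannot find the missing values
print((s != np.nan).tolist())  # [True, True, True, True]
print((s == np.nan).tolist())  # [False, False, False, False]

# isna()/notna() are the reliable tests
print(s.isna().tolist())  # [False, False, True, True]
```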
Answer 1 (score: 0)
You want np.select:
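import pandas as pd
import numpy as np

# Condition i selects choiceList[i]; the subscription ID must be checked too,
# otherwise the rows with no ID but status 'active' are labelled 'cancelled_by_us'
has_subscription = df["stripe_subscription_id"].notna()
condList = [has_subscription & (df["status"] == "past_due"),
            has_subscription & (df["status"] == "active"),
            ~has_subscription]
choiceList = ["failed_to_pay", "cancelled_by_us", "cancelled_by_user"]
# default=None matches the question's function, which returns None for any other status
df['cancellation_type'] = np.select(condList, choiceList, default=None)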
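np.select evaluates the conditions in list order: for each row it takes the choice paired with the first condition that is True, and rows that match no condition get default. The order of the condition list therefore matters whenever conditions overlap. A minimal check with made-up boolean masks:

```python
import numpy as np

# Row 0 matches both conditions, row 1 only the first, row 2 neither
conditions = [np.array([True, True, False]),
              np.array([True, False, False])]
choices = ['first', 'second']

# The first matching condition wins; unmatched rows get the default
result = np.select(conditions, choices, default='none')
print(result.tolist())  # ['first', 'first', 'none']
```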