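I want to reverse dummy encoding: build a single variable from several dummy columns of a data frame.
The data frame columns look like this: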
client booking_by_phone booking_online booking_online ... no_call_ad no_sms_ad no_ad_other
2q332 1 0 0 1 1 0
as4e3 0 0 1 0 0 0
ad222 0 1 0 1 0 0
q2x31 1 0 0 1 1 1
My current approach works, but it is slow because I use iterrows():
for idx, _ in df.iterrows():
    if df.loc[idx, 'booking_by_phone'] == 1:
        df.loc[idx, 'bookingchannel'] = "phone"
    elif df.loc[idx, 'booking_online'] == 1:
        df.loc[idx, 'bookingchannel'] = "online"
    else:
        df.loc[idx, 'bookingchannel'] = "agency"
For the second variable it takes even longer, because a user may opt out of ads on several channels at once, so I cannot use elif:
for idx, _ in df.iterrows():
    df.loc[idx, 'ad_ban'] = 0
    if df.loc[idx, 'no_email_ad'] == 1:
        df.loc[idx, 'ad_ban'] += 1
    if df.loc[idx, 'no_mail_ad'] == 1:
        df.loc[idx, 'ad_ban'] += 2
    if df.loc[idx, 'no_call_ad'] == 1:
        df.loc[idx, 'ad_ban'] += 4
    if df.loc[idx, 'no_catalog_ad'] == 1:
        df.loc[idx, 'ad_ban'] += 8
    if df.loc[idx, 'no_sms_ad'] == 1:
        df.loc[idx, 'ad_ban'] += 16
    if df.loc[idx, 'no_ad_other'] == 1:
        df.loc[idx, 'ad_ban'] += 32
Is there a faster, simpler way?
Answer 0 (score: 0)
Let's look at the booking channel. Here is an approach using boolean masks:
df['bookingchannel'] = 'agency'  # default value
# Assign online first so that phone takes precedence, as in the if/elif above
mask = df['booking_online'] == 1
df.loc[mask, 'bookingchannel'] = 'online'
mask = df['booking_by_phone'] == 1
df.loc[mask, 'bookingchannel'] = 'phone'
You could build a list of tuples, [('booking_by_phone', 'phone'), ('booking_online', 'online'), ...], and loop over it to do the assignments.
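As a rough sketch of that tuple-list loop, and of how the same column-at-a-time idea might be applied to the ad_ban variable from the question: the sample data below is made up for illustration (the question's table elides some columns, so the values for no_email_ad, no_mail_ad and no_catalog_ad are assumptions), and the names channels and weights are hypothetical. The pairs are ordered from lowest to highest priority so that phone wins when both flags are set, which matches the if/elif in the question.

```python
import pandas as pd

# Hypothetical sample data modelled on the question. Values for the columns
# the question's table elides are invented for illustration only.
df = pd.DataFrame({
    "client": ["2q332", "as4e3", "ad222", "q2x31"],
    "booking_by_phone": [1, 0, 0, 1],
    "booking_online": [0, 0, 1, 0],
    "no_email_ad": [0, 0, 0, 1],
    "no_mail_ad": [0, 0, 0, 0],
    "no_call_ad": [1, 0, 1, 1],
    "no_catalog_ad": [0, 0, 0, 0],
    "no_sms_ad": [1, 0, 0, 1],
    "no_ad_other": [0, 0, 0, 1],
})

# Booking channel: (column, label) pairs, lowest priority first, because a
# later assignment overwrites an earlier one.
channels = [("booking_online", "online"), ("booking_by_phone", "phone")]
df["bookingchannel"] = "agency"  # default when no flag is set
for column, label in channels:
    df.loc[df[column] == 1, "bookingchannel"] = label

# Ad ban: each flag contributes its own power-of-two weight, so the flags
# can simply be summed column by column instead of row by row.
weights = {
    "no_email_ad": 1,
    "no_mail_ad": 2,
    "no_call_ad": 4,
    "no_catalog_ad": 8,
    "no_sms_ad": 16,
    "no_ad_other": 32,
}
ad_ban = pd.Series(0, index=df.index)
for column, weight in weights.items():
    ad_ban += df[column].eq(1).astype(int) * weight
df["ad_ban"] = ad_ban

print(df[["client", "bookingchannel", "ad_ban"]])
```

Both loops run once per column rather than once per row, so the per-row work is left to pandas, which is usually much faster than iterrows() on large frames.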