My data has sequential features: some pages must be visited before others become available. I want to get the final page each user visited.
import numpy as np
import pandas as pd
df = pd.DataFrame({'user': [10,15,17],
'sex': ['M','M','F'],
'home_page': [1,1,1],
'search_page': [1,0,1],
'confirmation_page': [1,0,0],
'payment_page':[1,0,0]})
print(df)
   user sex  home_page  search_page  confirmation_page  payment_page
0    10   M          1            1                  1             1
1    15   M          1            0                  0             0
2    17   F          1            1                  0             0
How can I get a new column named "final_page" that holds the name of the last page each user visited?
Expected output:
df['final_page'] = ['payment_page','home_page','search_page'] # this is not the answer;
# the new column should have these values.
a = df.iloc[:,2:].to_numpy()
np.trim_zeros(a)
Answer 0 (score: 2)
You can take the dot product of the columns with the condition df != 0, then split the result and keep the last element:
m=df.set_index(['user','sex'],append=True)
df['final_page']=(m.ne(0).dot(m.columns+ ',').str.rstrip(',').str.split(',')
.str[-1].droplevel(['user','sex']))
print(df)
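Alternatively:
# mask zeros as NaN first, so last_valid_index returns the last visited page
df['final_page']=m.where(m.ne(0)).apply(pd.Series.last_valid_index,axis=1).reset_index(drop=True)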
   user sex  home_page  search_page  confirmation_page  payment_page  \
0    10   M          1            1                  1             1
1    15   M          1            0                  0             0
2    17   F          1            1                  0             0

     final_page
0  payment_page
1     home_page
2   search_page
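
To see why the dot product works here, it may help to replay it in plain Python. Multiplying a boolean by a string repeats the string zero or one times, and summing the pieces along a row concatenates them, so each row becomes a comma-joined list of visited pages. The sketch below is only an illustration of that idea; the helper name last_visited and the hard-coded rows are assumptions made for the example, not part of the answer.

```python
# Page names and 0/1 flags copied from the question's DataFrame.
pages = ['home_page', 'search_page', 'confirmation_page', 'payment_page']
rows = [[1, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 1, 0, 0]]

def last_visited(flags, names):
    """Hypothetical helper mimicking m.ne(0).dot(m.columns + ',') for one row."""
    # True * 'home_page,' == 'home_page,' and False * 'home_page,' == ''
    joined = ''.join((flag != 0) * (name + ',') for flag, name in zip(flags, names))
    # Drop the trailing comma, split, and keep the last visited page name.
    return joined.rstrip(',').split(',')[-1]

final_pages = [last_visited(flags, pages) for flags in rows]
print(final_pages)
```

A row with no visited page yields an empty string in this sketch, since there is nothing to join.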
Answer 1 (score: 2)
Using numpy:
import numpy as np
import pandas as pd
df = pd.DataFrame({'user': [10,15,17],
'sex': ['M','M','F'],
'home_page': [1,1,1],
'search_page': [1,0,1],
'confirmation_page': [1,0,0],
'payment_page':[1,0,0]})
pages = df.columns[2:]
df['final_page'] = df.iloc[:,2:].apply(lambda x: pages[np.max(np.nonzero(x))],axis=1)
print(df)
Result:
   user sex  home_page  search_page  confirmation_page  payment_page  \
0    10   M          1            1                  1             1
1    15   M          1            0                  0             0
2    17   F          1            1                  0             0

     final_page
0  payment_page
1     home_page
2   search_page
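
If the row-wise apply becomes slow on a large frame, a fully vectorized variant is possible. The following is a minimal sketch, not taken from either answer: it reverses the page columns and uses argmax to locate the last nonzero entry per row. The names page_cols, visited and last_pos are assumptions introduced for the example, and rows with no visited page are set to NaN rather than raising an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [10, 15, 17],
                   'sex': ['M', 'M', 'F'],
                   'home_page': [1, 1, 1],
                   'search_page': [1, 0, 1],
                   'confirmation_page': [1, 0, 0],
                   'payment_page': [1, 0, 0]})

# Page columns in funnel order (assumed to be everything after 'user' and 'sex').
page_cols = ['home_page', 'search_page', 'confirmation_page', 'payment_page']

# Boolean matrix: True where the page was visited.
visited = df[page_cols].to_numpy() != 0

# argmax on the column-reversed matrix finds the first True from the right,
# which is converted back to a position counted from the left.
last_pos = visited.shape[1] - 1 - visited[:, ::-1].argmax(axis=1)

final_page = pd.Series(np.array(page_cols)[last_pos], index=df.index)

# argmax returns 0 for an all-False row, so mask rows without any visit.
df['final_page'] = final_page.where(visited.any(axis=1))
print(df['final_page'].tolist())
```

This avoids calling a Python function once per row, at the cost of being a little less readable than the apply version.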