I have the following DataFrame:
df = pd.DataFrame({'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019', '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
'VOD': [3, 2.3, 2, 1.8, 2, 4, 5, 4, 3, 1],
'BBY': [0.9, 1, 1.2, 1, 1, 2.3, 2.4, 2.5, 3, 2.9],
'STJ': [4, 4.2, 4.3, 4.4, 3.5, 3, 2, 1, 1.2, 2],
'RBS': [0.5, 0.6, 0.7, 0.6, 1, 1.2, 1.3, 1.4, 1.5, 2]})
From this DataFrame, I can rank the values in each row across the columns like this:
df1 = df.rank(1, ascending=False, method='first')
I'm trying to assign 1 to the two top-ranked columns in each row (in the first row, VOD and STJ) and 0 to the rest.
My goal is to build a table like this:
result = pd.DataFrame({'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019', '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
'VOD': [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
'BBY': [0,0,0,0,0,0,1,1,1,1],
'STJ': [1,1,1,1,1,1,0,0,0,1],
'RBS': [0,0,0,0,0,0,0,0,0,0]})
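The ranking step can be inspected on its own. The sketch below is an illustration, not code from the original post. It ranks only the numeric columns by moving Date into the index, which avoids depending on how a given pandas version treats the string Date column in a row-wise rank. It then compares method='first' with the default method='average' on the two rows that contain ties (VOD = BBY = 3 on 14/01/2019, STJ = RBS = 2 on 15/01/2019). With 'first', ties are broken by column order, so every row gets the ranks 1 to 4 exactly once and "top two" always means ranks 1 and 2. With 'average', the tied VOD/BBY pair both get 1.5, so a test for ranks 1 and 2 would select neither of them.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019',
             '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
    'VOD': [3, 2.3, 2, 1.8, 2, 4, 5, 4, 3, 1],
    'BBY': [0.9, 1, 1.2, 1, 1, 2.3, 2.4, 2.5, 3, 2.9],
    'STJ': [4, 4.2, 4.3, 4.4, 3.5, 3, 2, 1, 1.2, 2],
    'RBS': [0.5, 0.6, 0.7, 0.6, 1, 1.2, 1.3, 1.4, 1.5, 2],
})

# Move Date into the index so only the numeric columns are ranked.
prices = df.set_index('Date')

# Rank within each row, largest value = rank 1.
first = prices.rank(axis=1, ascending=False, method='first')      # ties broken by column order
average = prices.rank(axis=1, ascending=False, method='average')  # ties share the mean rank

tied_rows = ['14/01/2019', '15/01/2019']
print(first.loc[tied_rows])
print(average.loc[tied_rows])
```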
I thought an if statement would do it, but I couldn't get it to work with rank(). Any ideas are very welcome.
Answer 0 (score: 3)
Use DataFrame.isin to test which ranks are 1 or 2, then cast the True/False result to integers so it becomes 1/0:
cols = ['VOD','BBY','STJ','RBS']
df[cols] = df[cols].rank(axis=1, ascending=False, method='first').isin([1,2]).astype(int)
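Because method='first' gives each row the ranks 1 to len(cols) with no ties (assuming no missing values), isin([1,2]) is equivalent to "rank <= 2", which also generalises to any number of top columns. The sketch below wraps that idea in a helper; top_k_flags and its k parameter are hypothetical names introduced for illustration, and the comparison uses DataFrame.le rather than isin.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019',
             '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
    'VOD': [3, 2.3, 2, 1.8, 2, 4, 5, 4, 3, 1],
    'BBY': [0.9, 1, 1.2, 1, 1, 2.3, 2.4, 2.5, 3, 2.9],
    'STJ': [4, 4.2, 4.3, 4.4, 3.5, 3, 2, 1, 1.2, 2],
    'RBS': [0.5, 0.6, 0.7, 0.6, 1, 1.2, 1.3, 1.4, 1.5, 2],
})
cols = ['VOD', 'BBY', 'STJ', 'RBS']


def top_k_flags(frame, columns, k=2):
    """Hypothetical helper: 1 for the k largest values in each row, 0 elsewhere."""
    # method='first' breaks ties by column order, so ranks in a row are 1..len(columns)
    # and "rank <= k" marks exactly k cells per row (for k=2 this matches isin([1, 2])).
    ranks = frame[columns].rank(axis=1, ascending=False, method='first')
    return ranks.le(k).astype(int)


out = df.copy()
out[cols] = top_k_flags(df, cols, k=2)
print(out)
```

Setting k=1 would flag only the single largest value in each row.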
Or use numpy.where:
import numpy as np
df[cols] = np.where(df[cols].rank(axis=1, ascending=False, method='first').isin([1,2]), 1, 0)
print (df)
Date VOD BBY STJ RBS
0 02/01/2019 1 0 1 0
1 03/01/2019 1 0 1 0
2 04/01/2019 1 0 1 0
3 07/01/2019 1 0 1 0
4 08/01/2019 1 0 1 0
5 09/01/2019 1 0 1 0
6 10/01/2019 1 1 0 0
7 11/01/2019 1 1 0 0
8 14/01/2019 1 1 0 0
9 15/01/2019 0 1 1 0
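One way to confirm that both variants reproduce the table from the question is pandas.testing.assert_frame_equal. The sketch below is a check added for illustration, not part of the original answer; it runs each variant on its own copy of the data. check_dtype=False is used on the assumption that astype(int) and np.where produce the platform default integer type, which may differ from the int64 that pandas infers for the literal lists in result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019',
             '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
    'VOD': [3, 2.3, 2, 1.8, 2, 4, 5, 4, 3, 1],
    'BBY': [0.9, 1, 1.2, 1, 1, 2.3, 2.4, 2.5, 3, 2.9],
    'STJ': [4, 4.2, 4.3, 4.4, 3.5, 3, 2, 1, 1.2, 2],
    'RBS': [0.5, 0.6, 0.7, 0.6, 1, 1.2, 1.3, 1.4, 1.5, 2],
})
cols = ['VOD', 'BBY', 'STJ', 'RBS']

# Expected table, copied from the question.
result = pd.DataFrame({
    'Date': df['Date'].tolist(),
    'VOD': [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    'BBY': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    'STJ': [1, 1, 1, 1, 1, 1, 0, 0, 0, 1],
    'RBS': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Boolean mask: True where a value is among the two largest in its row.
top2 = df[cols].rank(axis=1, ascending=False, method='first').isin([1, 2])

via_isin = df.copy()
via_isin[cols] = top2.astype(int)

via_where = df.copy()
via_where[cols] = np.where(top2, 1, 0)

# Raises AssertionError if either variant differs from the expected table.
pd.testing.assert_frame_equal(via_isin, result, check_dtype=False)
pd.testing.assert_frame_equal(via_where, result, check_dtype=False)
print('both variants match the expected table')
```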
Answer 1 (score: 1)
import pandas as pd
df = pd.DataFrame({'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019', '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
'VOD': [3, 2.3, 2, 1.8, 2, 4, 5, 4, 3, 1],
'BBY': [0.9, 1, 1.2, 1, 1, 2.3, 2.4, 2.5, 3, 2.9],
'STJ': [4, 4.2, 4.3, 4.4, 3.5, 3, 2, 1, 1.2, 2],
'RBS': [0.5, 0.6, 0.7, 0.6, 1, 1.2, 1.3, 1.4, 1.5, 2]})
ranked_cols = ['VOD','BBY','STJ','RBS']
ranked = df[ranked_cols].rank(axis=1, ascending=False, method='first')
def allocate_ones(x):
    if x in (1, 2):  # top 2 ranked
        return 1
    else:
        return 0

# applymap calls allocate_ones on every cell (deprecated in pandas 2.1 in favour of DataFrame.map)
allocated = ranked.applymap(allocate_ones)
Now reattach the Date column:
allocated['Date'] = df['Date']
Output:
VOD BBY STJ RBS Date
0 1 0 1 0 02/01/2019
1 1 0 1 0 03/01/2019
2 1 0 1 0 04/01/2019
3 1 0 1 0 07/01/2019
4 1 0 1 0 08/01/2019
5 1 0 1 0 09/01/2019
6 1 1 0 0 10/01/2019
7 1 1 0 0 11/01/2019
8 1 1 0 0 14/01/2019
9 0 1 1 0 15/01/2019
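This answer leaves Date as the last column, whereas the question's target table has it first. The sketch below, a variation on the answer rather than the poster's own code, places Date first with DataFrame.insert. It also assumes that DataFrame.map (added in pandas 2.1 as the replacement for the deprecated applymap) may not be available, and falls back to applymap on older versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['02/01/2019', '03/01/2019', '04/01/2019', '07/01/2019', '08/01/2019',
             '09/01/2019', '10/01/2019', '11/01/2019', '14/01/2019', '15/01/2019'],
    'VOD': [3, 2.3, 2, 1.8, 2, 4, 5, 4, 3, 1],
    'BBY': [0.9, 1, 1.2, 1, 1, 2.3, 2.4, 2.5, 3, 2.9],
    'STJ': [4, 4.2, 4.3, 4.4, 3.5, 3, 2, 1, 1.2, 2],
    'RBS': [0.5, 0.6, 0.7, 0.6, 1, 1.2, 1.3, 1.4, 1.5, 2],
})
ranked_cols = ['VOD', 'BBY', 'STJ', 'RBS']
ranked = df[ranked_cols].rank(axis=1, ascending=False, method='first')


def allocate_ones(x):
    # 1 for the two highest-ranked cells in the row, 0 otherwise
    return 1 if x in (1, 2) else 0


# Use DataFrame.map where it exists (pandas >= 2.1), otherwise the older applymap.
elementwise = ranked.map if hasattr(pd.DataFrame, 'map') else ranked.applymap
allocated = elementwise(allocate_ones)

# Insert Date as the first column to match the layout in the question.
allocated.insert(0, 'Date', df['Date'])
print(allocated)
```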