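I can find solutions for sorting records vertically, but I want to sort a subset of the data in my data frame horizontally, within each row.
Here is my data frame, containing the data I want to sort: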
account_num  Word_0    Word_1    Word_2    Word_3  Word_4
123          Silver    Platinum  Osmium
456          Platinum
789          Silver    Rhodium   Platinum  Osmium
Here is the output I want:
account_num  Word_0    Word_1    Word_2  Word_3  Word_4
123          Platinum  Osmium    Silver
456          Platinum
789          Rhodium   Platinum  Osmium  Silver
sorted according to the order given in this data frame:
Priority Metal
1 Rhodium
2 Platinum
3 Gold
4 Ruthenium
5 Iridium
6 Osmium
7 Palladium
8 Rhenium
9 Silver
10 Indium
I have already tidied the data using this code:
newdf.apply(lambda r: sorted(r,reverse = True), axis = 1)
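Here the Word_0 to Word_4 columns were placed in a separate data frame (newdf) and sorted in reverse order so that the blank values come last; they were then joined back to the original data frame containing the account_num column. However, I don't know how to incorporate a custom list into the sort order.
Any help would be greatly appreciated.
Thanks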
Answer 0 (score: 3)
I think we can melt, merge with the order df, then sort_values by Priority, and then pivot back:
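# melt to long form, attach each metal's Priority, and sort by it
s=df.melt('account_num').\
merge(orderdf,left_on='value',right_on='Metal',how='left').\
sort_values('Priority')
# number the metals within each account, then pivot back to wide form
# (recent pandas versions require keyword arguments for pivot)
yourdf=s.assign(newkey=s.groupby('account_num').cumcount()).\
pivot(index='account_num',columns='newkey',values='value').add_prefix('Word_')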
yourdf
Out[1100]:
newkey Word_0 Word_1 Word_2 Word_3 Word_4
account_num
123 Platinum Osmium Silver None NaN
456 Platinum None None None NaN
789 Rhodium Platinum Osmium Silver NaN
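For anyone who wants to try the melt / merge / pivot route end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the tables in the question; the names `long`, `slot` and `wide` are just illustrative, and dropping the blanks before the merge and reindexing to five columns are my own additions rather than part of the answer above.

```python
import pandas as pd

# Frames rebuilt from the question's tables (blanks stored as None).
rows = [
    [123, "Silver", "Platinum", "Osmium", None, None],
    [456, "Platinum", None, None, None, None],
    [789, "Silver", "Rhodium", "Platinum", "Osmium", None],
]
cols = ["account_num"] + [f"Word_{i}" for i in range(5)]
df = pd.DataFrame(rows, columns=cols)

metals = ["Rhodium", "Platinum", "Gold", "Ruthenium", "Iridium",
          "Osmium", "Palladium", "Rhenium", "Silver", "Indium"]
orderdf = pd.DataFrame({"Priority": list(range(1, 11)), "Metal": metals})

# Long form: one row per (account, metal), blanks dropped, priority attached.
long = (
    df.melt(id_vars="account_num")
      .dropna(subset=["value"])
      .merge(orderdf, left_on="value", right_on="Metal", how="left")
      .sort_values(["account_num", "Priority"])
)

# Position of each metal within its account after sorting: 0, 1, 2, ...
long = long.assign(slot=long.groupby("account_num").cumcount())

# Back to wide form; reindex so that all five Word_ columns are present.
wide = (
    long.pivot(index="account_num", columns="slot", values="value")
        .reindex(columns=range(5))
        .add_prefix("Word_")
)
print(wide)
```

Sorting on both account_num and Priority is not strictly required for the pivot, but it keeps the intermediate long frame easy to inspect.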
Or we can use argsort, where the logic is clearer:
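import numpy as np
d = dict(zip(df2['Metal'], df2['Priority']))
# df is assumed to hold only the Word_* columns here (account_num as the index)
for i in range(len(df)):
    # missing values (NaN/None) are not keys of d, so they get rank 1000 and sort last
    df.iloc[i,:]=df.values[i,np.argsort([d.get(v,1000) for v in df.values[i,:]])]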
df
Out[38]:
Word_0 Word_1 Word_2 Word_3 Word_4
account_num
0 123 Platinum Osmium Silver NaN NaN
1 456 Platinum NaN NaN NaN NaN
2 789 Rhodium Platinum Osmium Silver NaN
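The argsort idea can also be written without assigning back into the frame row by row. The following is only a sketch under a few assumptions: the frames are rebuilt from the question's tables, account_num is moved into the index so that only the Word_ columns get sorted, and `rank`, `words` and `result` are names I made up for the example.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question's tables (blanks stored as None).
rows = [
    [123, "Silver", "Platinum", "Osmium", None, None],
    [456, "Platinum", None, None, None, None],
    [789, "Silver", "Rhodium", "Platinum", "Osmium", None],
]
cols = ["account_num"] + [f"Word_{i}" for i in range(5)]
df = pd.DataFrame(rows, columns=cols)

metals = ["Rhodium", "Platinum", "Gold", "Ruthenium", "Iridium",
          "Osmium", "Palladium", "Rhenium", "Silver", "Indium"]
orderdf = pd.DataFrame({"Priority": list(range(1, 11)), "Metal": metals})

rank = dict(zip(orderdf["Metal"], orderdf["Priority"]))
words = df.set_index("account_num")  # only the Word_* columns remain

sorted_rows = []
for row in words.to_numpy():
    # Anything that is not a known metal (including None/NaN) ranks as infinity.
    keys = [rank.get(value, np.inf) for value in row]
    # A stable argsort keeps the original order among equal keys.
    order = np.argsort(keys, kind="stable")
    sorted_rows.append(row[order])

result = pd.DataFrame(sorted_rows, index=words.index, columns=words.columns)
print(result)
```

Using `rank.get(value, np.inf)` means a missing value never has to be compared with itself to be detected; it simply is not a key of the dictionary.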
Answer 1 (score: 3)
c = pd.Categorical(df2.Metal, df2.Metal, ordered=True)
df.set_index('account_num').transform(lambda k: pd.Categorical(k,
categories=c.categories)\
.sort_values(), axis=1)
Output
Word_0 Word_1 Word_2 Word_3 Word_4
account_num
123 Platinum Osmium Silver NaN NaN
456 Platinum NaN NaN NaN NaN
789 Rhodium Platinum Osmium Silver NaN
Of course, you can always add .fillna('') at the end.
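In case the transform call behaves differently on your pandas version, here is a hedged sketch of the same ordered-categorical idea applied one row at a time. The frames are rebuilt from the question's tables and `words` / `result` are illustrative names. Note one assumption worth checking on real data: any value that is not in the category list silently becomes NaN.

```python
import pandas as pd

# Frames rebuilt from the question's tables (blanks stored as None).
rows = [
    [123, "Silver", "Platinum", "Osmium", None, None],
    [456, "Platinum", None, None, None, None],
    [789, "Silver", "Rhodium", "Platinum", "Osmium", None],
]
cols = ["account_num"] + [f"Word_{i}" for i in range(5)]
df = pd.DataFrame(rows, columns=cols)

# Category order doubles as the priority order (highest priority first).
metals = ["Rhodium", "Platinum", "Gold", "Ruthenium", "Iridium",
          "Osmium", "Palladium", "Rhenium", "Silver", "Indium"]

words = df.set_index("account_num")

# Sorting an ordered Categorical follows the category order; missing values go last.
sorted_rows = [
    list(pd.Categorical(row, categories=metals, ordered=True).sort_values())
    for row in words.to_numpy()
]

result = pd.DataFrame(sorted_rows, index=words.index, columns=words.columns)
print(result.fillna(""))
```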
Answer 2 (score: 3)
You can also try:
import numpy as np
df=df.fillna(value=np.nan)  # normalize None to NaN (pd.np is no longer available in recent pandas)
d=dict(zip(ref.Metal,ref.Priority))
df[['account_num']].join(pd.DataFrame(np.sort(df.iloc[:,1:].replace(d).values,axis=1),
columns=df.iloc[:,1:].columns).replace({v:k for k,v in d.items()}))
account_num Word_0 Word_1 Word_2 Word_3 Word_4
0 123 Platinum Osmium Silver NaN NaN
1 456 Platinum NaN NaN NaN NaN
2 789 Rhodium Platinum Osmium Silver NaN
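The same trick (turn metals into their priority numbers, sort the numbers, turn them back) can be sketched with Series.map instead of replace. This is a sketch under assumptions, not the answer's exact method: the frames are rebuilt from the question's tables, the priority frame is called `orderdf` here, and `to_rank`, `to_metal` and `result` are names invented for the example.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question's tables (blanks stored as None).
rows = [
    [123, "Silver", "Platinum", "Osmium", None, None],
    [456, "Platinum", None, None, None, None],
    [789, "Silver", "Rhodium", "Platinum", "Osmium", None],
]
cols = ["account_num"] + [f"Word_{i}" for i in range(5)]
df = pd.DataFrame(rows, columns=cols)

metals = ["Rhodium", "Platinum", "Gold", "Ruthenium", "Iridium",
          "Osmium", "Palladium", "Rhenium", "Silver", "Indium"]
orderdf = pd.DataFrame({"Priority": list(range(1, 11)), "Metal": metals})

words = df.set_index("account_num")

# Metal -> priority, and priority (as float) -> metal for the way back.
to_rank = dict(zip(orderdf["Metal"], orderdf["Priority"]))
to_metal = {float(priority): metal for metal, priority in to_rank.items()}

# Blanks are not keys of to_rank, so they map to NaN.
ranks = words.apply(lambda col: col.map(to_rank)).astype(float)

# np.sort places NaN at the end of each row.
sorted_ranks = np.sort(ranks.to_numpy(), axis=1)

result = pd.DataFrame(sorted_ranks, index=words.index, columns=words.columns)
result = result.apply(lambda col: col.map(to_metal))
print(result)
```

The round trip relies on priorities being unique; if two metals shared a priority, the reverse mapping could not tell them apart.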
Answer 3 (score: 2)
Use:
#create helper dictionary
d = dict(zip(df2['Metal'], df2['Priority']))
#add empty string for maximum priority
d[''] = df2['Priority'].max() + 1
#use sorted with the dictionary as key
L = [sorted(x, key=d.get) for x in df.fillna('').values]
#create new DataFrame by constructor
df1 = pd.DataFrame(L, index=df.index).add_prefix('Word_')
print (df1)
Word_0 Word_1 Word_2 Word_3 Word_4
account_num
123 Platinum Osmium Silver
456 Platinum
789 Rhodium Platinum Osmium Silver
If missing values are needed:
df1 = pd.DataFrame(L, index=df.index).add_prefix('Word_').replace('', np.nan)
print (df1)
Word_0 Word_1 Word_2 Word_3 Word_4
account_num
123 Platinum Osmium Silver NaN NaN
456 Platinum NaN NaN NaN NaN
789 Rhodium Platinum Osmium Silver NaN
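One caveat with `key=d.get`: a value that is missing from the dictionary yields None as its key, and Python cannot compare None with an integer, so sorted raises a TypeError. Below is a sketch of a more forgiving key function. The account 999 row and the metal "Copper" are hypothetical and added only to show the fallback; `sort_key`, `unknown_rank` and `blank_rank` are names I introduced, and placing unknown metals after all known ones is my assumption about the desired behaviour.

```python
import pandas as pd

rows = [
    [123, "Silver", "Platinum", "Osmium", None, None],
    [456, "Platinum", None, None, None, None],
    [789, "Silver", "Rhodium", "Platinum", "Osmium", None],
    [999, "Copper", "Gold", None, None, None],  # hypothetical row, not in the question
]
cols = ["account_num"] + [f"Word_{i}" for i in range(5)]
df = pd.DataFrame(rows, columns=cols)

metals = ["Rhodium", "Platinum", "Gold", "Ruthenium", "Iridium",
          "Osmium", "Palladium", "Rhenium", "Silver", "Indium"]
orderdf = pd.DataFrame({"Priority": list(range(1, 11)), "Metal": metals})

rank = dict(zip(orderdf["Metal"], orderdf["Priority"]))
unknown_rank = max(rank.values()) + 1  # unknown metals: after every known metal
blank_rank = unknown_rank + 1          # blanks: after everything else


def sort_key(value):
    """Return the sort position for one cell of a row."""
    if pd.isna(value):
        return blank_rank
    return rank.get(value, unknown_rank)


words = df.set_index("account_num")
sorted_rows = [sorted(row, key=sort_key) for row in words.to_numpy()]
result = pd.DataFrame(sorted_rows, index=words.index, columns=words.columns)
print(result)
```

Because sorted is stable, several unknown metals in the same row keep their original left-to-right order.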