Suppose we have a DataFrame:
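import numpy as np  # pd.np is no longer available in current pandas; import numpy directly
import pandas as pd

df = pd.DataFrame(np.zeros((15, 10)), dtype=int,
                  index=[['a']*5 + ['b']*5 + ['c']*5, list(range(15))])
df.index.names = ['index0', 'index1']
np.random.seed(0)
# Put a single 1 in a random column of every row.
# Writing through df.iloc modifies df itself; the row yielded by iterrows() may be a copy.
for i in range(len(df)):
    df.iloc[i, np.random.randint(10)] = 1
df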
               0  1  2  3  4  5  6  7  8  9
index0 index1
a      0       0  0  0  0  0  1  0  0  0  0
       1       1  0  0  0  0  0  0  0  0  0
       2       0  0  0  1  0  0  0  0  0  0
       3       0  0  0  1  0  0  0  0  0  0
       4       0  0  0  0  0  0  0  1  0  0
b      5       0  0  0  0  0  0  0  0  0  1
       6       0  0  0  1  0  0  0  0  0  0
       7       0  0  0  0  0  1  0  0  0  0
       8       0  0  1  0  0  0  0  0  0  0
       9       0  0  0  0  1  0  0  0  0  0
c      10      0  0  0  0  0  0  0  1  0  0
       11      0  0  0  0  0  0  1  0  0  0
       12      0  0  0  0  0  0  0  0  1  0
       13      0  0  0  0  0  0  0  0  1  0
       14      0  1  0  0  0  0  0  0  0  0
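How can I first sort the rows within each of the blocks a, b and c by the column in which the "1" appears, and then sort the blocks a, b and c themselves in the same way?
Expected output: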
               0  1  2  3  4  5  6  7  8  9
index0 index1
a      1       1  0  0  0  0  0  0  0  0  0
       2       0  0  0  1  0  0  0  0  0  0
       3       0  0  0  1  0  0  0  0  0  0
       0       0  0  0  0  0  1  0  0  0  0
       4       0  0  0  0  0  0  0  1  0  0
c      14      0  1  0  0  0  0  0  0  0  0
       11      0  0  0  0  0  0  1  0  0  0
       10      0  0  0  0  0  0  0  1  0  0
       12      0  0  0  0  0  0  0  0  1  0
       13      0  0  0  0  0  0  0  0  1  0
b      8       0  0  1  0  0  0  0  0  0  0
       6       0  0  0  1  0  0  0  0  0  0
       9       0  0  0  0  1  0  0  0  0  0
       7       0  0  0  0  0  1  0  0  0  0
       5       0  0  0  0  0  0  0  0  0  1
Edit: the values are not necessarily "1"; in practice they are various text values.
Answer 0 (score: 1)
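One approach is to combine pandas.DataFrame.groupby with idxmax and sort_values. Within each index0 group, idxmax(axis=1) gives the column that holds the 1 in every row, and sorting by that column orders the rows. The groups are then ordered by comparing their column sums: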
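import pandas as pd

# For each index0 block, reorder its rows by the column label of the row maximum.
l = (d.loc[d.idxmax(axis=1).sort_values().index] for _, d in df.groupby('index0'))
# Order the blocks by their column sums, compared as lists, largest first.
new_df = pd.concat(sorted(l, key=lambda x: list(x.sum()), reverse=True))
print(new_df)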
Output:
               0  1  2  3  4  5  6  7  8  9
index0 index1
a      1       1  0  0  0  0  0  0  0  0  0
       2       0  0  0  1  0  0  0  0  0  0
       3       0  0  0  1  0  0  0  0  0  0
       0       0  0  0  0  0  1  0  0  0  0
       4       0  0  0  0  0  0  0  1  0  0
c      14      0  1  0  0  0  0  0  0  0  0
       11      0  0  0  0  0  0  1  0  0  0
       10      0  0  0  0  0  0  0  1  0  0
       12      0  0  0  0  0  0  0  0  1  0
       13      0  0  0  0  0  0  0  0  1  0
b      8       0  0  1  0  0  0  0  0  0  0
       6       0  0  0  1  0  0  0  0  0  0
       9       0  0  0  0  1  0  0  0  0  0
       7       0  0  0  0  0  1  0  0  0  0
       5       0  0  0  0  0  0  0  0  0  1
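To make the groupby / idxmax / sort_values combination easier to follow, here is a minimal sketch that splits it into named steps. The helper name sort_blocks is hypothetical and not part of pandas, kind="stable" is an added assumption so that tied rows keep their original order, and the frame is rebuilt from the positions of the 1s shown in the question rather than from the random seed.

```python
import numpy as np
import pandas as pd


def sort_blocks(df, level="index0"):
    """Sort rows inside each block by the column of their maximum,
    then order the blocks by their column sums (largest first)."""
    blocks = []
    for _, block in df.groupby(level=level, sort=False):
        # Column label holding each row's maximum (the "1" in the example).
        first_hit = block.idxmax(axis=1)
        # A stable sort keeps tied rows in their original order.
        order = first_hit.sort_values(kind="stable").index
        blocks.append(block.loc[order])
    # Lists compare element by element, so blocks with hits in the
    # leftmost columns come first when sorted in descending order.
    blocks.sort(key=lambda b: list(b.sum()), reverse=True)
    return pd.concat(blocks)


# Rebuild the question's frame from the positions of the 1s shown above.
ones_at = [5, 0, 3, 3, 7, 9, 3, 5, 2, 4, 7, 6, 8, 8, 1]
df = pd.DataFrame(
    np.zeros((15, 10), dtype=int),
    index=pd.MultiIndex.from_arrays(
        [list("aaaaabbbbbccccc"), list(range(15))],
        names=["index0", "index1"],
    ),
)
for row, col in enumerate(ones_at):
    df.iloc[row, col] = 1

new_df = sort_blocks(df)
print(new_df.index.tolist())
```

Comparing the column sums as lists is what puts block a before c and c before b: the first differing column decides, just as it does when sorting the rows.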
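If the 1s are actually text values and everything else is the same, try pandas.DataFrame.ne to build a boolean mask first, sort the mask, and then reorder the original frame with the resulting index:
tmp = df.ne(0)  # True wherever the cell is not 0
# same operation as above, applied to the mask instead of df
l = (d.loc[d.idxmax(axis=1).sort_values().index] for _, d in tmp.groupby('index0'))
new_tmp = pd.concat(sorted(l, key=lambda x: list(x.sum()), reverse=True))
new_df = df.loc[new_tmp.index]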
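As a rough end-to-end illustration of the text-valued case, the sketch below uses a small made-up frame in which 0 marks an empty cell and any string marks a hit. The data and the labels "x", "y", "z" and "w" are assumptions for illustration only, and kind="stable" is again an addition to keep ties deterministic.

```python
import pandas as pd

# Hypothetical data: 0 marks an empty cell, any text marks a hit.
df = pd.DataFrame(
    [[0, 0, "x"], ["y", 0, 0], [0, "z", 0], ["w", 0, 0]],
    index=pd.MultiIndex.from_tuples(
        [("a", 0), ("a", 1), ("b", 2), ("b", 3)],
        names=["index0", "index1"],
    ),
)

# Boolean mask: True wherever the cell is not 0.
tmp = df.ne(0)

# Same sorting as before, but on the mask instead of the raw values.
blocks = (
    d.loc[d.idxmax(axis=1).sort_values(kind="stable").index]
    for _, d in tmp.groupby(level="index0")
)
new_tmp = pd.concat(sorted(blocks, key=lambda b: list(b.sum()), reverse=True))

# Reorder the original frame with the index of the sorted mask.
new_df = df.loc[new_tmp.index]
print(new_df)
```

Sorting the mask rather than the data sidesteps comparing strings with integers, and the original text values come back untouched through df.loc.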