Let's take this DataFrame:
import pandas as pd
L0 = ['d','a','b','c','d','a','b','c','d','a','b','c']
L1 = ['z','z','z','z','x','x','x','x','y','y','y','y']
L2 = [1,6,3,8,7,6,7,6,3,5,6,5]
df = pd.DataFrame({"A":L0,"B":L1,"C":L2})
df = df.pivot(columns="A",index="B",values="C")
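After pivoting, both the columns and the rows are sorted alphabetically.
Reordering the columns is easy and can be done with a custom list of column labels: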
df = df[['d','a','b','c']]
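But there is no equally direct way to reorder the rows. The most elegant approach I can think of is to use the column-label selection and transpose back and forth: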
df = df.T[['z','x','y']].T
This, for example, has no effect at all:
df.loc[['x','y','z'],:] = df.loc[['z','x','y'],:]
Is there no direct way to order the rows of a DataFrame by supplying a custom list of index labels?
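The assignment shown above has no effect because, when a DataFrame is assigned through loc, pandas aligns the right-hand side to the target by index label rather than by position. A minimal sketch of that behaviour, rebuilding the frame from the question (the names before and unchanged are only illustrative):

```python
import pandas as pd

# Rebuild the pivoted frame from the question.
df = pd.DataFrame({
    "A": ['d', 'a', 'b', 'c'] * 3,
    "B": ['z'] * 4 + ['x'] * 4 + ['y'] * 4,
    "C": [1, 6, 3, 8, 7, 6, 7, 6, 3, 5, 6, 5],
}).pivot(columns="A", index="B", values="C")

before = df.copy()

# The right-hand side is a DataFrame, so its rows are matched to the target
# rows by label: row 'x' receives row 'x', row 'y' receives row 'y', and so on.
df.loc[['x', 'y', 'z'], :] = df.loc[['z', 'x', 'y'], :]

# Compare the raw values and the row order with the untouched copy.
unchanged = (
    df.to_numpy().tolist() == before.to_numpy().tolist()
    and list(df.index) == list(before.index)
)
print(unchanged)
```

Reordering therefore needs an operation that returns a new frame in the requested order, rather than an in-place assignment.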
Answer 0 (score: 3)
You can use reindex or reindex_axis, both of which are faster than loc.
For the index:
idx = ['z','x','y']
df = df.reindex(idx)
print (df)
A a b c d
B
z 6 3 8 1
x 6 7 6 7
y 5 6 5 3
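One practical difference between reindex and loc shows up when a requested label is missing. The sketch below assumes pandas 1.0 or later; the label 'w' is a hypothetical label that does not exist in the index:

```python
import pandas as pd

# Rebuild the pivoted frame from the question.
df = pd.DataFrame({
    "A": ['d', 'a', 'b', 'c'] * 3,
    "B": ['z'] * 4 + ['x'] * 4 + ['y'] * 4,
    "C": [1, 6, 3, 8, 7, 6, 7, 6, 3, 5, 6, 5],
}).pivot(columns="A", index="B", values="C")

idx = ['z', 'x', 'w']  # 'w' is hypothetical and not present in df.index

# reindex conforms the frame to the new labels: unknown labels become
# rows filled with NaN instead of raising an error.
reindexed = df.reindex(idx)

# loc only selects existing labels; in pandas >= 1.0 a list that
# contains a missing label raises KeyError.
try:
    df.loc[idx]
    loc_raised = False
except KeyError:
    loc_raised = True

print(reindexed)
print(loc_raised)
```

If every label is known to exist, both approaches return the same rows in the same order.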
Or:
idx = ['z','x','y']
df = df.reindex_axis(idx)
print (df)
A a b c d
B
z 6 3 8 1
x 6 7 6 7
y 5 6 5 3
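Note that reindex_axis was deprecated in pandas 0.21 and removed in pandas 1.0, so the snippet above only runs on older versions. A sketch of the equivalent call on current versions, assuming the axis keyword of reindex (the names rows and columns_reordered are only illustrative):

```python
import pandas as pd

# Rebuild the pivoted frame from the question.
df = pd.DataFrame({
    "A": ['d', 'a', 'b', 'c'] * 3,
    "B": ['z'] * 4 + ['x'] * 4 + ['y'] * 4,
    "C": [1, 6, 3, 8, 7, 6, 7, 6, 3, 5, 6, 5],
}).pivot(columns="A", index="B", values="C")

idx = ['z', 'x', 'y']
cols = ['d', 'a', 'b', 'c']

# Stands in for df.reindex_axis(idx)
rows = df.reindex(idx, axis=0)

# Stands in for df.reindex_axis(cols, axis=1)
columns_reordered = df.reindex(cols, axis=1)

print(rows)
print(columns_reordered)
```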
As ssm pointed out:
df = df.loc[['z', 'x', 'y'], :]
print (df)
A a b c d
B
z 6 3 8 1
x 6 7 6 7
y 5 6 5 3
For the columns:
cols = ['d','a','b','c']
df = df.reindex(columns=cols)
print (df)
A d a b c
B
x 7 6 7 6
y 3 5 6 5
z 1 6 3 8
Or:
cols = ['d','a','b','c']
df = df.reindex_axis(cols, axis=1)
print (df)
A d a b c
B
x 7 6 7 6
y 3 5 6 5
z 1 6 3 8
Both together:
idx = ['z','x','y']
cols = ['d','a','b','c']
df = df.reindex(columns=cols, index=idx)
print (df)
A d a b c
B
z 1 6 3 8
x 7 6 7 6
y 3 5 6 5
Timings:
In [43]: %timeit (df.loc[['z', 'x', 'y'], ['d', 'a', 'b', 'c']])
1000 loops, best of 3: 653 µs per loop
In [44]: %timeit (df.reindex(columns=cols, index=idx))
1000 loops, best of 3: 402 µs per loop
Index only:
In [49]: %timeit (df.reindex(idx))
The slowest run took 5.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 271 µs per loop
In [50]: %timeit (df.reindex_axis(idx))
The slowest run took 6.50 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 252 µs per loop
In [51]: %timeit (df.loc[['z', 'x', 'y']])
The slowest run took 5.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 418 µs per loop
In [52]: %timeit (df.loc[['z', 'x', 'y'], :])
The slowest run took 4.87 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 542 µs per loop
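def pir(df):
    # searchsorted assumes df.index is sorted, as it is right after pivot
    idx = ['z','x','y']
    # positions of the requested labels within the sorted index
    a = df.index.values.searchsorted(idx)
    # rebuild the frame from the reordered values and index labels
    df = pd.DataFrame(
        df.values[a],
        df.index[a], df.columns
    )
    return df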
In [63]: %timeit (pir(df))
The slowest run took 7.75 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 91.8 µs per loop
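The pir function above relies on searchsorted, which only gives correct positions when the index is sorted. A sketch of a position-based variant that does not need a sorted index, using Index.get_indexer and iloc; the helper name reorder_rows is hypothetical, and the sketch assumes the index labels are unique (as they are after pivot):

```python
import pandas as pd

def reorder_rows(df, labels):
    """Return df with its rows in the order given by labels (hypothetical helper)."""
    # get_indexer maps each label to its integer position; -1 means not found.
    # It requires a unique index.
    pos = df.index.get_indexer(labels)
    if (pos == -1).any():
        raise KeyError("some labels are missing from the index")
    return df.iloc[pos]

# Rebuild the pivoted frame from the question.
df = pd.DataFrame({
    "A": ['d', 'a', 'b', 'c'] * 3,
    "B": ['z'] * 4 + ['x'] * 4 + ['y'] * 4,
    "C": [1, 6, 3, 8, 7, 6, 7, 6, 3, 5, 6, 5],
}).pivot(columns="A", index="B", values="C")

# Start from a deliberately unsorted index to show that order does not matter.
shuffled = df.loc[['y', 'z', 'x']]
result = reorder_rows(shuffled, ['z', 'x', 'y'])
print(result)
```

This was not part of the timings above, so no claim is made here about how its speed compares with reindex or pir.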
Answer 1 (score: 1)
Using loc is a very natural way to do this:
df.loc[['z', 'x', 'y']]
A d a b c
B
z 1 6 3 8
x 7 6 7 6
y 3 5 6 5
You can assign the result back to the DataFrame with:
df = df.loc[['z', 'x', 'y']]
Both axes together, using loc:
df.loc[['z', 'x', 'y'], ['d', 'a', 'b', 'c']]
A d a b c
B
z 1 6 3 8
x 7 6 7 6
y 3 5 6 5
A numpy.searchsorted-based approach is also possible; see the pir function in the timings above.