Given an existing DataFrame that is indexed:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 -0.131666 -0.315019 0.306728 -0.642224 -0.294562
1 0.769310 -1.277065 0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571 0.544697 0.275880 -0.451564
3 0.612561 -0.540457 2.390871 -2.699741 0.534807
4 -1.504476 -2.113726 0.785208 -1.037256 -0.292959
5 0.467429 1.327839 -1.666649 1.144189 0.322896
6 -0.306556 1.668364 0.036508 0.596452 0.066755
7 -1.689779 1.469891 -0.068087 -1.113231 0.382235
8 0.028250 -2.145618 0.555973 -0.473131 -0.638056
9 0.633408 -0.791857 0.933033 1.485575 -0.021429
>>> df.set_index("a")
b c d e
a
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
-1.561325 -0.155571 0.544697 0.275880 -0.451564
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
How can I move the third row to the first position?
The expected result is:
b c d e
a
-1.561325 -0.155571 0.544697 0.275880 -0.451564
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
The original first row should now become the second row.
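A minimal sketch, not taken from the answers below: selecting rows by position with `iloc` and a permutation list moves the third row (position 2) to the top regardless of what the index labels are, so it also works on the float-valued index produced by `set_index("a")`. The helper name `move_row_to_top` is hypothetical.

```python
import numpy as np
import pandas as pd


def move_row_to_top(df, pos):
    """Return a new DataFrame with the row at 0-based position `pos` first.

    Hypothetical helper: all other rows keep their relative order.
    """
    order = [pos] + [i for i in range(len(df)) if i != pos]
    return df.iloc[order]


df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
indexed = df.set_index("a")
# The third row has position 2; its float index label moves with it.
result = move_row_to_top(indexed, 2)
print(result)
```

Because `iloc` is purely positional, the same call works whether the index is the default 0..9 or the column `a`.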
Answer 0 (score: 5)
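Reindexing is probably the best way to put the rows in any new order in one evident step, except that it may require creating a new DataFrame, which could be very large.
For example: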
import pandas as pd
t = pd.read_csv('table.txt', sep=r'\s+')
t
Out[81]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')
t2 = t.reindex([2,0,1,3]) # cannot do this in place
t2
Out[93]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
The index can now be set back to range(4) without reindexing:
t2.index = range(4)
t2
Out[102]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
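As an alternative to assigning `range(4)` by hand, `reset_index(drop=True)` discards the shuffled labels and installs a fresh 0..n-1 index in the same expression as the reindex. A minimal sketch; since `table.txt` is not available, a small stand-in DataFrame with two of its columns is built inline.

```python
import pandas as pd

# Stand-in for the table read from table.txt (two of its columns only).
t = pd.DataFrame({
    "DG/VD": ["0/0", "1/1", "2/2", "3/3"],
    "Name": ["one", "two", "three", "four"],
})

# Reorder by label, then replace the reordered labels with 0..3.
t2 = t.reindex([2, 0, 1, 3]).reset_index(drop=True)
print(t2)
```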
It is also possible to use 'tuple switching' and row selection as the underlying mechanism, without creating a new DataFrame. For example:
import pandas as pd
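t = pd.read_csv('table.txt', sep=r'\s+')
# .ix was removed from pandas; .iloc selects rows by position.
# .copy() stops a right-hand row from being a view that the first assignment overwrites.
t.iloc[1], t.iloc[2] = t.iloc[2].copy(), t.iloc[1].copy()
t.iloc[0], t.iloc[1] = t.iloc[1].copy(), t.iloc[0].copy()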
t
Out[96]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
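The two tuple switches above generalize: moving the row at position `pos` to the top takes `pos` adjacent swaps, working upward. A minimal sketch under that reading; `swap_rows` and `bubble_to_top` are hypothetical names, and the DataFrame is a small stand-in for `table.txt`.

```python
import pandas as pd


def swap_rows(df, i, j):
    """Swap the rows at 0-based positions i and j of df, in place."""
    # Copy both rows first: df.iloc[k] can be a view into df's data,
    # which the first assignment would otherwise overwrite.
    row_i = df.iloc[i].copy()
    row_j = df.iloc[j].copy()
    df.iloc[i] = row_j
    df.iloc[j] = row_i


def bubble_to_top(df, pos):
    """Move the row at position pos to the top via adjacent swaps, in place."""
    for k in range(pos, 0, -1):
        swap_rows(df, k - 1, k)


t = pd.DataFrame({
    "DG/VD": ["0/0", "1/1", "2/2", "3/3"],
    "Name": ["one", "two", "three", "four"],
})
bubble_to_top(t, 2)  # swaps rows 1 and 2, then rows 0 and 1
print(t)
```

Each swap rewrites two full rows, so this costs `pos` swaps; it avoids allocating a new DataFrame, which is the point of this approach.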
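Another in-place approach is to set the DataFrame index to reflect the desired ordering, so that, for example, the third row gets index 0, and then sort the DataFrame in place. This is wrapped in the function below, which assumes the rows are indexed by range(m) for some positive integer m and that the DataFrame is simply indexed (no MultiIndex), as in the example given in the question.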
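def putfirst(n, df):
    """Move the n-th row (1-based) of df to the top, in place.

    Assumes df is indexed by range(m) with no MultiIndex.
    """
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # Label the n-th row 0, shift the rows above it to 1..n-1,
    # and leave the rows after it with their original labels.
    df.index = list(range(1, n)) + [0] + list(range(n, df.index[-1] + 1))
    # DataFrame.sort was removed from pandas; sort_index replaces it.
    df.sort_index(inplace=True)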
The arguments of putfirst are n, the ordinal position of the row to be moved to the first position (so n = 3 to move the third row), and df, the DataFrame containing that row.
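To make the relabeling step concrete, here is the label list that `putfirst` builds for n = 3 on a 10-row DataFrame indexed 0..9. The helper `putfirst_labels` is hypothetical and only reproduces that one expression, written for Python 3 with `m` standing in for `df.index[-1] + 1`.

```python
def putfirst_labels(n, m):
    """Labels putfirst assigns to an m-row DataFrame indexed 0..m-1."""
    # Positions 0..n-2 get labels 1..n-1, the n-th row gets 0,
    # and positions n..m-1 keep labels n..m-1.
    return list(range(1, n)) + [0] + list(range(n, m))


print(putfirst_labels(3, 10))  # → [1, 2, 0, 3, 4, 5, 6, 7, 8, 9]
```

Sorting by these labels puts the row labelled 0 first and keeps every other row in its original relative order.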
Here is a demonstration:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df.set_index("a") # ineffective without assignment or inplace=True
Out[182]:
b c d e
a
1.394072 -1.076742 -0.192466 -0.871188 0.420852
-1.211411 -0.258867 -0.581647 -1.260421 0.464575
-1.070241 0.804223 -0.156736 2.010390 -0.887104
-0.977936 -0.267217 0.483338 -0.400333 0.449880
0.399594 -0.151575 -2.557934 0.160807 0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745 0.044697 -0.897756 0.890874
-1.151185 -2.612303 1.141250 -0.867136 0.383583
-0.437030 0.347489 -1.230179 0.571078 0.060061
-0.225524 1.349726 1.350300 -0.386653 0.865990
df
Out[183]:
a b c d e
0 1.394072 -1.076742 -0.192466 -0.871188 0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
2 -1.070241 0.804223 -0.156736 2.010390 -0.887104
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
putfirst(3,df)
df
Out[186]:
a b c d e
0 -1.070241 0.804223 -0.156736 2.010390 -0.887104
1 1.394072 -1.076742 -0.192466 -0.871188 0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
Answer 1 (score: 2)
This isn't elegant, but it works so far:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 1.124763 -0.416770 1.347839 -0.944334 0.738686
1 -0.348112 0.786822 -1.161970 -1.645065 -0.075205
2 0.549966 0.357076 -0.880669 -0.187731 -0.221997
3 0.311057 -0.126432 -1.187644 2.151804 0.791835
4 -0.310849 0.753750 -1.087447 0.095884 1.449832
5 -0.272344 0.278788 -0.724369 -0.568442 0.164909
6 0.942927 -0.273203 0.203322 1.099572 -0.505160
7 0.526321 1.665012 0.915676 -1.174497 -2.270662
8 -0.959773 0.921732 1.396364 -1.383112 0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605 0.578198
>>> row = df.iloc[0].copy()
>>> row
a 1.124763
b -0.416770
c 1.347839
d -0.944334
e 0.738686
Name: 0, dtype: float64
>>> df.iloc[0] = df.iloc[2]
>>> df.iloc[2] = row
>>> df
a b c d e
0 0.549966 0.357076 -0.880669 -0.187731 -0.221997
1 -0.348112 0.786822 -1.161970 -1.645065 -0.075205
2 1.124763 -0.416770 1.347839 -0.944334 0.738686
3 0.311057 -0.126432 -1.187644 2.151804 0.791835
4 -0.310849 0.753750 -1.087447 0.095884 1.449832
5 -0.272344 0.278788 -0.724369 -0.568442 0.164909
6 0.942927 -0.273203 0.203322 1.099572 -0.505160
7 0.526321 1.665012 0.915676 -1.174497 -2.270662
8 -0.959773 0.921732 1.396364 -1.383112 0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605 0.578198
>>> df.set_index('a')
b c d e
a
0.549966 0.357076 -0.880669 -0.187731 -0.221997
-0.348112 0.786822 -1.161970 -1.645065 -0.075205
1.124763 -0.416770 1.347839 -0.944334 0.738686
0.311057 -0.126432 -1.187644 2.151804 0.791835
-0.310849 0.753750 -1.087447 0.095884 1.449832
-0.272344 0.278788 -0.724369 -0.568442 0.164909
0.942927 -0.273203 0.203322 1.099572 -0.505160
0.526321 1.665012 0.915676 -1.174497 -2.270662
-0.959773 0.921732 1.396364 -1.383112 0.603030
-2.802902 -0.572469 -1.599550 -1.305605 0.578198
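If this is what you want... Note that this swaps the first and third rows, so the original first row ends up third rather than second.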
Answer 2 (score: 2)
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
You can simply do the following:
df.reindex([2, 0, 1] + list(range(3, len(df))))
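The hard-coded list in that reindex call assumes the default 0..n-1 index. A sketch of the same reindex idea for any index with unique labels, moving a given label to the front; the helper `move_label_to_top` is hypothetical.

```python
import numpy as np
import pandas as pd


def move_label_to_top(df, label):
    """Return a new DataFrame with the row labelled `label` first."""
    # Assumes unique index labels; reindex raises a ValueError on duplicates.
    order = [label] + [lbl for lbl in df.index if lbl != label]
    return df.reindex(order)


df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
moved = move_label_to_top(df, 2)
print(moved)
```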
Or you can do the following:
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
# this line rearranges the first 3 rows
df.reindex([2, 0, 1])
# slice the data from the fourth row (position 3) onward
df.iloc[3:]
# concatenate both results together
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
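The concat idea generalizes to any position: take the chosen row, then the rows before it, then the rows after it. A sketch assuming a 0-based position `pos`; `move_with_concat` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def move_with_concat(df, pos):
    """Return a new DataFrame with the row at 0-based position pos first."""
    # Double brackets keep df.iloc[[pos]] a one-row DataFrame, not a Series.
    return pd.concat([df.iloc[[pos]], df.iloc[:pos], df.iloc[pos + 1:]])


df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
moved = move_with_concat(df, 2)
print(moved)
```

Passing `ignore_index=True` to `pd.concat` would give the result a fresh 0..n-1 index instead of keeping the original labels.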