Python Pandas: How do I move a row to the first row of a DataFrame?

Time: 2015-09-13 07:44:10

Tags: python numpy pandas dataframe

Given an existing DataFrame with an index:

>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
          a         b         c         d         e
0 -0.131666 -0.315019  0.306728 -0.642224 -0.294562
1  0.769310 -1.277065  0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571  0.544697  0.275880 -0.451564
3  0.612561 -0.540457  2.390871 -2.699741  0.534807
4 -1.504476 -2.113726  0.785208 -1.037256 -0.292959
5  0.467429  1.327839 -1.666649  1.144189  0.322896
6 -0.306556  1.668364  0.036508  0.596452  0.066755
7 -1.689779  1.469891 -0.068087 -1.113231  0.382235
8  0.028250 -2.145618  0.555973 -0.473131 -0.638056
9  0.633408 -0.791857  0.933033  1.485575 -0.021429
>>> df.set_index("a")
                  b         c         d         e
a                                                
-0.131666 -0.315019  0.306728 -0.642224 -0.294562
 0.769310 -1.277065  0.735549 -0.900214 -1.826320
-1.561325 -0.155571  0.544697  0.275880 -0.451564
 0.612561 -0.540457  2.390871 -2.699741  0.534807
-1.504476 -2.113726  0.785208 -1.037256 -0.292959
 0.467429  1.327839 -1.666649  1.144189  0.322896
-0.306556  1.668364  0.036508  0.596452  0.066755
-1.689779  1.469891 -0.068087 -1.113231  0.382235
 0.028250 -2.145618  0.555973 -0.473131 -0.638056
 0.633408 -0.791857  0.933033  1.485575 -0.021429

How can I move the third row to the first position?

This is the expected result:

                  b         c         d         e
a                                                
-1.561325 -0.155571  0.544697  0.275880 -0.451564
-0.131666 -0.315019  0.306728 -0.642224 -0.294562
 0.769310 -1.277065  0.735549 -0.900214 -1.826320
 0.612561 -0.540457  2.390871 -2.699741  0.534807
-1.504476 -2.113726  0.785208 -1.037256 -0.292959
 0.467429  1.327839 -1.666649  1.144189  0.322896
-0.306556  1.668364  0.036508  0.596452  0.066755
-1.689779  1.469891 -0.068087 -1.113231  0.382235
 0.028250 -2.145618  0.555973 -0.473131 -0.638056
 0.633408 -0.791857  0.933033  1.485575 -0.021429

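The original first row should now become the second row.

3 Answers:

Answer 0 (score: 5)

Reindexing is probably the best way to put rows into any new order in one explicit step. Its drawback is that it has to build a new DataFrame, which may be very large.

For example: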

import pandas as pd

t = pd.read_csv('table.txt', sep=r'\s+')
t
Out[81]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
1   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
2   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four

t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')

t2 = t.reindex([2, 0, 1, 3])  # reindex returns a new DataFrame; it cannot reorder in place
t2
Out[93]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
2   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
0   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
1   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four
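
The same idea can be generalised to any index, including the float index produced by set_index("a") in the question. The following is a minimal sketch, not part of the original answer: the helper name move_label_first is hypothetical, and it assumes the index labels are unique, since reindex raises an error on duplicate labels.

```python
import pandas as pd


def move_label_first(df, label):
    """Return a new DataFrame with the row labelled `label` moved to the top."""
    # Keep every other label in its original relative order.
    rest = [i for i in df.index if i != label]
    return df.reindex([label] + rest)


# Small stand-in for the table used above (only the Name column).
t = pd.DataFrame({"Name": ["one", "two", "three", "four"]})
t2 = move_label_first(t, 2)
print(t2)
```

Like the reindex call above, this returns a new DataFrame and leaves t untouched.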

The index can now be set back to range(4) without reindexing:

t2.index = range(4)
t2
Out[102]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
1   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
2   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four
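
As an alternative to assigning range(4) by hand, reset_index(drop=True) renumbers the rows without needing to know the length. This is a small sketch on a stand-in table (only the Name column is reproduced here), not the original answer's code:

```python
import pandas as pd

# Stand-in for the table above.
t = pd.DataFrame({"Name": ["one", "two", "three", "four"]})

# Reorder, then discard the old labels and renumber from 0.
t2 = t.reindex([2, 0, 1, 3]).reset_index(drop=True)
print(t2)
```

With drop=True the old labels are discarded; without it they would be kept as a new column.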

It is also possible to use "tuple swapping" together with row selection as the basic mechanism, without creating a new DataFrame. For example:

import pandas as pd

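t = pd.read_csv('table.txt', sep=r'\s+')

# .ix was removed in pandas 1.0; .loc selects the same rows by label.
# Copy the right-hand rows so the swap never reads a row that was just overwritten.
t.loc[1], t.loc[2] = t.loc[2].copy(), t.loc[1].copy()
t.loc[0], t.loc[1] = t.loc[1].copy(), t.loc[0].copy()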
t
Out[96]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
1   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
2   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four

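Another in-place approach sets the DataFrame index to reflect the desired order, so that, for example, the third row gets index 0, and then sorts the DataFrame in place by that index. It is wrapped in the function below, which assumes that the rows are indexed by range(m) for some positive integer m and that the DataFrame has a simple index (no MultiIndex), as in the example given in the question.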

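def putfirst(n, df):
    # Move the n-th row (1-based) of df to the top, in place.
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # Relabel: the rows before row n get labels 1..n-1, row n gets label 0.
    df.index = list(range(1, n)) + [0] + list(range(n, df.index[-1] + 1))
    # DataFrame.sort was removed from pandas; sort_index sorts by the new labels.
    df.sort_index(inplace=True)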

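The arguments of putfirst are n, the ordinal position of the row to move to the first position (so n = 3 to move the third row), and df, the DataFrame containing the row to move.

Here is a demonstration:

import numpy as np
import pandas as pd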

df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])

df.set_index("a") # ineffective without assignment or inplace=True
Out[182]: 
                  b         c         d         e
a                                                
 1.394072 -1.076742 -0.192466 -0.871188  0.420852
-1.211411 -0.258867 -0.581647 -1.260421  0.464575
-1.070241  0.804223 -0.156736  2.010390 -0.887104
-0.977936 -0.267217  0.483338 -0.400333  0.449880
 0.399594 -0.151575 -2.557934  0.160807  0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745  0.044697 -0.897756  0.890874
-1.151185 -2.612303  1.141250 -0.867136  0.383583
-0.437030  0.347489 -1.230179  0.571078  0.060061
-0.225524  1.349726  1.350300 -0.386653  0.865990

df
Out[183]: 
          a         b         c         d         e
0  1.394072 -1.076742 -0.192466 -0.871188  0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421  0.464575
2 -1.070241  0.804223 -0.156736  2.010390 -0.887104
3 -0.977936 -0.267217  0.483338 -0.400333  0.449880
4  0.399594 -0.151575 -2.557934  0.160807  0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745  0.044697 -0.897756  0.890874
7 -1.151185 -2.612303  1.141250 -0.867136  0.383583
8 -0.437030  0.347489 -1.230179  0.571078  0.060061
9 -0.225524  1.349726  1.350300 -0.386653  0.865990

df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

putfirst(3,df)
df
Out[186]: 
          a         b         c         d         e
0 -1.070241  0.804223 -0.156736  2.010390 -0.887104
1  1.394072 -1.076742 -0.192466 -0.871188  0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421  0.464575
3 -0.977936 -0.267217  0.483338 -0.400333  0.449880
4  0.399594 -0.151575 -2.557934  0.160807  0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745  0.044697 -0.897756  0.890874
7 -1.151185 -2.612303  1.141250 -0.867136  0.383583
8 -0.437030  0.347489 -1.230179  0.571078  0.060061
9 -0.225524  1.349726  1.350300 -0.386653  0.865990
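
If the DataFrame does not have a plain range(m) index, for instance after set_index("a") as in the question, the same move can be sketched purely by position with iloc. The helper below is a hypothetical variant, not the original putfirst: it returns a new DataFrame instead of modifying its argument and raises ValueError instead of printing.

```python
import numpy as np
import pandas as pd


def put_first_by_position(df, n):
    """Return a copy of df with its n-th row (1-based) moved to the top."""
    if not 1 <= n <= len(df):
        raise ValueError("n must be between 1 and len(df)")
    pos = n - 1
    # Chosen position first, then every other position in original order.
    order = [pos] + [i for i in range(len(df)) if i != pos]
    return df.iloc[order]


# Deterministic sample data; column "a" becomes a non-range index.
df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=["a", "b"]).set_index("a")
moved = put_first_by_position(df, 3)
print(moved)
```

Because iloc works on positions, the index labels are carried along unchanged and no relabelling or sorting is needed.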

Answer 1 (score: 2)

This is not elegant, but it works so far:

>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
          a         b         c         d         e
0  1.124763 -0.416770  1.347839 -0.944334  0.738686
1 -0.348112  0.786822 -1.161970 -1.645065 -0.075205
2  0.549966  0.357076 -0.880669 -0.187731 -0.221997
3  0.311057 -0.126432 -1.187644  2.151804  0.791835
4 -0.310849  0.753750 -1.087447  0.095884  1.449832
5 -0.272344  0.278788 -0.724369 -0.568442  0.164909
6  0.942927 -0.273203  0.203322  1.099572 -0.505160
7  0.526321  1.665012  0.915676 -1.174497 -2.270662
8 -0.959773  0.921732  1.396364 -1.383112  0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605  0.578198
>>> row = df.loc[0].copy()  # .ix was removed in pandas 1.0; use .loc
>>> row
a    1.124763
b   -0.416770
c    1.347839
d   -0.944334
e    0.738686
Name: 0, dtype: float64
>>> df.loc[0] = df.loc[2]
>>> df.loc[2] = row
>>> df
          a         b         c         d         e
0  0.549966  0.357076 -0.880669 -0.187731 -0.221997
1 -0.348112  0.786822 -1.161970 -1.645065 -0.075205
2  1.124763 -0.416770  1.347839 -0.944334  0.738686
3  0.311057 -0.126432 -1.187644  2.151804  0.791835
4 -0.310849  0.753750 -1.087447  0.095884  1.449832
5 -0.272344  0.278788 -0.724369 -0.568442  0.164909
6  0.942927 -0.273203  0.203322  1.099572 -0.505160
7  0.526321  1.665012  0.915676 -1.174497 -2.270662
8 -0.959773  0.921732  1.396364 -1.383112  0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605  0.578198
>>> df.set_index('a')
                  b         c         d         e
a                                                
 0.549966  0.357076 -0.880669 -0.187731 -0.221997
-0.348112  0.786822 -1.161970 -1.645065 -0.075205
 1.124763 -0.416770  1.347839 -0.944334  0.738686
 0.311057 -0.126432 -1.187644  2.151804  0.791835
-0.310849  0.753750 -1.087447  0.095884  1.449832
-0.272344  0.278788 -0.724369 -0.568442  0.164909
 0.942927 -0.273203  0.203322  1.099572 -0.505160
 0.526321  1.665012  0.915676 -1.174497 -2.270662
-0.959773  0.921732  1.396364 -1.383112  0.603030
-2.802902 -0.572469 -1.599550 -1.305605  0.578198

If this is what you want...

Answer 2 (score: 2)
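
Note that the transcript above swaps rows 0 and 2, so the original first row ends up in third position, whereas the question asks for it to become the second row. The difference can be illustrated on a small made-up frame (the values 10 to 40 are placeholders, not data from the thread):

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40]})

# Swap: rows 0 and 2 trade places, row 1 stays where it was.
swapped = df.iloc[[2, 1, 0, 3]]

# Move: row 2 goes to the top, rows 0 and 1 each shift down by one.
moved = df.iloc[[2, 0, 1, 3]]

print(swapped["a"].tolist())
print(moved["a"].tolist())
```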

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

You can simply do the following:

df.reindex([2, 0, 1] + list(range(3, len(df))))

Or you can do the following:

pd.concat([ df.reindex([2, 0, 1]) , df.iloc[3:]])

# this line rearranges the first 3 rows
df.reindex([2, 0, 1])

# slice the remaining rows, from the fourth row onward
df.iloc[3:]

# concatenate both results together
pd.concat([ df.reindex([2, 0 ,1]), df.iloc[3:]])
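
Both expressions return a new DataFrame, so the result has to be assigned to a name to be kept. As a quick sanity check, here is a sketch that builds both and compares them; the variable names via_reindex and via_concat are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# One-step reorder with an explicit list of labels.
via_reindex = df.reindex([2, 0, 1] + list(range(3, len(df))))

# Reorder the first three rows, then append the untouched remainder.
via_concat = pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])

# Same labels in the same order, and the same values.
print(list(via_reindex.index))
print(np.array_equal(via_reindex.to_numpy(), via_concat.to_numpy()))
```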