How do I relabel the columns of a pandas DataFrame after dropping a column?

Asked: 2017-08-08 20:28:29

Tags: python-3.x pandas dataframe

I have a pandas DataFrame that I load as follows:

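  import pandas as pd

  # header=None: the file has no header row, so pandas labels the columns 0, 1, 2, ...
  data = pd.read_csv("training-set-org.csv", sep=",", header=None)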

Printing it gives the following output:

 print(data.head())

             0        1          2          3  4     5      6       7  \
  0  22.896448  33.1366  18.738063  26.846212  6  4242  50257  131962
  1  22.896448  33.1366  18.738063  26.846212  6  4242  50257   68719
  2  22.896448  33.1366  18.738063  26.846212  6  4242  50257  171647
  3  22.896448  33.1366  18.738063  26.846212  6  4242  50257  246620
  4  22.896448  33.1366  18.738063  26.846212  6  4242  50257   64072

Now I drop column 4:

  data.drop(data.columns[4], axis=1, inplace=True)

As far as I understand, data.columns[4] refers to the column labeled 4, which is what I want.

Now, when I print the DataFrame, I get:

  printing data:
             0        1          2          3     5      6       7
  0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
  1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
  2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
  3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
  4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072

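As you can see, label 4 is missing.

How can I relabel the DataFrame so that every column label after the gap shifts left, i.e. the columns are labeled 0, 1, 2, 3, 4, ..., 6 instead of ending at 7? I want to work with the reduced DataFrame and process its columns in a loop using data.iloc[:, i]. How do I do that? I am still new to Python, so any help is appreciated.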

3 Answers:

Answer 0 (score: 1)

You can assign a fresh default set of column labels created with RangeIndex:

data.columns = pd.RangeIndex(len(data.columns))
print(data)
           0        1          2          3     4      5       6
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072

Or use range:

data.columns = range(len(data.columns))
print(data)
           0        1          2          3     4      5       6
0  22.896448  33.1366  18.738063  26.846212  4242  50257  131962
1  22.896448  33.1366  18.738063  26.846212  4242  50257   68719
2  22.896448  33.1366  18.738063  26.846212  4242  50257  171647
3  22.896448  33.1366  18.738063  26.846212  4242  50257  246620
4  22.896448  33.1366  18.738063  26.846212  4242  50257   64072
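
For readers who want to try this without the original CSV, here is a minimal self-contained sketch of the drop-then-relabel steps. The small integer DataFrame is a made-up stand-in for training-set-org.csv, and the names labels_after_drop and labels_after_relabel are only there for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: 3 rows, 8 columns with default labels 0..7.
data = pd.DataFrame([[r * 10 + c for c in range(8)] for r in range(3)])

# Drop the column at position 4; the remaining labels keep their old values.
data.drop(data.columns[4], axis=1, inplace=True)
labels_after_drop = list(data.columns)

# Assign consecutive labels 0..n-1 to close the gap.
data.columns = pd.RangeIndex(len(data.columns))
labels_after_relabel = list(data.columns)

print(labels_after_drop)     # [0, 1, 2, 3, 5, 6, 7]
print(labels_after_relabel)  # [0, 1, 2, 3, 4, 5, 6]
```

Only the labels change; the values in each column stay where they were.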

Timings, just for fun :)

In [126]: %timeit data.columns = range(len(data.columns))
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.4 µs per loop

In [127]: %timeit data.columns = pd.RangeIndex(len(data.columns))
The slowest run took 4.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.4 µs per loop

In [128]: %timeit data.columns = np.arange(len(data.columns))
The slowest run took 8.52 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 45.2 µs per loop
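
The %timeit lines above need IPython. As a rough sketch, a similar comparison can be run as a plain script with the standard-library timeit module. The 5x7 frame of zeros and the helper function names below are assumptions made for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 5x7 frame standing in for the reduced data.
data = pd.DataFrame(np.zeros((5, 7)))


def relabel_range():
    data.columns = range(len(data.columns))


def relabel_rangeindex():
    data.columns = pd.RangeIndex(len(data.columns))


def relabel_arange():
    data.columns = np.arange(len(data.columns))


# Best of 3 repeats, 1000 calls each, converted to seconds per call.
timings = {
    func.__name__: min(timeit.repeat(func, number=1000, repeat=3)) / 1000
    for func in (relabel_range, relabel_rangeindex, relabel_arange)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

All three variants take microseconds per call, so readability is probably a better basis for choosing between them than speed.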

Answer 1 (score: 0)

If your column labels are just integers, you can use the following code:

import numpy as np
data.columns = np.arange(len(data.columns))

Answer 2 (score: 0)

It's simple, try this:

# 7 is the number of columns left after the drop
data.columns = range(7)
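
Two points may be worth illustrating here. First, data.iloc[:, i] selects by position, so the loop described in the question should work even while the labels still have a gap. Second, the column count can be read from data.shape[1] instead of being hard-coded. The sketch below uses a made-up 2x5 frame, and column_sums is a hypothetical name chosen for the example.

```python
import pandas as pd

# Hypothetical small frame with default labels 0..4.
data = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
data.drop(data.columns[2], axis=1, inplace=True)  # labels are now 0, 1, 3, 4

# iloc indexes by position, so the missing label 2 does not matter here.
column_sums = [int(data.iloc[:, i].sum()) for i in range(data.shape[1])]

# Relabel without hard-coding the number of columns.
data.columns = range(data.shape[1])

print(column_sums)         # [7, 9, 13, 15]
print(list(data.columns))  # [0, 1, 2, 3]
```

Relabeling is still useful if later code selects columns by label, for example with data[i] or data.loc[:, i].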