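How do I delete the useless rows of a DataFrame so that the useful rows are indexed from 0?

Time: 2017-07-06 14:39:54

Tags: python pandas dataframe

I get a DataFrame from an API, but its index does not start at 0. I want the index to start from 0, so I tried .reindex(). But that only puts NaN rows in front of the useful ones, and the useful rows still keep the labels they came with. How can I index the useful rows from 0 in a pandas DataFrame?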

In [29]: a = ts.get_k_data('399300', index=True, start='2015-05-01', end='2015-05-31')

In [30]: a
Out[30]: 
          date     open    close     high      low       volume      code
78  2015-05-04  4757.64  4787.74  4795.92  4699.40  377843853.0  sz399300
79  2015-05-05  4785.19  4596.84  4785.19  4572.98  460419626.0  sz399300
80  2015-05-06  4626.23  4553.33  4700.91  4511.76  376073702.0  sz399300
81  2015-05-07  4520.82  4470.09  4546.34  4467.46  297759203.0  sz399300

In [31]: b = a.reindex(list(range(0,80)))

In [32]: b
Out[32]: 
          date     open    close     high      low       volume      code
0          NaN      NaN      NaN      NaN      NaN          NaN       NaN
1          NaN      NaN      NaN      NaN      NaN          NaN       NaN
2          NaN      NaN      NaN      NaN      NaN          NaN       NaN
..         ...      ...      ...      ...      ...          ...       ...
76         NaN      NaN      NaN      NaN      NaN          NaN       NaN
77         NaN      NaN      NaN      NaN      NaN          NaN       NaN
78  2015-05-04  4757.64  4787.74  4795.92  4699.40  377843853.0  sz399300
79  2015-05-05  4785.19  4596.84  4785.19  4572.98  460419626.0  sz399300

[80 rows x 7 columns]

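1 Answer:

Answer 0 (score: 2)

Just do df.index = pd.RangeIndex(0, df.shape[0])

This directly overwrites the index. reindex, by contrast, keeps the existing index labels and returns the rows that match the labels you pass in; because most of the labels you passed (0 to 77) do not exist in the original index, those rows show NaN.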
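To make the difference concrete, here is a minimal sketch on a made-up three-row frame (the labels 78 to 80 and the single `close` column are only illustrative, loosely modelled on the question). It shows that reindex looks rows up by label, whereas assigning to `df.index` only relabels the rows that are already there.

```python
import pandas as pd

# Hypothetical frame whose labels start at 78, mimicking the question
df = pd.DataFrame({"close": [4787.74, 4596.84, 4553.33]}, index=[78, 79, 80])

# reindex selects by label: 0 and 1 do not exist, so those rows are all NaN
looked_up = df.reindex([0, 1, 78])

# Assigning a new index keeps every row and only changes the labels
relabeled = df.copy()
relabeled.index = pd.RangeIndex(0, relabeled.shape[0])

print(looked_up)
print(relabeled)
```

Working on a copy here is only so both results can be printed side by side; in practice you would assign to the index of the frame itself.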

Example:

In[92]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'), index=[3, 4, 5, 10, 50])
df

Out[92]: 
           a         b         c
3  -0.185420  0.230181  1.561401
4  -0.142055 -1.130427 -1.209588
5   2.590563  0.367157  1.878946
10  0.317735 -1.578927  0.555270
50  1.424068  0.667701  0.619741

In[93]:
df.index = pd.RangeIndex(0,df.shape[0])
df

Out[93]: 
          a         b         c
0 -0.185420  0.230181  1.561401
1 -0.142055 -1.130427 -1.209588
2  2.590563  0.367157  1.878946
3  0.317735 -1.578927  0.555270
4  1.424068  0.667701  0.619741

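Here RangeIndex is an optimized index object for monotonic integer ranges. You could also do df.index = np.arange(0, df.shape[0]), but that materializes the full range as a NumPy array, and the resulting index is an ordinary integer index backed by that array rather than a RangeIndex. A RangeIndex only needs to know its start, stop and step, so it is more memory efficient.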
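The following is a small check, not taken from the original answer, of which index type each assignment actually produces. The frame and the variable names are made up for illustration. In the pandas versions I am aware of, the array-based assignment yields a plain integer index with the same labels, so a RangeIndex has to be requested explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with non-contiguous labels
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[3, 10, 50])

by_range = df.copy()
by_range.index = pd.RangeIndex(0, by_range.shape[0])  # stores only start/stop/step

by_array = df.copy()
by_array.index = np.arange(0, by_array.shape[0])      # stores every label in an array

# Same labels either way, but only the first is a RangeIndex
print(type(by_range.index).__name__)
print(type(by_array.index).__name__)
print(list(by_range.index) == list(by_array.index))
```

For a frame of a few rows the difference is negligible; it only starts to matter when the index has millions of entries.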

Or simply call reset_index(drop=True)

In[94]:
df = df.reset_index(drop=True)
df

Out[94]: 
          a         b         c
0 -0.185420  0.230181  1.561401
1 -0.142055 -1.130427 -1.209588
2  2.590563  0.367157  1.878946
3  0.317735 -1.578927  0.555270
4  1.424068  0.667701  0.619741

This essentially drops the current index and replaces it with a new one starting from 0.
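Applied to the frame in the question, this could look like the sketch below. The rows are copied by hand from the question's output (only two of the columns, to keep it short) instead of calling the API, so the construction of `a` is an assumption. The sketch also shows what happens without drop=True: the old labels are kept as a regular column named `index`.

```python
import pandas as pd

# Rows taken from the question's Out[30]; built by hand rather than via the API
a = pd.DataFrame(
    {
        "date": ["2015-05-04", "2015-05-05", "2015-05-06", "2015-05-07"],
        "close": [4787.74, 4596.84, 4553.33, 4470.09],
    },
    index=[78, 79, 80, 81],
)

b = a.reset_index(drop=True)  # labels become 0..3, old labels are discarded
c = a.reset_index()           # labels become 0..3, old labels move to an "index" column

print(b)
print(c.columns.tolist())
```

Use drop=True when the old labels carry no meaning, as here; leave it out if you might need to trace rows back to their original positions.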