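I get a DataFrame from an API, but its index does not start at 0. I want the index to start from 0, so I tried .reindex(). That only produced NaN rows for the labels before the existing ones, and the useful rows still sit at their original labels. How can I re-index the useful rows of a pandas DataFrame so they start from 0?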
In [29]: a = ts.get_k_data('399300', index=True, start='2015-05-01', end='2015-05-31')
In [30]: a
Out[30]:
date open close high low volume code
78 2015-05-04 4757.64 4787.74 4795.92 4699.40 377843853.0 sz399300
79 2015-05-05 4785.19 4596.84 4785.19 4572.98 460419626.0 sz399300
80 2015-05-06 4626.23 4553.33 4700.91 4511.76 376073702.0 sz399300
81 2015-05-07 4520.82 4470.09 4546.34 4467.46 297759203.0 sz399300
In [31]: b = a.reindex(list(range(0,80)))
In [32]: b
Out[32]:
date open close high low volume code
0 NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ...
76 NaN NaN NaN NaN NaN NaN NaN
77 NaN NaN NaN NaN NaN NaN NaN
78 2015-05-04 4757.64 4787.74 4795.92 4699.40 377843853.0 sz399300
79 2015-05-05 4785.19 4596.84 4785.19 4572.98 460419626.0 sz399300
[80 rows x 7 columns]
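Answer 0 (score: 2)
Just do df.index = pd.RangeIndex(0, df.shape[0]) to overwrite the index directly. reindex, by contrast, keeps the existing index and returns the rows whose labels match the values you pass in; where a requested label does not exist, the row is filled with NaN.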
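To make the label-alignment behaviour concrete, here is a minimal sketch. It uses a small synthetic frame whose labels start at 78 (an assumption standing in for the API result in the question, not the real data) and shows that reindex fills requested labels that do not exist with NaN and drops existing labels that were not requested.

```python
import pandas as pd

# Synthetic stand-in for the API result: labels start at 78, not 0.
a = pd.DataFrame({"close": [4787.74, 4596.84, 4553.33]}, index=[78, 79, 80])

# reindex aligns on labels. Labels 0..77 do not exist in `a`, so those rows
# are NaN; labels 78 and 79 keep their values; label 80 was not requested
# and is therefore dropped.
b = a.reindex(range(0, 80))

print(b["close"].isna().sum())  # number of rows with no matching label
print(b.loc[78, "close"])       # an existing label keeps its value
print(80 in b.index)            # whether the unrequested label survived
```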
Example of overwriting the index:
In[92]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns = list('abc'), index=[3,4,5,10,50])
df
Out[92]:
a b c
3 -0.185420 0.230181 1.561401
4 -0.142055 -1.130427 -1.209588
5 2.590563 0.367157 1.878946
10 0.317735 -1.578927 0.555270
50 1.424068 0.667701 0.619741
In[93]:
df.index = pd.RangeIndex(0,df.shape[0])
df
Out[93]:
a b c
0 -0.185420 0.230181 1.561401
1 -0.142055 -1.130427 -1.209588
2 2.590563 0.367157 1.878946
3 0.317735 -1.578927 0.555270
4 1.424068 0.667701 0.619741
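Here RangeIndex is an optimized index object for monotonic integer indices. You could also do df.index = np.arange(0, df.shape[0]), but that materializes a full NumPy array, and the index built from it stores every label rather than becoming a RangeIndex. A RangeIndex only needs to know its start, stop, and step, which is what makes it memory efficient.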
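As a rough check of the memory point, the sketch below builds both kinds of index for the same range and prints the type and reported memory footprint of each. The size n is an arbitrary value chosen for illustration, and the exact numbers and type names depend on your pandas version.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary size, for illustration only

lazy = pd.RangeIndex(0, n)         # described by start, stop, and step only
dense = pd.Index(np.arange(0, n))  # built from a fully materialized array

# Print the concrete index type and its reported memory usage in bytes.
print(type(lazy).__name__, lazy.memory_usage())
print(type(dense).__name__, dense.memory_usage())
```

Both objects label the rows identically; they differ only in how the labels are stored.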
Or just call reset_index(drop=True):
In[94]:
df = df.reset_index(drop=True)
df
Out[94]:
a b c
0 -0.185420 0.230181 1.561401
1 -0.142055 -1.130427 -1.209588
2 2.590563 0.367157 1.878946
3 0.317735 -1.578927 0.555270
4 1.424068 0.667701 0.619741
This essentially drops the current index and rebuilds it starting from 0.
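The drop argument matters here. The sketch below, again on a small synthetic frame that only imitates the shape of the data in the question, contrasts drop=True with the default drop=False, which keeps the old labels as an extra column.

```python
import pandas as pd

# Synthetic stand-in for the question's data, with labels starting at 78.
a = pd.DataFrame(
    {
        "date": ["2015-05-04", "2015-05-05", "2015-05-06"],
        "close": [4787.74, 4596.84, 4553.33],
    },
    index=[78, 79, 80],
)

# drop=True discards the old labels entirely.
clean = a.reset_index(drop=True)

# The default (drop=False) moves the old labels into a column named "index".
kept = a.reset_index()

print(clean.index.tolist())   # [0, 1, 2]
print(kept.columns.tolist())  # ['index', 'date', 'close']
```

If the old labels carry no meaning, as with the row offsets in the question, drop=True avoids a leftover column.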