Question

是否有任何pandas.DataFrame.reset_index等效于对列进行操作并且能够处理重复列名称的情况？

显然我可以简单地为列分配新值，如果有像df.reset_index这样的方法，我想知道。

示例输入

 pd.DataFrame(np.random.rand(5, 3), columns = ['A', 'A', 'B'])

     A   A   B
0   0.5 0.3 0.9
1   0.7 0.9 0.3
2   0.9 0.4 0.8
3   0.6 0.2 0.9
4   0.7 0.4 0.6

预期输出

     0   1   2
0   0.8 0.1 0.2
1   0.4 0.2 0.4
2   0.3 0.3 0.4
3   0.4 0.1 0.8
4   1.0 0.9 0.9

其中0,1,2只是pandas默认以无名称命名列的方式。

当我有重复的列名时，df.rename或df.reindex_axis等现有方法不起作用

Answer 1

使用range列的长度shape：

df.columns = range(df.shape[1])
print (df)
          0         1         2
0  0.228080  0.884450  0.753401
1  0.176790  0.741979  0.525305
2  0.680255  0.730258  0.449681
3  0.169420  0.660825  0.986554
4  0.302204  0.040413  0.902899

使用参数drop=True T和reset_index进行双重转置的另一种解决方案：

df = df.T.reset_index(drop=True).T
print (df)
          0         1         2
0  0.024846  0.688193  0.887926
1  0.284681  0.895319  0.142876
2  0.440834  0.299527  0.762815
3  0.936967  0.928907  0.642960
4  0.801077  0.085773  0.866651

Answer 2

您可以使用set_axis()方法：

In [54]: df
Out[54]:
          A         A         B
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993

In [55]: df.set_axis(1, range(len(df.columns)))

In [56]: df
Out[56]:
          0         1         2
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993

列的pandas DataFrame reset_index？

2 个答案: