Unexpected order when using DataFrame.sort_index(axis=1): the first column is listed last

Time: 2018-08-31 14:21:59

Tags: python pandas dataframe

I have a sample DataFrame populated like this:

               Alpha      Beta     Gamma     Delta   Epsilon
Date
2017-01-02  0.854046       NaN  0.681606  0.883779  0.680304
2017-01-01  0.573784  0.407917  0.446668  0.463504  0.136830
2017-01-03  0.556100  0.849009  0.389748       NaN  0.777201

When I sort with axis=1 (the column headers), the "Alpha" column ends up in the last position:

df_sorted = df1.sort_index(axis=1)
print(df_sorted)

Output:

                Beta     Delta   Epsilon     Gamma     Alpha
Date
2017-01-02       NaN  0.883779  0.680304  0.681606  0.854046
2017-01-01  0.407917  0.463504  0.136830  0.446668  0.573784
2017-01-03  0.849009       NaN  0.777201  0.389748  0.556100

Can anyone explain how the column names are being sorted?

Thanks!

2 Answers:

Answer 0: (score: 2)

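Using @rahlf23's setup with a few modifications: perhaps some of your column names have a leading space. A space sorts before every letter, so any name that starts with one is placed ahead of 'Alpha':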

import numpy as np
import pandas as pd

df = pd.DataFrame([['2017-01-02',  ' 0.854046',       np.nan,  '0.681606',  '0.883779',  '0.680304'],
                    ['2017-01-01',  '0.573784',  '0.407917',  '0.446668',  '0.463504',  '0.136830'],
                    ['2017-01-03',  '0.556100',  '0.849009',  '0.389748',       np.nan,  '0.777201']],
                    columns=['Date', ' Beta',      ' Gamma',     ' Delta',     'Alpha',   ' Epsilon']).set_index('Date')

df.sort_index(axis=1)

Output:

                 Beta     Delta   Epsilon     Gamma     Alpha
Date                                                         
2017-01-02   0.854046  0.681606  0.680304       NaN  0.883779
2017-01-01   0.573784  0.446668  0.136830  0.407917  0.463504
2017-01-03   0.556100  0.389748  0.777201  0.849009       NaN
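The order above follows from ordinary string comparison, which goes character by character on code points. Here is a minimal sketch in plain Python, reusing the column names from the setup above; the variable names `names` and `ordered` are only illustrative:

```python
# Column names from the setup above: four of them start with a space.
names = [" Beta", " Gamma", " Delta", "Alpha", " Epsilon"]

# A space has a lower code point than any letter.
print(ord(" "), ord("A"))  # 32 65

# So every name with a leading space sorts ahead of "Alpha".
ordered = sorted(names)
print(ordered)
# [' Beta', ' Delta', ' Epsilon', ' Gamma', 'Alpha']
```

This is the same order that sort_index(axis=1) produced for the padded column names.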

To confirm this, you can use df.to_dict(), whose keys show the column names in quotes:

{' Beta': {'2017-01-01': '0.573784',
  '2017-01-02': ' 0.854046',
  '2017-01-03': '0.556100'},
 ' Delta': {'2017-01-01': '0.446668',
  '2017-01-02': '0.681606',
  '2017-01-03': '0.389748'},
 ' Epsilon': {'2017-01-01': '0.136830',
  '2017-01-02': '0.680304',
  '2017-01-03': '0.777201'},
 ' Gamma': {'2017-01-01': '0.407917',
  '2017-01-02': nan,
  '2017-01-03': '0.849009'},
 'Alpha': {'2017-01-01': '0.463504',
  '2017-01-02': '0.883779',
  '2017-01-03': nan}}
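If the full dictionary is too noisy, a shorter check is to look at the column labels directly. The sketch below assumes all labels are strings and uses a small hypothetical frame rather than the data above; `padded` is an illustrative name:

```python
import pandas as pd

# Hypothetical frame: three of the four names carry a leading space.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=[" Beta", " Gamma", "Alpha", " Delta"],
)

# repr() keeps the quotes, so stray whitespace becomes visible.
print([repr(name) for name in df.columns])

# Any label that changes when stripped has leading or trailing whitespace.
padded = [name for name in df.columns if name != name.strip()]
print(padded)
```

An empty `padded` list would suggest that whitespace is not the cause and that the problem lies elsewhere.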

To fix it, you can use .str.strip():

df.columns = df.columns.str.strip()
df.sort_index(axis=1)

Output:

               Alpha       Beta     Delta   Epsilon     Gamma
Date                                                         
2017-01-02  0.883779   0.854046  0.681606  0.680304       NaN
2017-01-01  0.463504   0.573784  0.446668  0.136830  0.407917
2017-01-03       NaN   0.556100  0.389748  0.777201  0.849009
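If you would rather not overwrite df.columns in place, one possible variant is to strip the names with rename, which returns a new frame. This is only a sketch on a hypothetical one-row frame, and it assumes every column label is a string:

```python
import pandas as pd

# Hypothetical one-row frame with the same padded column names.
df = pd.DataFrame(
    [[0.1, 0.2, 0.3, 0.4, 0.5]],
    columns=[" Beta", " Gamma", " Delta", "Alpha", " Epsilon"],
)

# rename() applies str.strip to each label and returns a new DataFrame,
# so the original column names are left untouched.
sorted_df = df.rename(columns=str.strip).sort_index(axis=1)

print(list(sorted_df.columns))
print(list(df.columns))
```

Here `sorted_df` has the cleaned, alphabetically ordered names, while `df` still carries the padded ones.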

Answer 1: (score: 0)

I can't seem to reproduce your problem (Python 3.6.2, pandas 0.23.1). See below:

import numpy as np
import pandas as pd

df = pd.DataFrame([['2017-01-02',  ' 0.854046',       np.nan,  '0.681606',  '0.883779',  '0.680304'],
                    ['2017-01-01',  '0.573784',  '0.407917',  '0.446668',  '0.463504',  '0.136830'],
                    ['2017-01-03',  '0.556100',  '0.849009',  '0.389748',       np.nan,  '0.777201']],
                    columns=['Date', 'Alpha',      'Beta',     'Gamma',     'Delta',   'Epsilon']).set_index('Date')

Gives:

                Alpha      Beta     Gamma     Delta   Epsilon
Date                                                         
2017-01-02   0.854046       NaN  0.681606  0.883779  0.680304
2017-01-01   0.573784  0.407917  0.446668  0.463504  0.136830
2017-01-03   0.556100  0.849009  0.389748       NaN  0.777201

Sorting the index:

df.sort_index()

Gives:

                Alpha      Beta     Gamma     Delta   Epsilon
Date                                                         
2017-01-01   0.573784  0.407917  0.446668  0.463504  0.136830
2017-01-02   0.854046       NaN  0.681606  0.883779  0.680304
2017-01-03   0.556100  0.849009  0.389748       NaN  0.777201

Sorting the columns:

df.sort_index(axis=1)

Gives:

                Alpha      Beta     Delta   Epsilon     Gamma
Date                                                         
2017-01-02   0.854046       NaN  0.883779  0.680304  0.681606
2017-01-01   0.573784  0.407917  0.463504  0.136830  0.446668
2017-01-03   0.556100  0.849009       NaN  0.777201  0.389748