Unexpected result when sorting a DataFrame

Asked: 2013-04-12 16:05:06

Tags: sorting dataframe pandas

When I sort a DataFrame by multiple columns (['Symbol','Year','Month','Day']), the resulting DataFrame is ordered by Symbol > Year > Month, but the Day column is not sorted as I expected:

In [1]: df = pd.DataFrame({'Symbol': {79: 'F', 81: 'F', 82: 'F', 83: 'F', 84: 'F', 85: 'F', 86: 'F', 87: 'F', 89: 'F'}, 'Shares': {79: 100, 81: 100, 82: 100, 83: 100, 84: 100, 85: 100, 86: 100, 87: 100, 89: 100}, 'Month': {79: '08', 81: '08', 82: '08', 83: '08', 84: '08', 85: '08', 86: '08', 87: '08', 89: '09'}, 'Year': {79: '2008', 81: '2008', 82: '2008', 83: '2008', 84: '2008', 85: '2008', 86: '2008', 87: '2008', 89: '2008'}, 'Action': {79: 'Sell', 81: 'Sell', 82: 'Buy', 83: 'Sell', 84: 'Buy', 85: 'Sell', 86: 'Buy', 87: 'Sell', 89: 'Sell'}, 'Day': {79: 2L, 81: 4L, 82: '06', 83: 11L, 84: '13', 85: 18L, 86: '18', 87: 23L, 89: 22L}})

In [2]: df
Out[2]:
   Action Day Month  Shares Symbol  Year
79   Sell   2    08     100      F  2008
81   Sell   4    08     100      F  2008
82    Buy  06    08     100      F  2008
83   Sell  11    08     100      F  2008
84    Buy  13    08     100      F  2008
85   Sell  18    08     100      F  2008
86    Buy  18    08     100      F  2008
87   Sell  23    08     100      F  2008
89   Sell  22    09     100      F  2008

In [3]: df.sort(['Symbol','Year','Month','Day'])
Out[3]:
   Action Day Month  Shares Symbol  Year
79   Sell   2    08     100      F  2008
81   Sell   4    08     100      F  2008
83   Sell  11    08     100      F  2008
85   Sell  18    08     100      F  2008
87   Sell  23    08     100      F  2008
82    Buy  06    08     100      F  2008
84    Buy  13    08     100      F  2008
86    Buy  18    08     100      F  2008
89   Sell  22    09     100      F  2008

Why doesn't sort work as expected?

1 Answer:

Answer 0 (score: 1)

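It doesn't work as expected because Day is stored with mixed types (strings and long integers), and in Python 2 any string compares as "greater than" any number. All the integer days are therefore sorted first, followed by all the string days, which is why the result looks wrong.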
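One way to confirm this diagnosis is to count the Python types stored in the column. The following is a minimal sketch, assuming Python 3 (where the 2L literals are written as plain 2) and a small hypothetical Series named day that stands in for df['Day']:

```python
import pandas as pd

# Hypothetical stand-in for df['Day']: a mix of ints and strings.
day = pd.Series([2, 4, '06', 11, '13', 18, '18', 23, 22])

# Map each value to the name of its Python type and count occurrences.
type_counts = day.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# A mixed column is stored as generic Python objects, not a numeric dtype.
print(day.dtype)
```

If more than one type name shows up, the column is mixed. Note that on Python 3 comparing an int with a str raises TypeError, so sorting such a column would be expected to fail outright rather than silently place the strings after the numbers.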

You can convert this column to integers by applying int:
df['Day'] = df['Day'].apply(int)

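I would also consider doing the same for the month and year columns, since they are strings in your DataFrame (and would probably make more sense as ints). Here the month column is named Mo.: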

df['Mo.'] = df['Mo.'].apply(int)
df['Year'] = df['Year'].apply(int)
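As a sketch of the same fix on more recent pandas versions, where DataFrame.sort is no longer available and sort_values takes its place, the conversion and the original four-column sort might look like this. The frame below is a shortened, hypothetical copy of the question's data that uses the question's column names, and astype(int) is used here as an alternative to apply(int):

```python
import pandas as pd

# Shortened, hypothetical copy of the question's data (mixed-type Day column).
df = pd.DataFrame(
    {
        'Symbol': ['F', 'F', 'F', 'F', 'F'],
        'Year': ['2008'] * 5,
        'Month': ['08', '08', '08', '08', '09'],
        'Day': [2, '06', 11, '13', 22],
    },
    index=[79, 82, 83, 84, 89],
)

# Convert every date-part column to integers so values compare numerically.
for col in ['Year', 'Month', 'Day']:
    df[col] = df[col].astype(int)

# sort_values returns a new, sorted DataFrame and leaves df unchanged.
result = df.sort_values(['Symbol', 'Year', 'Month', 'Day'])
print(result)
```

With every key column numeric, the rows come back in calendar order instead of being split into an integer group and a string group.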

Then you can sort by Day:

In [11]: df.sort(['Day'])
Out[11]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
8    89  2008    9   22   F   Sell     100
4    87  2008    8   23   F   Sell     100

Or sort on multiple columns:

In [12]: df.sort(['Mo.', 'Day'])
Out[12]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
4    87  2008    8   23   F   Sell     100
8    89  2008    9   22   F   Sell     100

In [13]: df.sort(['Day', 'Mo.'])
Out[13]:
   Indx  Year  Mo.  Day Sym Action  Shares
0    79  2008    8    2   F   Sell     100
1    81  2008    8    4   F   Sell     100
5    82  2008    8    6   F    Buy     100
2    83  2008    8   11   F   Sell     100
6    84  2008    8   13   F    Buy     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
8    89  2008    9   22   F   Sell     100
4    87  2008    8   23   F   Sell     100

Using the ascending parameter:

In [14]: df.sort(['Mo.', 'Day'], ascending=[True, False])
Out[14]:
   Indx  Year  Mo.  Day Sym Action  Shares
4    87  2008    8   23   F   Sell     100
3    85  2008    8   18   F   Sell     100
7    86  2008    8   18   F    Buy     100
6    84  2008    8   13   F    Buy     100
2    83  2008    8   11   F   Sell     100
5    82  2008    8    6   F    Buy     100
1    81  2008    8    4   F   Sell     100
0    79  2008    8    2   F   Sell     100
8    89  2008    9   22   F   Sell     100

...will work as expected.
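For readers on a pandas version without DataFrame.sort, the mixed-direction sort can be sketched with sort_values, which accepts the same per-column ascending list. The small frame here is hypothetical and uses only the two key columns, already converted to integers:

```python
import pandas as pd

# Hypothetical two-column frame with integer month and day values.
df = pd.DataFrame({
    'Month': [8, 8, 8, 9, 8],
    'Day': [2, 18, 6, 22, 23],
})

# Month ascending, then Day descending within each month.
result = df.sort_values(['Month', 'Day'], ascending=[True, False])
print(result)
```

The ascending list is matched positionally to the list of sort columns, so the two lists need to have the same length.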