How to sort by English (day-first) date format rather than American in pandas .sort()

Time: 2017-04-12 03:12:09

Tags: python sorting date pandas datetime

    symb                dates
4     BLK  01/03/2014 09:00:00
0     BBR  02/06/2014 09:00:00
21     HZ  02/06/2014 09:00:00
24   OMNI  02/07/2014 09:00:00
31   NOTE  03/04/2014 09:00:00
65    AMP  03/04/2016 09:00:00
40    RBY  04/07/2014 09:00:00

The table above is a sample of the output from df.sort('date').

As you can see, it treats the days as months and vice versa. Any idea how to fix this?

3 answers:

Answer 0: (score: 2)

You can convert the column with pandas.to_datetime, passing the format argument, and then sort on it.

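>> # This format reads the strings month-first; for day-first data use '%d/%m/%Y %H:%M:%S'.
>> df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S')
>> # DataFrame.sort has been removed from recent pandas versions; sort_values replaces it.
>> df.sort_values('date')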

               date    symb
0 2014-01-03 09:00:00   BLK
1 2014-02-06 09:00:00   BBR
2 2014-02-06 09:00:00    HZ
3 2014-02-07 09:00:00  OMNI
4 2014-03-04 09:00:00  NOTE
6 2014-04-07 09:00:00   RBY
5 2016-03-04 09:00:00   AMP
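
Since the question asks for the English (day-first) reading, here is a minimal sketch of the same approach with the day and month directives swapped. It assumes the column is named dates, as in the question's sample, and the names df and sorted_df are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; every string is dd/mm/YYYY HH:MM:SS.
df = pd.DataFrame({
    "symb": ["BLK", "BBR", "HZ", "OMNI", "NOTE", "AMP", "RBY"],
    "dates": ["01/03/2014 09:00:00", "02/06/2014 09:00:00", "02/06/2014 09:00:00",
              "02/07/2014 09:00:00", "03/04/2014 09:00:00", "03/04/2016 09:00:00",
              "04/07/2014 09:00:00"],
})

# %d before %m: parse the first field as the day and the second as the month.
df["dates"] = pd.to_datetime(df["dates"], format="%d/%m/%Y %H:%M:%S")

# mergesort is a stable sort, so rows with equal timestamps keep their input order.
sorted_df = df.sort_values("dates", kind="mergesort")
print(sorted_df)
```

An explicit format string also makes pandas raise an error on a value that does not match, rather than silently guessing which field is the day.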

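Answer 1: (score: 0)

I'm not sure how you are obtaining the data, but if you are importing it from some source (such as a CSV), you can use pandas.read_csv and set parse_dates (note that parse_dates=True applies to the index; pass a list of column names such as parse_dates=['dates'] to parse a specific column). The question is: what is the type of the date column? You can easily convert the values to date-like objects with dateutil.parser.parse. For example,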

import pandas
import dateutil.parser  # the parser submodule must be imported explicitly
data = {'symb': ['BLK', 'BBR', 'HZ', 'OMNI', 'NOTE', 'AMP', 'RBY'],
        'dates': ['01/03/2014 09:00:00', '02/06/2014 09:00:00', '02/06/2014 09:00:00',
               '02/07/2014 09:00:00', '03/04/2014 09:00:00', '03/04/2016 09:00:00',
               '04/07/2014 09:00:00']}
df = pandas.DataFrame.from_dict(data)
# Parse each string into a datetime; dateutil reads ambiguous dates month-first by default
df.dates = df.dates.apply(dateutil.parser.parse)
print(df.to_string())

# OUTPUT
# 0 2014-01-03 09:00:00   BLK
# 1 2014-02-06 09:00:00   BBR
# 2 2014-02-06 09:00:00    HZ
# 3 2014-02-07 09:00:00  OMNI
# 4 2014-03-04 09:00:00  NOTE
# 5 2016-03-04 09:00:00   AMP
# 6 2014-04-07 09:00:00   RBY

This gives you ISO 8601 format, which is probably preferable to dd/mm/yyyy, but if you must have that format, you can use the code recommended by @umutto.
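
If the strings are day-first and the dd/mm/yyyy text is still wanted for display, one possible sketch is to parse with dateutil's dayfirst=True option, sort on the parsed values, and format them back with strftime. The column names parsed and display are hypothetical, chosen only for this illustration.

```python
import dateutil.parser
import pandas as pd

df = pd.DataFrame({
    "symb": ["BBR", "BLK", "NOTE"],
    "dates": ["02/06/2014 09:00:00", "01/03/2014 09:00:00", "03/04/2014 09:00:00"],
})

# dayfirst=True makes dateutil read 01/03/2014 as 1 March rather than 3 January.
parsed = df["dates"].apply(lambda s: dateutil.parser.parse(s, dayfirst=True))
df["parsed"] = pd.to_datetime(parsed)  # ensure a datetime64 column

# Sort on the real datetimes, then render dd/mm/YYYY text for display only.
result = df.sort_values("parsed").assign(
    display=lambda d: d["parsed"].dt.strftime("%d/%m/%Y %H:%M:%S")
)
print(result[["symb", "display"]])
```

Keeping the datetime column alongside the formatted text means later sorts and comparisons still work on real dates rather than on strings.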

Answer 2: (score: 0)

You can convert the column with to_datetime and then sort with sort_values

#format mm/dd/YYYY
df['dates'] = pd.to_datetime(df['dates'])
print (df.sort_values('dates'))
    symb               dates
4    BLK 2014-01-03 09:00:00
0    BBR 2014-02-06 09:00:00
21    HZ 2014-02-06 09:00:00
24  OMNI 2014-02-07 09:00:00
31  NOTE 2014-03-04 09:00:00
40   RBY 2014-04-07 09:00:00
65   AMP 2016-03-04 09:00:00
#format dd/mm/YYYY
df['dates'] = pd.to_datetime(df['dates'], dayfirst=True)
print (df.sort_values('dates'))
    symb               dates
4    BLK 2014-03-01 09:00:00
31  NOTE 2014-04-03 09:00:00
0    BBR 2014-06-02 09:00:00
21    HZ 2014-06-02 09:00:00
24  OMNI 2014-07-02 09:00:00
40   RBY 2014-07-04 09:00:00
65   AMP 2016-04-03 09:00:00

Another solution is to use the parse_dates parameter of read_csv; if the format is dd/mm/YYYY, add dayfirst=True

import pandas as pd
import numpy as np
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas versions

temp=u"""symb,dates
BLK,01/03/2014 09:00:00
BBR,02/06/2014 09:00:00
HZ,02/06/2014 09:00:00
OMNI,02/07/2014 09:00:00
NOTE,03/04/2014 09:00:00
AMP,03/04/2016 09:00:00
RBY,04/07/2014 09:00:00"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=['dates'])

print (df)
   symb               dates
0   BLK 2014-01-03 09:00:00
1   BBR 2014-02-06 09:00:00
2    HZ 2014-02-06 09:00:00
3  OMNI 2014-02-07 09:00:00
4  NOTE 2014-03-04 09:00:00
5   AMP 2016-03-04 09:00:00
6   RBY 2014-04-07 09:00:00

print (df.dtypes)
symb             object
dates    datetime64[ns]
dtype: object
print (df.sort_values('dates'))
   symb               dates
0   BLK 2014-01-03 09:00:00
1   BBR 2014-02-06 09:00:00
2    HZ 2014-02-06 09:00:00
3  OMNI 2014-02-07 09:00:00
4  NOTE 2014-03-04 09:00:00
6   RBY 2014-04-07 09:00:00
5   AMP 2016-03-04 09:00:00
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=['dates'], dayfirst=True)

print (df)
   symb               dates
0   BLK 2014-03-01 09:00:00
1   BBR 2014-06-02 09:00:00
2    HZ 2014-06-02 09:00:00
3  OMNI 2014-07-02 09:00:00
4  NOTE 2014-04-03 09:00:00
5   AMP 2016-04-03 09:00:00
6   RBY 2014-07-04 09:00:00

print (df.dtypes)
symb             object
dates    datetime64[ns]
dtype: object

print (df.sort_values('dates'))
   symb               dates
0   BLK 2014-03-01 09:00:00
4  NOTE 2014-04-03 09:00:00
1   BBR 2014-06-02 09:00:00
2    HZ 2014-06-02 09:00:00
3  OMNI 2014-07-02 09:00:00
6   RBY 2014-07-04 09:00:00
5   AMP 2016-04-03 09:00:00
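
If the original dd/mm/YYYY strings need to stay untouched in the column, a possible variation is to parse them only for the comparison, using the key argument of sort_values. This is a sketch that assumes pandas 1.1 or newer (where key is available); the helper name as_day_first is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "symb": ["BLK", "BBR", "NOTE", "AMP"],
    "dates": ["01/03/2014 09:00:00", "02/06/2014 09:00:00",
              "03/04/2014 09:00:00", "03/04/2016 09:00:00"],
})

def as_day_first(col):
    # Receives the whole column as a Series and returns the values to sort by.
    return pd.to_datetime(col, format="%d/%m/%Y %H:%M:%S")

# The 'dates' column keeps its original strings; only the ordering uses datetimes.
by_date = df.sort_values("dates", key=as_day_first)
print(by_date)
```

The trade-off is that the strings are parsed again on every sort, so converting the column once is usually the better choice when the data is sorted or filtered repeatedly.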