I have a pandas DataFrame whose date column contains dates with French month abbreviations, for example:
u'18-oct.-2015'
u'12-nov.-2015'
u'02-d\xe9c.-2015'
u'26-janv.-2016'
u'02-f\xe9vr.-2016'
u'31-mai-2016'
u'01-juin-2016'
What is the correct way to parse them with to_datetime?
Answer 0: (score: 0)
I suspect you can set your locale:
import locale
locale.setlocale(locale.LC_ALL, 'fr_FR') # Windows may be a different locale name
# do your pandas read here
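You may need to tell pandas that the column is a datetime column. You may also need to fix the column values if the locale's month abbreviations differ from the ones in your data (for example, the locale may abbreviate janvier as jan rather than janv.), although pandas may be smart enough to handle it.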
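To make the caveats above concrete, here is a minimal sketch that uses only the standard library. It assumes the locale name `fr_FR.UTF-8` (names differ across platforms, and the locale may not be installed), and `parse_with_locale` is a hypothetical helper, not a pandas or stdlib function. It returns `None` when the locale is unavailable or when its month abbreviations do not match the data.

```python
import locale
from datetime import datetime


def parse_with_locale(text, locale_name, fmt="%d-%b-%Y"):
    """Parse `text` under a temporary LC_TIME locale; return None on failure."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
        return datetime.strptime(text, fmt)
    except (locale.Error, ValueError):
        # Either the locale is not installed, or its %b abbreviations
        # differ from the ones used in the data.
        return None
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # always restore the old locale


# "fr_FR.UTF-8" is an assumed locale name; adjust it for your system.
for sample in ["18-oct.-2015", "31-mai-2016"]:
    print(sample, "->", parse_with_locale(sample, "fr_FR.UTF-8"))
```

Restoring the previous locale in `finally` keeps the change from leaking into the rest of the program, which matters because `setlocale` affects the whole process.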
Answer 1: (score: 0)
One solution is:
import pandas as pd
df = pd.DataFrame({'french datetime': [u'18-oct.-2015', u'12-nov.-2015', u'02-d\xe9c.-2015', u'26-janv.-2016', u'02-f\xe9vr.-2016', u'31-mai-2016', u'01-juin-2016']})
# map each French month abbreviation that appears in the data to its month number
french_to_num = {u'oct.': u'10', u'nov.': u'11', u'janv.': u'1', u'd\xe9c.': u'12', u'f\xe9vr.': u'2', u'mai': u'5', u'juin': u'6'}
# make new columns for day, month and year; for the month, map the French abbreviation to its number
df['day'] = df['french datetime'].apply(lambda x: x.split('-')[0])
df['month'] = df['french datetime'].apply(lambda x: x.split('-')[1]).map(french_to_num)
df['year'] = df['french datetime'].apply(lambda x: x.split('-')[2])
# build the datetime column from year, month and day
df['date'] = pd.to_datetime(df['year'] + '-' + df['month'] + '-' + df['day'], format='%Y-%m-%d')
print(df)
Result:
   french datetime  day  month  year        date
0     18-oct.-2015   18     10  2015  2015-10-18
1     12-nov.-2015   12     11  2015  2015-11-12
2     02-déc.-2015   02     12  2015  2015-12-02
3    26-janv.-2016   26      1  2016  2016-01-26
4    02-févr.-2016   02      2  2016  2016-02-02
5      31-mai-2016   31      5  2016  2016-05-31
6     01-juin-2016   01      6  2016  2016-06-01
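The mapping in this answer only covers the seven months that appear in the sample. As a sketch of how it might be extended to the whole year, the block below uses the conventional French abbreviations; the entries for mars, avr., juil., août and sept. are assumptions, since they do not appear in the question's data. `FRENCH_MONTHS` and `parse_french_date` are hypothetical names introduced for illustration.

```python
import unicodedata
from datetime import date

# Conventional French month abbreviations, stored without the trailing dot.
# Accented letters are written as escapes, as in the question's data.
FRENCH_MONTHS = {
    "janv": 1, "f\xe9vr": 2, "mars": 3, "avr": 4, "mai": 5, "juin": 6,
    "juil": 7, "ao\xfbt": 8, "sept": 9, "oct": 10, "nov": 11, "d\xe9c": 12,
}


def parse_french_date(text):
    """Convert a 'DD-mon.-YYYY' string with a French month into a date."""
    # Normalize so that composed and decomposed accents compare equal.
    text = unicodedata.normalize("NFC", text)
    day, month, year = text.split("-")
    month_key = month.rstrip(".").lower()  # 'oct.' -> 'oct', 'mai' -> 'mai'
    return date(int(year), FRENCH_MONTHS[month_key], int(day))


samples = ["18-oct.-2015", "02-d\xe9c.-2015", "26-janv.-2016", "31-mai-2016"]
for s in samples:
    print(s, "->", parse_french_date(s))
```

With pandas, something like `pd.to_datetime(df['french datetime'].map(parse_french_date))` should then give a datetime column without the intermediate day, month and year columns. An unknown abbreviation raises `KeyError` here, so bad rows fail loudly instead of being passed through unparsed.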