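Splitting a column that contains multiple date formats

Time: 2016-06-04 06:02:36

Tags: python-2.7 csv pandas

I have a CSV file with a column that contains dates in several different formats. I need to parse them and get all of the results in one consistent format.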

Wednesday 12 August 2015
Wednesday 12 August 2015
Friday April 1 2016
Friday April 1 2016
5/12/2016
5/12/2016

That is the file, and I want the dates in mm/dd/yy format. My code is below:

import pandas as pd

# Read the CSV; the date column holds several different formats.
df = pd.read_csv('test3.csv')

newDate = []  # collects the cleaned value for each row
for item in df['serverDatePrettyFirstAction']:
    if '/' in item:
        # Already numeric, e.g. 5/12/2016: keep it unchanged.
        newDate.append(item)
    else:
        # Drop the leading weekday name, e.g. "Friday April 1 2016" -> "April 1 2016".
        item = item.split(' ', 1)[1]
        newDate.append(item)
df['newDate'] = newDate
df.to_csv('D:/Python/10.36.202.64/newfile.csv', index=False)

This is what I get:

serverDatePrettyFirstAction newDate
Wednesday 12 August 2015    12-Aug-15
Wednesday 12 August 2015    12-Aug-15
Friday April 1 2016         April 1 2016
Friday April 1 2016         April 1 2016
5/12/2016                   5/12/2016
5/12/2016                   5/12/2016

Also, is there a way to overwrite the values in the same column instead of adding a new one?

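2 answers:

Answer 0: (score: 1)

As long as your data is not too large, you can use the third-party dateutil library. (The caveat on size is because it has to guess the format for every single value.)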

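import pandas as pd
from dateutil import parser

df = pd.read_csv('test3.csv')
# parser.parse guesses the format of each value independently.
df['newDate'] = df['serverDatePrettyFirstAction'].apply(parser.parse)
# date_format controls how the parsed dates are written to the CSV.
df.to_csv('newfile.csv', index=False, date_format='%Y-%m-%d')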

To overwrite the values in the same column, assign the result back to it:

df['serverDatePrettyFirstAction'] = df['serverDatePrettyFirstAction'].apply(parser.parse)
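
If guessing the format for every value is a concern, one alternative is to try an explicit list of formats with the standard library. This is only a minimal sketch, assuming the three formats in the question's sample rows are the only ones in the file and that the weekday and month names are in English; the helper name normalize_date and the FORMATS tuple are hypothetical, not part of the answer above.

```python
from datetime import datetime

# Assumed formats, taken from the sample rows in the question.
FORMATS = (
    "%A %d %B %Y",  # Wednesday 12 August 2015
    "%A %B %d %Y",  # Friday April 1 2016
    "%m/%d/%Y",     # 5/12/2016, read as month/day/year
)


def normalize_date(text, out_format="%m/%d/%y"):
    """Try each known format in turn and return the date as mm/dd/yy."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(out_format)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("unrecognized date format: %r" % text)


print(normalize_date("Friday April 1 2016"))  # 04/01/16
```

With a DataFrame, df['serverDatePrettyFirstAction'].apply(normalize_date) could be assigned back to the same column to overwrite it. Unlike a guessing parser, this raises an error on any value that matches none of the listed formats, which may or may not be what you want.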

Answer 1: (score: 1)

A faster approach is to use the pandas function to_datetime():

In [2]: df
Out[2]:
                       Date
0  Wednesday 12 August 2015
1  Wednesday 12 August 2015
2       Friday April 1 2016
3       Friday April 1 2016
4                 5/12/2016
5                 5/12/2016

In [6]: df['newDate'] = pd.to_datetime(df['Date'])

Result:

In [7]: df
Out[7]:
                       Date    newDate
0  Wednesday 12 August 2015 2015-08-12
1  Wednesday 12 August 2015 2015-08-12
2       Friday April 1 2016 2016-04-01
3       Friday April 1 2016 2016-04-01
4                 5/12/2016 2016-05-12
5                 5/12/2016 2016-05-12
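
The question asked for mm/dd/yy output and for overwriting the original column, which the session above does not show. A possible sketch follows, assuming the same three sample values. It parses each value on its own with apply so that the result does not depend on how a given pandas version treats mixed formats in a single to_datetime call (newer releases may be stricter about that); this is slower than the vectorized call above.

```python
import pandas as pd

# Sample data copied from the question; the column name "Date" follows the session above.
df = pd.DataFrame({"Date": [
    "Wednesday 12 August 2015",
    "Friday April 1 2016",
    "5/12/2016",
]})

# Parse each value separately, then format it as mm/dd/yy.
# Assigning back to df["Date"] overwrites the original column.
df["Date"] = df["Date"].apply(lambda s: pd.to_datetime(s).strftime("%m/%d/%y"))

print(df["Date"].tolist())  # ['08/12/15', '04/01/16', '05/12/16']
```

Note that the column now holds strings again, not datetimes. If you only need the format in the output file, keeping the datetime column and passing date_format='%m/%d/%y' to to_csv is another option.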