Python Pandas: cleaning a column that contains multiple dates

Time: 2014-06-07 15:08:49

Tags: python date pandas

My dataframe has a column that looks like this:

Event date
1/3/2013
11/01/2011-10/01/2012
11/01/2011-10/01/2012
11/01/2011-10/01/2012
10/01/2012 - 02/18/2013
2/12/2013
01/18/2013-01/23/2013
11/01/2012-01/19/2013

Is there a way to split the dates into two columns, such as

df['Start date']
df['end date']

For rows with a single date, that date should default to the start date.

2 answers:

Answer 0 (score: 2)

You can also use Series.str.extract() here to do it all in one go:

In [22]: df
Out[22]:
                event_date
0                 1/3/2013
1    11/01/2011-10/01/2012
2    11/01/2011-10/01/2012
3    11/01/2011-10/01/2012
4  10/01/2012 - 02/18/2013
5                2/12/2013
6    01/18/2013-01/23/2013
7    11/01/2012-01/19/2013

In [23]: df.event_date.str.extract(r'(?P<all>(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?)')
Out[23]:
                       all       start         end
0                 1/3/2013    1/3/2013         NaN
1    11/01/2011-10/01/2012  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  10/01/2012  02/18/2013
5                2/12/2013   2/12/2013         NaN
6    01/18/2013-01/23/2013  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  11/01/2012  01/19/2013
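
If you want real datetime columns rather than strings, something like the following might work. This is only a sketch: it assumes the dates are month/day/year (which the `02/18/2013` row suggests), uses a shortened sample of the data, and takes the `Start date` / `end date` column names from the question. The `pattern` and `parts` names are just for illustration.

```python
import pandas as pd

# Shortened sample of the column from the question.
df = pd.DataFrame({"event_date": [
    "1/3/2013",
    "11/01/2011-10/01/2012",
    "10/01/2012 - 02/18/2013",
    "2/12/2013",
]})

# Same idea as the regex above, without the outer "all" group.
pattern = r"(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?"
parts = df["event_date"].str.extract(pattern)

# Assumption: month/day/year ordering. A missing end date becomes NaT.
df["Start date"] = pd.to_datetime(parts["start"], format="%m/%d/%Y")
df["end date"] = pd.to_datetime(parts["end"], format="%m/%d/%Y")
```

Rows with a single date end up with that date in `Start date` and `NaT` in `end date`, which matches what the question asks for.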

Answer 1 (score: 1)

You can do this with the vectorized string split:

>>> df

                event_date  x
0                 1/3/2013  1
1    11/01/2011-10/01/2012  1
2    11/01/2011-10/01/2012  1
3    11/01/2011-10/01/2012  1
4  10/01/2012 - 02/18/2013  1
5                2/12/2013  1
6    01/18/2013-01/23/2013  1
7    11/01/2012-01/19/2013  1


>>> df['beg'] = df['event_date'].str.split(r'\s*-\s*').str[0]
>>> df['end'] = df['event_date'].str.split(r'\s*-\s*').str[1]
>>> df

                event_date  x         beg         end
0                 1/3/2013  1    1/3/2013         NaN
1    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  1  10/01/2012  02/18/2013
5                2/12/2013  1   2/12/2013         NaN
6    01/18/2013-01/23/2013  1  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  1  11/01/2012  01/19/2013

Edit: as @DSM pointed out, you can also do the following:

>>> pd.DataFrame(df['event_date'].str.split(r'\s*-\s*').tolist(),
                  columns=['beg','end'])

          beg         end
0    1/3/2013        None
1  11/01/2011  10/01/2012
2  11/01/2011  10/01/2012
3  11/01/2011  10/01/2012
4  10/01/2012  02/18/2013
5   2/12/2013        None
6  01/18/2013  01/23/2013
7  11/01/2012  01/19/2013
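
A variation on the same idea, in case it is useful: `str.split` accepts `expand=True`, which returns the pieces as a DataFrame directly, so the `.tolist()` round trip may not be needed. This is a sketch on a shortened sample rather than a drop-in replacement. The `reindex` call is my own guard for the case where no row contains a range, and `parts` is just an illustrative name.

```python
import pandas as pd

# Shortened sample of the column from the question.
df = pd.DataFrame({"event_date": [
    "1/3/2013",
    "11/01/2011-10/01/2012",
    "10/01/2012 - 02/18/2013",
    "2/12/2013",
]})

# Split at most once on a dash with optional surrounding whitespace.
# A multi-character pattern is treated as a regular expression.
parts = df["event_date"].str.split(r"\s*-\s*", n=1, expand=True)

# Guarantee two columns even if no row contained a range.
parts = parts.reindex(columns=[0, 1])
parts.columns = ["beg", "end"]

df = df.join(parts)
```

Single-date rows get a missing value in `end`, the same as in the output above.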