A column in my DataFrame looks like this:
Event date
1/3/2013
11/01/2011-10/01/2012
11/01/2011-10/01/2012
11/01/2011-10/01/2012
10/01/2012 - 02/18/2013
2/12/2013
01/18/2013-01/23/2013
11/01/2012-01/19/2013
Is there a way to split the dates into two columns, for example
df['Start date']
df['end date']
For rows that contain only a single date, that date should be treated as the start date by default.
Answer 0 (score: 2)
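You can also use Series.str.extract() here to do it all in one pass. The named groups in the pattern become the column names of the result, and the optional end group yields NaN for rows that hold a single date: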
In [22]: df
Out[22]:
event_date
0 1/3/2013
1 11/01/2011-10/01/2012
2 11/01/2011-10/01/2012
3 11/01/2011-10/01/2012
4 10/01/2012 - 02/18/2013
5 2/12/2013
6 01/18/2013-01/23/2013
7 11/01/2012-01/19/2013
In [23]: df.event_date.str.extract(r'(?P<all>(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?)')
Out[23]:
all start end
0 1/3/2013 1/3/2013 NaN
1 11/01/2011-10/01/2012 11/01/2011 10/01/2012
2 11/01/2011-10/01/2012 11/01/2011 10/01/2012
3 11/01/2011-10/01/2012 11/01/2011 10/01/2012
4 10/01/2012 - 02/18/2013 10/01/2012 02/18/2013
5 2/12/2013 2/12/2013 NaN
6 01/18/2013-01/23/2013 01/18/2013 01/23/2013
7 11/01/2012-01/19/2013 11/01/2012 01/19/2013
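To get the two columns the question asks for, the extracted strings can be assigned back to the frame and parsed as dates. The following is a minimal sketch, not part of the original answer: it drops the outer `all` group, uses a shortened sample frame, and assumes the dates are in month/day/year order (which values such as 02/18/2013 suggest).

```python
import pandas as pd

# Shortened sample of the column from the question.
df = pd.DataFrame({"event_date": [
    "1/3/2013",
    "11/01/2011-10/01/2012",
    "10/01/2012 - 02/18/2013",
]})

# Same idea as the pattern above, without the outer "all" group.
pattern = r"(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?"
parts = df["event_date"].str.extract(pattern)

# Assumption: month/day/year order. Missing end dates become NaT.
df["Start date"] = pd.to_datetime(parts["start"], format="%m/%d/%Y")
df["end date"] = pd.to_datetime(parts["end"], format="%m/%d/%Y")
```

Rows with a single date end up with that date in `Start date` and a missing value (NaT) in `end date`.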
Answer 1 (score: 1)
You can use the vectorized string method split, taking the first and second pieces of each result:
>>> df
event_date x
0 1/3/2013 1
1 11/01/2011-10/01/2012 1
2 11/01/2011-10/01/2012 1
3 11/01/2011-10/01/2012 1
4 10/01/2012 - 02/18/2013 1
5 2/12/2013 1
6 01/18/2013-01/23/2013 1
7 11/01/2012-01/19/2013 1
>>> df['beg'] = df['event_date'].str.split(r'\s*-\s*').str[0]
>>> df['end'] = df['event_date'].str.split(r'\s*-\s*').str[1]
>>> df
event_date x beg end
0 1/3/2013 1 1/3/2013 NaN
1 11/01/2011-10/01/2012 1 11/01/2011 10/01/2012
2 11/01/2011-10/01/2012 1 11/01/2011 10/01/2012
3 11/01/2011-10/01/2012 1 11/01/2011 10/01/2012
4 10/01/2012 - 02/18/2013 1 10/01/2012 02/18/2013
5 2/12/2013 1 2/12/2013 NaN
6 01/18/2013-01/23/2013 1 01/18/2013 01/23/2013
7 11/01/2012-01/19/2013 1 11/01/2012 01/19/2013
Edit: As @DSM pointed out, you can also do the following:
>>> pd.DataFrame(df['event_date'].str.split(r'\s*-\s*').tolist(),
...              columns=['beg','end'])
beg end
0 1/3/2013 None
1 11/01/2011 10/01/2012
2 11/01/2011 10/01/2012
3 11/01/2011 10/01/2012
4 10/01/2012 02/18/2013
5 2/12/2013 None
6 01/18/2013 01/23/2013
7 11/01/2012 01/19/2013
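A variant of the same split, offered only as a sketch: `str.split` can return the pieces as columns directly via `expand=True`. This assumes a reasonably recent pandas, since the `regex` keyword was added in pandas 1.4, and uses a shortened sample frame. The `reindex` call is a defensive addition of mine so that an `end` column exists even if no row contains a range.

```python
import pandas as pd

# Shortened sample of the column from the question.
df = pd.DataFrame({"event_date": [
    "1/3/2013",
    "11/01/2011-10/01/2012",
    "10/01/2012 - 02/18/2013",
]})

# Split on the first hyphen (with optional surrounding whitespace)
# and expand the pieces into columns 0 and 1.
parts = df["event_date"].str.split(r"\s*-\s*", n=1, expand=True, regex=True)

# Guarantee two columns even when every row holds a single date.
parts = parts.reindex(columns=[0, 1])
parts.columns = ["beg", "end"]

df["beg"] = parts["beg"]
df["end"] = parts["end"]
```

Both columns still hold strings at this point; rows with a single date have a missing value in `end`.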