I have date ranges in a human-readable format:
dt = pd.Series(['27.02-11.03.2014', '10-11.06.2014'])
I would like to get a DataFrame containing the start and end date of each event. Currently I use:
tmp = dt.str.split('-').apply(lambda x: pd.Series(x, index=['start', 'end'])).apply(lambda x: pd.to_datetime(x, dayfirst=True))
def dt_parse(dt):
    x, y = dt
    if len(x) > 2:
        t = x.split('.')
        r = pd.to_datetime('-'.join([t[0], t[1], str(y.year)]), dayfirst=True)
    else:
        r = pd.to_datetime('-'.join([x, str(y.month), str(y.year)]), dayfirst=True)
    return r
tmp['start'] = tmp.apply(dt_parse, axis=1)
and get
       start        end
0 2014-02-27 2014-03-11
1 2014-06-10 2014-06-11
Any other (more efficient or more elegant) ideas on how to do this?
BR
Answer 0: (score: 0)
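You can use dt.str.extract with a regular expression to pick out the values. The start month is an optional group, so it comes back as NaN when the range gives it only once: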
In [108]: df = dt.str.extract(r'(?P<start_day>\d+)(?:\.(?P<start_month>\d+))?-(?P<end_day>\d+)\.(?P<end_month>\d+)\.(?P<year>\d+)')
In [109]: df
Out[109]:
  start_day start_month end_day end_month  year
0        27          02      11        03  2014
1        10         NaN      11        06  2014
The missing start_month values can be filled in from end_month with the fillna method:
df['start_month'] = df['start_month'].fillna(value=df['end_month'])
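The same idea (an optional start-month group that falls back to the end month) can be checked on a single string without pandas. This is only a minimal sketch using the standard library; the helper name parse_range is hypothetical and not part of the answer, and it assumes, as the answer does, that both dates fall in the same year.

```python
import re
from datetime import date

# Same pattern as in the answer: the start month is an optional group.
PATTERN = re.compile(
    r'(?P<start_day>\d+)(?:\.(?P<start_month>\d+))?-'
    r'(?P<end_day>\d+)\.(?P<end_month>\d+)\.(?P<year>\d+)'
)


def parse_range(text):
    """Hypothetical helper: return (start, end) dates for one range string."""
    m = PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f'unrecognised date range: {text!r}')
    parts = m.groupdict()
    # An optional group that did not take part in the match is None,
    # so fall back to the end month (the plain-Python analogue of fillna).
    start_month = parts['start_month'] or parts['end_month']
    year = int(parts['year'])  # assumption: start and end share the year
    start = date(year, int(start_month), int(parts['start_day']))
    end = date(year, int(parts['end_month']), int(parts['end_day']))
    return start, end


print(parse_range('27.02-11.03.2014'))
print(parse_range('10-11.06.2014'))
```

For many rows the vectorised str.extract and fillna calls are likely the better fit; the sketch is mainly useful for seeing what the pattern captures.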
Then use the combine64 function (below) to combine the individual numbers into np.datetime64 values:
import numpy as np
import pandas as pd
def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)
dt = pd.Series(['27.02-11.03.2014', '10-11.06.2014'])
df = dt.str.extract(r'(?P<start_day>\d+)(?:\.(?P<start_month>\d+))?-(?P<end_day>\d+)\.(?P<end_month>\d+)\.(?P<year>\d+)')
df = df.astype('float')
df['start_month'] = df['start_month'].fillna(value=df['end_month'])
df['start'] = combine64(df['year'], df['start_month'], df['start_day'])
df['end'] = combine64(df['year'], df['end_month'], df['end_day'])
df = df[['start', 'end']]
print(df)
yields
       start        end
0 2014-02-27 2014-03-11
1 2014-06-10 2014-06-11
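The subtractions of 1970 and 1 inside combine64 make more sense when the unit arithmetic is spelled out for one date. The following is a small sketch with hypothetical sample values (27 February 2014), not a replacement for the function: a datetime64 in year units is stored as an offset from 1970, and the month and day are added as zero-based timedelta64 offsets.

```python
import numpy as np

# Hypothetical sample values: 27 February 2014.
year, month, day = 2014, 2, 27

# datetime64 counts from the 1970 epoch, so the year is an offset from 1970.
base = np.array([year - 1970], dtype='M8[Y]')

# Calendar months and days are 1-based, but as offsets they are 0-based:
# February is one month after the start of the year, the 27th is 26 days
# after the start of the month.
month_offset = np.array([month - 1], dtype='m8[M]')
day_offset = np.array([day - 1], dtype='m8[D]')

# Each addition moves the result to the finer unit: years, then months, then days.
result = base + month_offset + day_offset
print(result.dtype, result[0])
```

Because every step is an array operation, the same arithmetic applies to whole columns at once, which is what combine64 does with the year, month and day columns.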