Consider the following input:
[['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
My desired output is:
[[2016-12-31, 2015-12-31, 2014-12-31],
 [2016-03-31, 2015-03-31, 2014-03-31]]
Basically, I want to convert elements 1-3 within each nested list to datetime objects, with the month information taken from element 0.
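I can think of a manually intensive solution, but I am looking for the most efficient way (speed-wise) to achieve this. The actual data has thousands of rows.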
Answer 0 (score: 1)
You can pull the months and days out of the first column with extract, prepend them to each year with radd, and convert the result with to_datetime:
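import numpy as np
import pandas as pd

L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
     ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
a = np.array(L)
# Capture the month abbreviation and the day from the first column.
pat = r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'
# Join month and day into a prefix such as 'Dec-31-' so a year can follow it.
d = pd.Series(a[:, 0]).str.extract(pat, expand=True).apply('-'.join, 1).add('-')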
print (d)
0    Dec-31-
1    Mar-31-
dtype: object
L1 = pd.DataFrame(a[:, 1:]).radd(d, 0).apply(pd.to_datetime).values.astype('datetime64[D]')
print (L1)
[['2016-12-31' '2015-12-31' '2014-12-31']
['2016-03-31' '2015-03-31' '2014-03-31']]
If performance matters, use a dictionary to map the months:
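d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04', 'May':'05', 'Jun':'06',
     'Jul':'07', 'Aug':'08', 'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
L2 = []
for l in L:
    # Month abbreviation and day are the second- and third-to-last words.
    a = l[0].split()[-3:-1]
    # Build the 'MM-DD' suffix, e.g. '12-31'.
    a = '-'.join([d[a[0]], a[1]])
    # Append the suffix to every year in the row.
    L2.append([x + '-' + a for x in l[1:]])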
print (L2)
[['2016-12-31', '2015-12-31', '2014-12-31'],
['2016-03-31', '2015-03-31', '2014-03-31']]
Finally, if you need a NumPy array:
print (np.array(L2))
[['2016-12-31' '2015-12-31' '2014-12-31']
['2016-03-31' '2015-03-31' '2014-03-31']]
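Since the question asks for datetime objects rather than strings, the dictionary approach can be adapted to return `datetime.date` values. The following is only a minimal sketch using the standard library: `MONTHS` and `fiscal_dates` are names made up for this illustration, and it assumes every header ends in `<Mon> <day> <year>` as in the sample data (Python 3.7+ for `date.fromisoformat`).

```python
from datetime import date

# Month abbreviation -> zero-padded month number (same mapping as the dict above).
MONTHS = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06',
          'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}


def fiscal_dates(rows):
    """Return one list of datetime.date objects per input row (hypothetical helper)."""
    result = []
    for header, *years in rows:
        # Assumption: the header ends with '<Mon> <day> <year>'.
        month_abbr, day = header.split()[-3:-1]
        # Zero-pad the day because fromisoformat expects 'YYYY-MM-DD'.
        suffix = f"-{MONTHS[month_abbr]}-{int(day):02d}"
        result.append([date.fromisoformat(year + suffix) for year in years])
    return result


rows = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
        ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
dates = fiscal_dates(rows)
print(dates)
```

Note that a header dated Feb 29 would raise a ValueError for the non-leap years in the same row, so real data may need a rule for that case.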
Timings:
L = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']] * 10000
In [262]: %%timeit
     ...: d = {'Jan':'01', 'Feb':'02', 'Mar':'03', 'Apr':'04', 'May':'05', 'Jun':'06',
     ...:      'Jul':'07', 'Aug':'08', 'Sep':'09', 'Oct':'10', 'Nov':'11', 'Dec':'12'}
     ...:
     ...: L2 = []
     ...: for l in L:
     ...:     a = l[0].split()[-3:-1]
     ...:     a = '-'.join([d.get(a[0]), a[1]])
     ...:     L2.append([x + '-' + a for x in l[1:]])
     ...:
10 loops, best of 3: 44.3 ms per loop
In [263]: %%timeit
     ...: out_list=[]
     ...: for l in L:
     ...:     l_date = datetime.strptime((" ").join(l[0].split()[-3:]), '%b %d %Y')
     ...:     out_list.append([("-").join([str(l_year),str(l_date.month),str(l_date.day)])
     ...:                      for l_year in l[-3:]])
     ...:
1 loop, best of 3: 303 ms per loop
In [264]: %%timeit
     ...: a = np.array(L)
     ...: pat = '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{1,2})'
     ...: d = pd.Series(a[:, 0]).str.extract(pat, expand=True).apply('-'.join, 1).add('-')
     ...: L1 = pd.DataFrame(a[:, 1:]).radd(d, 0).apply(pd.to_datetime).values.astype('datetime64[D]')
     ...:
1 loop, best of 3: 7.46 s per loop
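To repeat the comparison of the two pure-Python loops outside IPython, something like the following sketch with the standard `timeit` module should work. The function names `dict_approach` and `strptime_approach` are invented here for illustration, and the absolute numbers will depend on the machine and Python version, so they may not match the figures above.

```python
import timeit
from datetime import datetime

MONTHS = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04', 'May': '05', 'Jun': '06',
          'Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}


def dict_approach(rows):
    """Dictionary lookup plus string concatenation (the first timed loop)."""
    out = []
    for row in rows:
        month_abbr, day = row[0].split()[-3:-1]
        suffix = '-'.join([MONTHS[month_abbr], day])
        out.append([year + '-' + suffix for year in row[1:]])
    return out


def strptime_approach(rows):
    """datetime.strptime on the header (the second timed loop)."""
    out = []
    for row in rows:
        parsed = datetime.strptime(' '.join(row[0].split()[-3:]), '%b %d %Y')
        out.append(['-'.join([str(year), str(parsed.month), str(parsed.day)])
                    for year in row[-3:]])
    return out


rows = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
        ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']] * 10000

for func in (dict_approach, strptime_approach):
    # Best of three single runs, similar in spirit to %%timeit's "best of 3".
    best = min(timeit.repeat(lambda: func(rows), number=1, repeat=3))
    print(f'{func.__name__}: {best * 1000:.1f} ms (best of 3)')
```

The two functions do not return identical strings: the `strptime` version leaves month and day unpadded (for example '2016-3-31'), which is worth keeping in mind when comparing results as well as speed.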
Answer 1 (score: 0)
This creates your desired output as a nested list:
from datetime import datetime

in_list = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
           ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
out_list = []
for l in in_list:
    # Parse the trailing 'Dec 31 2016' part of the header.
    l_date = datetime.strptime(" ".join(l[0].split()[-3:]), '%b %d %Y')
    # Build 'year-month-day' strings; month and day are not zero-padded.
    out_list.append(["-".join([str(l_year), str(l_date.month), str(l_date.day)])
                     for l_year in l[-3:]])
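The loop above yields strings. If actual datetime objects are wanted, one possible variation is to parse the header once and swap in each year with `datetime.replace`. This is a sketch rather than part of the original answer: `to_datetimes` is a hypothetical name, `%b` matching assumes an English locale, and a Feb 29 header would raise a ValueError for non-leap years.

```python
from datetime import datetime


def to_datetimes(rows):
    """Return a nested list of datetime objects, one inner list per row (hypothetical helper)."""
    out = []
    for header, *years in rows:
        # Parse the trailing '<Mon> <day> <year>' once per row.
        anchor = datetime.strptime(" ".join(header.split()[-3:]), "%b %d %Y")
        # Reuse the parsed month and day, changing only the year.
        out.append([anchor.replace(year=int(year)) for year in years])
    return out


in_list = [['Fiscal data as of Dec 31 2016', '2016', '2015', '2014'],
           ['Fiscal data as of Mar 31 2016', '2016', '2015', '2014']]
result = to_datetimes(in_list)
print(result)
```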