I'd like to create a DatetimeIndex in Pandas by broadcasting arrays of years, months, days, hours, etc. This is relatively simple with a list comprehension, e.g.
import numpy as np
import pandas as pd
from datetime import datetime  # pd.datetime was removed in pandas 2.0
def build_DatetimeIndex(*args):
    return pd.DatetimeIndex([datetime(*tup)
                             for tup in np.broadcast(*args)])
For example:
>>> year = 2012
>>> months = [1, 2, 5, 6]
>>> days = [1, 15, 1, 15]
>>> build_DatetimeIndex(year, months, days)
DatetimeIndex(['2012-01-01', '2012-02-15', '2012-05-01', '2012-06-15'],
              dtype='datetime64[ns]', freq=None)
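However, because of the list comprehension, this becomes quite slow as the size of the inputs grows. Is there a built-in way to do this in Pandas, or is there any way to define build_DatetimeIndex in terms of fast vectorized operations?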
Answer 0 (score: 3)
You can use the dtypes m8[Y], m8[M], and m8[D] to create arrays of timedeltas, and add them together to the date "0000-01-01":
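import pandas as pd
import numpy as np
from datetime import datetime  # pd.datetime was removed in pandas 2.0
year = np.arange(2010, 2020)
months = np.arange(1, 13)
days = np.arange(1, 29)
# Every (year, month, day) combination, flattened to 1-D arrays.
y, m, d = map(np.ravel, np.broadcast_arrays(*np.ix_(year, months, days)))
start = np.array(["0000-01-01"], dtype="M8[Y]")
# Months and days are 1-based, so subtract 1 before turning them into offsets.
r1 = start + y.astype("m8[Y]") + (m - 1).astype("m8[M]") + (d-1).astype("m8[D]")
# The list-comprehension version from the question, used as a reference.
def build_DatetimeIndex(*args):
    return pd.DatetimeIndex([datetime(*tup)
                             for tup in np.broadcast(*args)])
r2 = build_DatetimeIndex(y, m, d)
np.all(pd.DatetimeIndex(r1) == r2)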
To include hours, minutes, and seconds:
import pandas as pd
import numpy as np
y = np.array([2012, 2013])
m = np.array([1, 3])
d = np.array([5, 20])
H = np.array([10, 20])
M = np.array([30, 40])
S = np.array([0, 30])
start = np.array(["0000-01-01"], dtype="M8[Y]")
date = start + y.astype("m8[Y]") + (m - 1).astype("m8[M]") + (d-1).astype("m8[D]")
datetime = date.astype("M8[s]") + H.astype("m8[h]") + M.astype("m8[m]") + S.astype("m8[s]")
pd.Series(datetime)
Result:
0   2012-01-05 10:30:00
1   2013-03-20 20:40:30
dtype: datetime64[ns]
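The two snippets above can be folded into one function that broadcasts its inputs the way the question asks. This is only a minimal sketch: the name build_datetime_index_vectorized and its default arguments are hypothetical, and it converts years to datetime64[Y] as offsets from 1970 instead of adding them to "0000-01-01".

```python
import numpy as np
import pandas as pd

def build_datetime_index_vectorized(year, month=1, day=1, hour=0, minute=0, second=0):
    """Hypothetical helper: broadcast the fields, then add timedeltas."""
    fields = np.broadcast_arrays(year, month, day, hour, minute, second)
    y, m, d, H, M, S = (np.ravel(f).astype(np.int64) for f in fields)
    # An integer cast to datetime64[Y] counts years since 1970.
    dates = ((y - 1970).astype("M8[Y]")
             + (m - 1).astype("m8[M]")   # months are 1-based
             + (d - 1).astype("m8[D]"))  # days are 1-based
    stamps = (dates.astype("M8[s]") + H.astype("m8[h]")
              + M.astype("m8[m]") + S.astype("m8[s]"))
    return pd.DatetimeIndex(stamps)

idx = build_datetime_index_vectorized(2012, [1, 2, 5, 6], [1, 15, 1, 15])
```

Scalars and arrays can be mixed freely because np.broadcast_arrays expands them to a common shape before any date arithmetic happens.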
Answer 1 (score: 1)
import numpy as np
import pandas as pd
# Note: pd.Panel was removed in pandas 0.25, so this needs an older pandas.
def build_DatetimeIndex(years, months, days):
    years = pd.Index(years, name='year')
    months = pd.Index(months, name='month')
    days = pd.Index(days, name='day')
    # Empty Panel whose three axes span every (year, month, day) combination.
    panel = pd.Panel(items=days, major_axis=years, minor_axis=months)
    to_dt = lambda x: pd.datetime(*x)
    # Stacking the frame yields a MultiIndex of (year, month, day) tuples.
    series = panel.fillna(0).to_frame().stack().index.to_series()
    return pd.DatetimeIndex(series.apply(to_dt))
dti = build_DatetimeIndex(range(1900, 2000), range(1, 13), [1, 15])
print(dti)
DatetimeIndex(['1900-01-01', '1900-01-15', '1900-02-01', '1900-02-15',
               '1900-03-01', '1900-03-15', '1900-04-01', '1900-04-15',
               '1900-05-01', '1900-05-15',
               ...
               '1999-08-01', '1999-08-15', '1999-09-01', '1999-09-15',
               '1999-10-01', '1999-10-15', '1999-11-01', '1999-11-15',
               '1999-12-01', '1999-12-15'],
              dtype='datetime64[ns]', length=2400, freq=None)
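On pandas versions without pd.Panel, the same Cartesian product of years, months, and days can probably be built from a MultiIndex instead. The sketch below is a swapped-in technique, not the code from this answer: the function name build_datetime_index_product is hypothetical, and it relies on pd.to_datetime assembling datetimes from columns named year, month, and day.

```python
import pandas as pd

def build_datetime_index_product(years, months, days):
    """Hypothetical helper: every (year, month, day) combination, without pd.Panel."""
    combos = pd.MultiIndex.from_product([years, months, days],
                                        names=["year", "month", "day"])
    # to_frame gives a DataFrame with year/month/day columns,
    # which pd.to_datetime can assemble into datetimes in one call.
    frame = combos.to_frame(index=False)
    return pd.DatetimeIndex(pd.to_datetime(frame))

dti = build_datetime_index_product(range(1900, 2000), range(1, 13), [1, 15])
```

The result is ordered by year, then month, then day, which matches the output shown above.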
Answer 2 (score: 1)
import pandas as pd
import numpy as np
from datetime import datetime  # pd.datetime was removed in pandas 2.0
def nao(*args):
    # Encode the fields as integers such as 19000101 via nested outer sums.
    if len(args) == 1:
        return np.asarray(args[-1]).flatten()
    else:
        return np.add.outer(args[-1], nao(*args[:-1]) * 1e2).flatten()
def handler(*args):
    fmt = np.array(['%Y', '%m', '%d', '%H', '%M', '%S'])
    # Use only as many format codes as there are fields.
    fstr = "".join(fmt[:len(args)])
    ds = nao(*args).astype(np.dtype(int))
    return pd.Index(pd.Series(ds).apply(lambda x: datetime.strptime(str(x), fstr)))
handler(range(1900, 2000), range(1, 13), range(1, 28))
DatetimeIndex(['1900-01-01', '1901-01-01', '1902-01-01', '1903-01-01',
               '1904-01-01', '1905-01-01', '1906-01-01', '1907-01-01',
               '1908-01-01', '1909-01-01',
               ...
               '1990-12-27', '1991-12-27', '1992-12-27', '1993-12-27',
               '1994-12-27', '1995-12-27', '1996-12-27', '1997-12-27',
               '1998-12-27', '1999-12-27'],
              dtype='datetime64[ns]', length=32400, freq=None)
from datetime import datetime
stamp = datetime.now()
for _ in range(10):
    handler(range(1900, 2000), range(1, 13), range(1, 28))
print(datetime.now() - stamp)
0:00:04.870000
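Most of that time is likely spent calling strptime once per element. A possible variant keeps the same integer encoding but parses everything in a single pd.to_datetime call with an explicit format. This is a sketch under that assumption, with the hypothetical name handler_vectorized; no timing is claimed for it here.

```python
import numpy as np
import pandas as pd

def handler_vectorized(*args):
    """Hypothetical variant of handler: same integer encoding, one parse call."""
    fmt = ["%Y", "%m", "%d", "%H", "%M", "%S"]
    fstr = "".join(fmt[:len(args)])
    encoded = np.asarray(args[0], dtype=np.int64).ravel()
    for field in args[1:]:
        # Shift the digits so far two places left and add the next field,
        # for every combination; earlier fields vary fastest in the result.
        encoded = np.add.outer(np.asarray(field, dtype=np.int64),
                               encoded * 100).ravel()
    # Parse all encoded strings (e.g. "19000101") at once.
    return pd.DatetimeIndex(pd.to_datetime(encoded.astype(str), format=fstr))

idx = handler_vectorized(range(1900, 2000), range(1, 13), range(1, 28))
```

Using int64 throughout avoids the float round trip through 1e2 in the original, while producing the same ordering of combinations.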
Answer 3 (score: 1)
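This is just to close the loop and give an example of the pd.to_datetime functionality that Jeff pointed out in https://github.com/pydata/pandas/pull/12967.
pd.to_datetime can be used with or without existing year, month, day, etc. columns in a DataFrame. (See the Github discussion for an example with existing columns.)
As in the question's example, the DatetimeIndex below is created without any existing year, month, day, etc. columns in a DataFrame. This is possible.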
import numpy as np
import pandas as pd
datedict = {'year': [2012]*4,  # Length must equal 'month' and 'day' length
            'month': [1, 2, 5, 6],
            'day': [1, 15, 1, 15]}
pd.DatetimeIndex(pd.to_datetime(datedict))
DatetimeIndex(['2012-01-01', '2012-02-15', '2012-05-01', '2012-06-15'],
              dtype='datetime64[ns]', freq=None)
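Because every entry must have the same length, a scalar such as the year has to be repeated by hand. One way around that, sketched here with the hypothetical helper build_datetime_index_from_fields, is to let np.broadcast_arrays do the repetition before calling pd.to_datetime. The keyword names are assumed to be ones pd.to_datetime recognizes (year, month, day, hour, minute, second).

```python
import numpy as np
import pandas as pd

def build_datetime_index_from_fields(**fields):
    """Hypothetical helper: broadcast the fields, then hand them to pd.to_datetime."""
    names = list(fields)
    # Scalars are expanded to match the longest array-like input.
    arrays = np.broadcast_arrays(*fields.values())
    frame = pd.DataFrame({name: np.ravel(arr)
                          for name, arr in zip(names, arrays)})
    return pd.DatetimeIndex(pd.to_datetime(frame))

idx = build_datetime_index_from_fields(year=2012,
                                       month=[1, 2, 5, 6],
                                       day=[1, 15, 1, 15])
```

With the inputs from the question this should reproduce the four dates shown above, without writing [2012]*4 explicitly.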