I created a DatetimeIndex from the "date" column:
sales.index = pd.DatetimeIndex(sales["date"])
The index now looks like this:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06',
'2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10',
'2003-01-11', '2003-01-13',
...
'2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25',
'2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29',
'2016-07-30', '2016-07-31'],
dtype='datetime64[ns]', name='date', length=4393, freq=None)
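As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly: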
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
3 sales_holdout = disentangle(df_holdout)
4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])
<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
2 # transform sales table to disentangle sales time series
3 sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4 sales.index = pd.DatetimeIndex(sales["date"], freq="d")
5 sales = sales.pivot_table(index=["store_id", "article_id", "date"])
6 return sales
/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
89 else:
90 kwargs[new_arg_name] = new_arg_value
---> 91 return func(*args, **kwargs)
92 return wrapper
93 return _deprecate_kwarg
/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
399 'dates does not conform to passed '
400 'frequency {1}'
--> 401 .format(inferred, freq.freqstr))
402
403 if freq_infer:
ValueError: Inferred frequency None from passed dates does not conform to passed frequency D
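So apparently a frequency has been inferred, but it is stored neither in the freq attribute nor in the inferred_freq attribute of the DatetimeIndex; both are None. Can someone clear up this confusion?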
Answer 0: (score: 8)
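You have a couple of options here: pd.infer_freq and pd.tseries.frequencies.to_offset.
"I suspect that errors down the road are caused by the missing freq."
You are right. Here is what I often use. An example: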
import pandas as pd

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.
    Returns a copy. If `freq` is None, it is inferred.
    """
    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            # No frequency given and none set: try to infer one.
            freq = pd.infer_freq(idx)
        else:
            # A frequency is already set; nothing to do.
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found for `idx`. Specify'
                             ' a frequency string with `freq`.')
    return idx
Usage:
idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None
print(add_freq(idx)) # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')
print(add_freq(idx, freq='D')) # explicit
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')
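To see why inference fails on the asker's index, pd.infer_freq can be run on the first ten dates shown in the question. This is only a sketch, and the variable names (idx, full_range, missing) are made up for illustration. The Sundays are absent, so there is no single regular spacing to infer:

```python
import pandas as pd

# The first ten dates shown in the question's index.
idx = pd.DatetimeIndex([
    "2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06", "2003-01-07",
    "2003-01-08", "2003-01-09", "2003-01-10", "2003-01-11", "2003-01-13",
])

# infer_freq returns None when the spacing between dates is irregular.
inferred = pd.infer_freq(idx)

# Dates that a gap-free daily index over the same span would contain
# but this index does not.
full_range = pd.date_range(idx.min(), idx.max(), freq="D")
missing = full_range.difference(idx)

print(inferred)
print(missing)
```

Listing the missing dates this way shows whether the gaps follow a pattern (here, Sundays) before you decide how to handle them.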
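Using asfreq will actually reindex (fill) the missing dates, so be careful if that is not what you want. The main function for changing frequencies is asfreq. For a DatetimeIndex, it is basically a thin but convenient wrapper around reindex: it generates a date_range and calls reindex.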
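The wrapper relationship described above can be checked on a toy Series. A minimal sketch, assuming made-up data (the names s, via_asfreq and via_reindex are not from the question):

```python
import pandas as pd

# Toy series with a gap: 2003-01-04 and 2003-01-05 are absent.
s = pd.Series(
    [1, 2, 4],
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# asfreq inserts the missing days as NaN rows.
via_asfreq = s.asfreq("D")

# Doing the same by hand: build the full daily range, then reindex onto it.
full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
via_reindex = s.reindex(full_range)

print(via_asfreq.equals(via_reindex))
print(via_asfreq.index.freq)
```

Both routes give five rows, two of them NaN, which is the "fill" behaviour to watch out for.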
Answer 1: (score: 4)
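This seems to be related to missing dates, as 3kt notes. You might be able to "fix" it with asfreq('D'), as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up: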
df=pd.DataFrame({ 'x':[1,2,4] },
index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) )
df
Out[756]:
x
2003-01-02 1
2003-01-03 2
2003-01-06 4
df.index
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'],
dtype='datetime64[ns]', freq=None)
Note the freq=None. If you apply asfreq('D'), this changes to freq='D':
df.asfreq('D')
Out[758]:
x
2003-01-02 1.0
2003-01-03 2.0
2003-01-04 NaN
2003-01-05 NaN
2003-01-06 4.0
df.asfreq('d').index
Out[759]:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
'2003-01-06'],
dtype='datetime64[ns]', freq='D')
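More generally, depending on what exactly you are trying to do, you may want to look at the following for other options such as reindex and resample: Add missing dates to pandas dataframe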
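As a rough illustration of those options, here is how the gaps could be filled instead of left as NaN. Whether zero or the previous value is the right fill depends on what a missing row means in your data (for sales counts, zero may be reasonable, but that is an assumption). The variable names are made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# Fill the inserted days with a constant.
zero_filled = df.asfreq("D", fill_value=0)

# Carry the last observed value forward into the inserted days.
forward_filled = df.asfreq("D", method="ffill")

# Resample to daily bins; days without rows sum to 0.
resampled = df.resample("D").sum()

print(zero_filled)
print(forward_filled)
print(resampled.index.freq)
```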
Answer 2: (score: 1)
This can happen, for example, if the dates you pass are not sorted.
Look at this example:
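import numpy as np
import pandas as pd

example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
# Move the last date to the front, so the dates are no longer in order.
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[-1:],
                                               example_ts.index[:-1]]), freq='D')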
The preceding code raises your error, because the dates are not in sequence (the last date has been moved to the front).
example_ts = pd.Series(data=range(10),
                       index=pd.date_range('2020-01-01', '2020-01-10', freq='D'))
example_ts.index = pd.DatetimeIndex(np.hstack([example_ts.index[:-1],
                                               example_ts.index[-1:]]), freq='D')
This one, by contrast, runs correctly.
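If unsorted dates are the cause, sorting before setting the frequency should be enough. A small sketch along the lines of the example above; the names shuffled, ts and ts_sorted are only for illustration:

```python
import pandas as pd

dates = pd.date_range("2020-01-01", "2020-01-10", freq="D")

# Same situation as above: the last date is moved to the front.
shuffled = dates[-1:].append(dates[:-1])
ts = pd.Series(range(10), index=shuffled)

# No frequency can be inferred while the dates are out of order.
print(pd.infer_freq(ts.index))

# Sort by date first, then attach the frequency.
ts_sorted = ts.sort_index()
ts_sorted.index = pd.DatetimeIndex(ts_sorted.index, freq="D")

print(ts_sorted.index.freq)
```

Note that sort_index moves the values along with their dates, so the data stays aligned.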
Answer 3: (score: 0)
I am not sure whether earlier versions of Python lack this feature, but 3.6 has the following simple solution:
# 'b' stands for business days
# 'w' for weekly, 'd' for daily, and you get the idea...
df.index.freq = 'b'
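One caveat worth checking in your own environment: in recent pandas versions, assigning to freq validates the value against the dates, so it only works when the index already conforms to that frequency. A minimal sketch, assuming a toy index of Thursday, Friday and Monday (uppercase aliases are used here; the names idx and raised are illustrative):

```python
import pandas as pd

# Thursday, Friday, Monday: consecutive business days.
idx = pd.DatetimeIndex(["2003-01-02", "2003-01-03", "2003-01-06"])

# Accepted: the dates conform to business-day frequency.
idx.freq = "B"

# Rejected: a daily frequency would also require the weekend dates.
raised = False
try:
    idx.freq = "D"
except ValueError as exc:
    raised = True
    print(exc)

print(idx.freq)
```

So direct assignment is a way to declare a frequency the data already has, not a way to fill gaps.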
Answer 4: (score: 0)
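I am not sure about the cause, but I ran into the same error. I could not solve it with the suggestions posted above, but the solution in the following thread worked for me:
Pandas DatetimeIndex + seasonal_decompose = missing frequency
Best regards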