I have a DataFrame with a DatetimeIndex:
import pandas as pd
from pandas.tseries.offsets import *
data = pd.read_excel('P:\\Simon\\govt_bond_yields.xlsx')
data.head()
USA Italy UK EURO ZONE GREECE GERMANY
2018-06-25 2.8748 2.782 1.299 0.327 4.102 0.327
2018-06-22 2.8949 2.694 1.319 0.335 4.114 0.335
2018-06-21 2.8967 2.732 1.277 0.333 4.279 0.333
2018-06-20 2.9389 2.549 1.297 0.375 4.332 0.375
2018-06-19 2.8967 2.557 1.283 0.370 4.344 0.370
At the moment my index has no frequency:
data.index
DatetimeIndex(['2018-06-25', '2018-06-22', '2018-06-21', '2018-06-20',
'2018-06-19', '2018-06-18', '2018-06-15', '2018-06-14',
'2018-06-13', '2018-06-12',
...
'2015-01-27', '2015-01-26', '2015-01-23', '2015-01-22',
'2015-01-21', '2015-01-20', '2015-01-16', '2015-01-15',
'2015-01-14', '2015-01-13'],
dtype='datetime64[ns]', length=862, freq=None)
I am trying to set the frequency of the index, but when I do, I get back an empty DataFrame:
data.asfreq(freq='D')
USA Italy UK EURO ZONE GREECE GERMANY
What am I doing wrong here?
Answer 0 (score: 5)
This should work if you sort the index first; on an unsorted index, asfreq has a hard time working out what you want. For example:
# Unsorted data with a datetime index:
>>> data
USA Italy UK EURO ZONE GREECE GERMANY
2018-06-25 2.8748 2.782 1.299 0.327 4.102 0.327
2018-06-22 2.8949 2.694 1.319 0.335 4.114 0.335
2018-06-21 2.8967 2.732 1.277 0.333 4.279 0.333
2018-06-20 2.9389 2.549 1.297 0.375 4.332 0.375
2018-06-19 2.8967 2.557 1.283 0.370 4.344 0.370
>>> data.sort_index().asfreq(freq='D')
USA Italy UK EURO ZONE GREECE GERMANY
2018-06-19 2.8967 2.557 1.283 0.370 4.344 0.370
2018-06-20 2.9389 2.549 1.297 0.375 4.332 0.375
2018-06-21 2.8967 2.732 1.277 0.333 4.279 0.333
2018-06-22 2.8949 2.694 1.319 0.335 4.114 0.335
2018-06-23 NaN NaN NaN NaN NaN NaN
2018-06-24 NaN NaN NaN NaN NaN NaN
2018-06-25 2.8748 2.782 1.299 0.327 4.102 0.327
You can check the index to confirm that it worked:
# Check the index:
>>> data.sort_index().asfreq(freq='D').index
DatetimeIndex(['2018-06-19', '2018-06-20', '2018-06-21', '2018-06-22',
'2018-06-23', '2018-06-24', '2018-06-25'],
dtype='datetime64[ns]', freq='D')
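For anyone who wants to reproduce this without the Excel file, here is a minimal sketch. The DataFrame is a hypothetical stand-in built from the five rows shown in the question (two columns only), and the name `daily` is an assumption of mine, not something from the original post.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data: the five rows from the question,
# limited to two columns to keep the example short.
data = pd.DataFrame(
    {
        "USA": [2.8748, 2.8949, 2.8967, 2.9389, 2.8967],
        "Italy": [2.782, 2.694, 2.732, 2.549, 2.557],
    },
    index=pd.to_datetime(
        ["2018-06-25", "2018-06-22", "2018-06-21", "2018-06-20", "2018-06-19"]
    ),
)

# The index runs from newest to oldest, so it is not sorted in ascending order.
print(data.index.is_monotonic_increasing)

# Sort first, then conform to a daily frequency.
# Days missing from the data (here the weekend) come back as NaN rows.
daily = data.sort_index().asfreq("D")
print(daily)
print(daily.index.freq)
```

If the gaps should be filled rather than left as NaN, asfreq also accepts a `method` argument such as `method="ffill"`.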
Answer 1 (score: 2)
IIUC, I think what you want is resample followed by asfreq:
data.resample('D').asfreq()
Output:
USA Italy UK EURO ZONE GREECE GERMANY
2018-06-19 2.8967 2.557 1.283 0.370 4.344 0.370
2018-06-20 2.9389 2.549 1.297 0.375 4.332 0.375
2018-06-21 2.8967 2.732 1.277 0.333 4.279 0.333
2018-06-22 2.8949 2.694 1.319 0.335 4.114 0.335
2018-06-23 NaN NaN NaN NaN NaN NaN
2018-06-24 NaN NaN NaN NaN NaN NaN
2018-06-25 2.8748 2.782 1.299 0.327 4.102 0.327
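A self-contained sketch of the resample approach follows. The sample frame is a hypothetical reconstruction of the question's data (two columns only), and the variable names are my own. As far as I know, resample puts the rows into chronological order itself, so no explicit sort is needed here; treat that as an assumption and check it against your pandas version.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data, in the same newest-first order.
data = pd.DataFrame(
    {
        "USA": [2.8748, 2.8949, 2.8967, 2.9389, 2.8967],
        "Italy": [2.782, 2.694, 2.732, 2.549, 2.557],
    },
    index=pd.to_datetime(
        ["2018-06-25", "2018-06-22", "2018-06-21", "2018-06-20", "2018-06-19"]
    ),
)

# Resample to daily bins and take the values as they are, with no aggregation.
# Days without data come back as NaN rows.
resampled = data.resample("D").asfreq()
print(resampled)

# For comparison: the sort-then-asfreq route from the other answer.
sorted_daily = data.sort_index().asfreq("D")
print(resampled.equals(sorted_daily))
```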
Answer 2 (score: 0)
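This is an improvement on Sacul's answer:
df2 = datatable_df.sort_index().asfreq(freq='YS')
Note: running Sacul's command on its own leaves the original DataFrame unchanged, so at first I thought that step was wrong. Once I assigned the result to a variable, it worked, and I was able to carry on with my time series analysis.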
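The point about assignment can be sketched as below. The names `original` and `df2` and the one-column sample frame are hypothetical. I use 'D' rather than 'YS' because this sample covers a single week and contains no year-start date.

```python
import pandas as pd

# Hypothetical one-column sample in newest-first order.
original = pd.DataFrame(
    {"USA": [2.8748, 2.8949, 2.8967, 2.9389, 2.8967]},
    index=pd.to_datetime(
        ["2018-06-25", "2018-06-22", "2018-06-21", "2018-06-20", "2018-06-19"]
    ),
)

# Calling the chain without assignment throws the result away.
original.sort_index().asfreq(freq="D")
print(original.index.freq)  # the original frame still has no frequency

# sort_index and asfreq both return new objects, so keep the result.
df2 = original.sort_index().asfreq(freq="D")
print(df2.index.freq)
```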