我正在尝试从时间序列pandas DataFrame中删除假日数据。我正在遵循的说明处理DatetimeSeries并使用函数set_index()将此DatetimeSeries应用于DataFrame,从而产生没有假期的时间序列。这个set_index()函数对我不起作用。看看代码......
{data_day.tail()}
Open High Low Close Volume
Date
2018-05-20 NaN NaN NaN NaN 0.0
2018-05-21 2732.50 2739.25 2725.25 2730.50 210297692.0
2018-05-22 2726.00 2741.75 2721.50 2738.25 179224835.0
2018-05-23 2731.75 2732.75 2708.50 2710.50 292305588.0
2018-05-24 2726.00 2730.50 2705.75 2725.00 312575571.0
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
usb
<CustomBusinessDay>
data_day_No_Holiday = pd.date_range(start='9/7/2005', end='5/21/2018', freq=usb)
data_day_No_Holiday
DatetimeIndex(['2005-09-07', '2005-09-08', '2005-09-09', '2005-09-12',
'2005-09-13', '2005-09-14', '2005-09-15', '2005-09-16',
'2005-09-19', '2005-09-20',
...
'2018-05-08', '2018-05-09', '2018-05-10', '2018-05-11',
'2018-05-14', '2018-05-15', '2018-05-16', '2018-05-17',
'2018-05-18', '2018-05-21'],
dtype='datetime64[ns]', length=3187, freq='C')
data_day.set_index(data_day_No_Holidays, inplace=True)
----------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-118-cf7521d08f6f> in <module>()
----> 1 data_day.set_index(data_day_No_Holidays, inplace=True)
2 # inplace=True tells python to modify the original df and to NOT create a new one.
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
3923 index._cleanup()
3924
-> 3925 frame.index = index
3926
3927 if not inplace:
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
4383 try:
4384 object.__getattribute__(self, name)
-> 4385 return object.__setattr__(self, name, value)
4386 except AttributeError:
4387 pass
pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
643
644 def _set_axis(self, axis, labels):
--> 645 self._data.set_axis(axis, labels)
646 self._clear_item_cache()
647
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in set_axis(self, axis, new_labels)
3321 raise ValueError(
3322 'Length mismatch: Expected axis has {old} elements, new '
-> 3323 'values have {new} elements'.format(old=old_len, new=new_len))
3324
3325 self.axes[axis] = new_labels
ValueError: Length mismatch: Expected axis has 4643 elements, new values have 3187 elements
这个过程似乎对另一个程序员来说很漂亮。
是否有人建议进行数据类型转换或将DatetimeIndex应用于DataFrame的函数,这将导致丢弃未在data_day_No_Holiday DatetimeIndex中表示的所有数据行(假日)?
谢谢,如果我发现任何格式错误或者我遗漏了任何相关信息,请告诉我......
答案 0 :(得分:1)
使用reindex
:
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
usb = CustomBusinessDay(calendar=USFederalHolidayCalendar())
data_day_No_Holiday = pd.date_range(start='1/1/2018', end='12/31/2018', freq=usb)
data_day = pd.DataFrame({'Values':np.random.randint(0,100,365)},index = pd.date_range('2018-01-01', periods=365, freq='D'))
data_day.reindex(data_day_No_Holiday).dropna()'
输出(头):
Values
2018-01-02 38
2018-01-03 1
2018-01-04 16
2018-01-05 43
2018-01-08 95