I want to implement a walk-forward analysis, like the one shown in the picture.
I wrote this code:
from datetime import datetime, timedelta

import pandas as pd

@staticmethod
def build_rolling_calendar(start_date, end_date, out_of_sample_size, runs):
    days = (end_date - start_date).days
    # Turn the out-of-sample percentage into in/out fractions of a single run.
    in_sample_size = (100 - out_of_sample_size) / 100
    out_of_sample_size = out_of_sample_size / 100
    # Length of one run (in-sample + out-of-sample), then each part, in whole days.
    total_days_per_run = round(days / (runs * out_of_sample_size + in_sample_size))
    in_sample_days_per_run = round(total_days_per_run * in_sample_size)
    out_of_sample_days_per_run = round(total_days_per_run * out_of_sample_size)
    calendar = pd.DataFrame()
    # Each run starts one out-of-sample window later than the previous one.
    calendar['InSampleStarts'] = [start_date + timedelta(days=(out_of_sample_days_per_run * x))
                                  for x in range(runs)]
    calendar['InSampleEnds'] = [x + timedelta(days=in_sample_days_per_run)
                                for x in calendar['InSampleStarts']]
    calendar['OutSampleStarts'] = [start_date + timedelta(days=in_sample_days_per_run) +
                                   timedelta(days=(out_of_sample_days_per_run * x))
                                   for x in range(runs)]
    calendar['OutSampleEnds'] = [x + timedelta(days=out_of_sample_days_per_run)
                                 for x in calendar['OutSampleStarts']]
    return calendar
Unfortunately, this code returns dates that fall after end_date. How can I fix this?
P.S. My test call:
calendar = build_rolling_calendar(datetime(2016, 1, 1), datetime(2017, 5, 31), 25, 10)
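
For reference, the overshoot in this test call seems to come from rounding the window lengths before multiplying them: the range is 516 days, the rounded lengths come out as 119 in-sample and 40 out-of-sample days, so the last out-of-sample window ends at 119 + 10 * 40 = 519 days, 3 days past end_date. One possible fix, sketched below under the assumption that window lengths may differ by a day between runs, is to keep the lengths fractional and round each cumulative offset only once. The `offset` helper is a hypothetical name introduced for this sketch, and the function is written at module level without `@staticmethod` so that it can be called directly.

```python
from datetime import datetime, timedelta

import pandas as pd


def build_rolling_calendar(start_date, end_date, out_of_sample_size, runs):
    """Walk-forward calendar whose last out-of-sample window ends at end_date."""
    days = (end_date - start_date).days
    out_fraction = out_of_sample_size / 100
    in_fraction = 1 - out_fraction

    # Keep the window lengths fractional. Rounding them at this point is
    # what lets the error accumulate over the runs and pass end_date.
    total_days_per_run = days / (runs * out_fraction + in_fraction)
    in_sample_days = total_days_per_run * in_fraction
    out_of_sample_days = total_days_per_run * out_fraction

    def offset(day_count):
        # Hypothetical helper: round a cumulative offset once and cap it
        # at the total number of days, so no date can exceed end_date.
        return start_date + timedelta(days=min(round(day_count), days))

    calendar = pd.DataFrame()
    calendar['InSampleStarts'] = [offset(out_of_sample_days * x)
                                  for x in range(runs)]
    calendar['InSampleEnds'] = [offset(in_sample_days + out_of_sample_days * x)
                                for x in range(runs)]
    # Each out-of-sample window starts where its in-sample window ends.
    calendar['OutSampleStarts'] = calendar['InSampleEnds']
    calendar['OutSampleEnds'] = [offset(in_sample_days + out_of_sample_days * (x + 1))
                                 for x in range(runs)]
    return calendar


calendar = build_rolling_calendar(datetime(2016, 1, 1), datetime(2017, 5, 31), 25, 10)
```

With this approach each out-of-sample window ends exactly where the next one starts, and the final OutSampleEnds value lands on end_date instead of beyond it. The trade-off is that individual windows are 39 or 40 days long rather than all the same length.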