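For example, what is the fastest way to extend a DataFrame into the following years by reusing the previous year's values, so that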
col1 col2 col3
2018-04-01 0 0 0
2018-04-02 1 2 3
...
2019-03-31 364 728 1092
becomes
col1 col2 col3
2019-04-01 0 0 0
2019-04-02 1 2 3
...
2025-03-31 364 728 1092
Here is my test case:
import pandas as pd

dates = pd.date_range('2018-04-01', '2019-03-31').strftime('%Y-%m-%d')
df = pd.DataFrame({'col1': range(365),
                   'col2': range(0, 365 * 2, 2),
                   'col3': range(0, 365 * 3, 3)}, index=dates)
assert (extended_df.loc['2019-04-01'] == (0, 0, 0)).all()
assert (extended_df.loc['2019-04-02'] == (1, 2, 3)).all()
Answer 0 (score: 3)
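The solution would be a bit simpler if it were not for 29 February. Because leap days have to be handled, first remove them so that every year has the same number of rows, then append them back where necessary.
To repeat the values, use numpy.tile.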
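As a quick illustration of what numpy.tile does with a 2-D block, here is a minimal sketch on toy data (the array `year` is made up for the example and is not part of the question):

```python
import numpy as np

# One "year" of values: 3 rows, 2 columns (toy data).
year = np.array([[0, 0],
                 [1, 2],
                 [2, 4]])

# reps=(2, 1) stacks the block twice along the rows and once along the columns.
tiled = np.tile(year, (2, 1))
print(tiled.shape)
```

The second half of `tiled` is a copy of `year`, which is the behaviour needed to repeat one year of rows several times.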
import numpy as np
import pandas as pd

dates = pd.date_range('2018-04-01', '2019-03-31')
df = pd.DataFrame({'col1': range(365),
                   'col2': range(0, 365 * 2, 2),
                   'col3': range(0, 365 * 3, 3)}, index=dates)
# target range: one year after the first index value up to six years after the last
rng = pd.date_range(df.index[0] + pd.offsets.DateOffset(years=1),
                    df.index[-1] + pd.offsets.DateOffset(years=6))
print(rng)
DatetimeIndex(['2019-04-01', '2019-04-02', '2019-04-03', '2019-04-04',
'2019-04-05', '2019-04-06', '2019-04-07', '2019-04-08',
'2019-04-09', '2019-04-10',
...
'2025-03-22', '2025-03-23', '2025-03-24', '2025-03-25',
'2025-03-26', '2025-03-27', '2025-03-28', '2025-03-29',
'2025-03-30', '2025-03-31'],
dtype='datetime64[ns]', length=2192, freq='D')
# masks for 29 February in the source index (mask1) and in the target range (mask2)
mask1 = df.index.strftime('%m-%d') != '02-29'
mask2 = rng.strftime('%m-%d') == '02-29'
rng1 = rng[mask2]
rng2 = rng[~mask2]
# zero-filled DataFrame for the 29 February dates
df2 = pd.DataFrame([[0] * len(df.columns)], columns=df.columns, index=rng1)
print (df2)
col1 col2 col3
2020-02-29 0 0 0
2024-02-29 0 0 0
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([pd.DataFrame(np.tile(df[mask1].values, (6, 1)),
                             columns=df.columns,
                             index=rng2), df2])
print (df.head())
col1 col2 col3
2019-04-01 0 0 0
2019-04-02 1 2 3
2019-04-03 2 4 6
2019-04-04 3 6 9
2019-04-05 4 8 12
print (df.tail())
col1 col2 col3
2025-03-29 362 724 1086
2025-03-30 363 726 1089
2025-03-31 364 728 1092
2020-02-29 0 0 0
2024-02-29 0 0 0
# finally, sort the index so the 29 February rows land in date order
df = df.sort_index()
#print (df)
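The steps above can be collected into a single function. This is only a sketch of one possible arrangement, not the answer author's code: `extend_years` and `n_years` are hypothetical names, `pd.concat` is used in place of `append`, and the input is assumed to have a daily DatetimeIndex covering exactly one year.

```python
import numpy as np
import pandas as pd


def extend_years(df, n_years):
    """Hypothetical helper: repeat one year of daily rows over the next n_years."""
    # Target dates: one year after the first row up to n_years after the last row.
    rng = pd.date_range(df.index[0] + pd.DateOffset(years=1),
                        df.index[-1] + pd.DateOffset(years=n_years))
    # Set 29 February aside so every target year has the same number of rows.
    is_leap_day = (rng.month == 2) & (rng.day == 29)
    source = df[~((df.index.month == 2) & (df.index.day == 29))]
    repeated = pd.DataFrame(np.tile(source.to_numpy(), (n_years, 1)),
                            columns=df.columns, index=rng[~is_leap_day])
    # Leap days are filled with zeros, as in the answer; another rule may fit better.
    leap = pd.DataFrame(0, columns=df.columns, index=rng[is_leap_day])
    return pd.concat([repeated, leap]).sort_index()


dates = pd.date_range('2018-04-01', '2019-03-31')
df = pd.DataFrame({'col1': range(365),
                   'col2': range(0, 365 * 2, 2),
                   'col3': range(0, 365 * 3, 3)}, index=dates)
extended_df = extend_years(df, 6)
```

With this input, `extended_df` starts at 2019-04-01 and ends at 2025-03-31, and the rows for 2019-04-01 and 2019-04-02 carry the values the question's assertions expect.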