熊猫延长日期时间指数,查询上一年

时间:2018-08-20 14:48:55

标签: python pandas

例如,使用下一年的前一年值来扩展数据框的最快方法是什么

            col1  col2  col3
2018-04-01     0     0     0
2018-04-02     1     2     3
...
2019-03-31   364   728  1092

成为

            col1  col2  col3
2019-04-01     0     0     0
2019-04-02     1     2     3
...
2025-03-31   364   728  1092

这是我的测试用例:

dates = pd.date_range('2018-04-01', '2019-03-31').strftime('%Y-%m-%d')
df    = pd.DataFrame({'col1': range(365),
                     'col2': range(0, 365 * 2, 2),
                     'col3': range(0, 365 * 3, 3)}, index=dates)


assert (extended_df.loc['2019-04-01'] == (0, 0, 0)).all()
assert (extended_df.loc['2019-04-02'] == (1, 2, 3)).all()

1 个答案:

答案 0 :(得分:3)

解决方案应该更简单一些,但是由于29. February是必需的,因此请首先删除每年相同行数的日期,并在必要时append将其删除:

对于重复值,请使用numpy.tile

dates = pd.date_range('2018-04-01', '2019-03-31')
df    = pd.DataFrame({'col1': range(365),
                     'col2': range(0, 365 * 2, 2),
                     'col3': range(0, 365 * 3, 3)}, index=dates)

#create range by add 6 years to last value of index
rng=pd.date_range(df.index[0] + pd.offsets.DateOffset(years=1), 
                  df.index[-1] + pd.offsets.DateOffset(years=6))
print (rng)
DatetimeIndex(['2019-04-01', '2019-04-02', '2019-04-03', '2019-04-04',
               '2019-04-05', '2019-04-06', '2019-04-07', '2019-04-08',
               '2019-04-09', '2019-04-10',
               ...
               '2025-03-22', '2025-03-23', '2025-03-24', '2025-03-25',
               '2025-03-26', '2025-03-27', '2025-03-28', '2025-03-29',
               '2025-03-30', '2025-03-31'],
              dtype='datetime64[ns]', length=2192, freq='D')

#filtering 29.2.
mask1 = df.index.strftime('%m-%d') != '02-29'
mask2 = rng.strftime('%m-%d') == '02-29'
rng1 = rng[mask2]
rng2 = rng[~mask2]

#create 29.2. DataFrame
df2 = pd.DataFrame([[0] * len(df.columns)], columns=df.columns, index=rng1)
print (df2)
            col1  col2  col3
2020-02-29     0     0     0
2024-02-29     0     0     0

df = pd.DataFrame(np.tile(df[mask1].values, (6, 1)),
                  columns=df.columns,
                  index=rng2).append(df2)
print (df.head())
            col1  col2  col3
2019-04-01     0     0     0
2019-04-02     1     2     3
2019-04-03     2     4     6
2019-04-04     3     6     9
2019-04-05     4     8    12

print (df.tail())
            col1  col2  col3
2025-03-29   362   724  1086
2025-03-30   363   726  1089
2025-03-31   364   728  1092
2020-02-29     0     0     0
2024-02-29     0     0     0

#last sorting for correct align 29.2 dates
df = df.sort_index()
#print (df)