How to extend a pandas DataFrame by repeating the last row while incrementing the index by one year

Time: 2018-10-17 14:25:16

Tags: pandas datetime

I have the following DataFrame:

import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)


            A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0

I extend the DataFrame by repeating its last row as follows:

for i in range(3):
    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
    df = pd.concat([df, df[-1:]])
print(df)

             A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0

However, I would also like to increment the index by one year for each repeated row. Any ideas on how to do this?

Expected result:

             A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0

Many thanks,

2 Answers:

Answer 0 (score: 1)

In a few lines:

rows_to_add = 10  # yields 9 new rows: the range starts at 0, which repeats the existing last date

new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                               for y in range(rows_to_add)])

df.reindex(df.index.union(new_dates).unique().sort_values()).ffill()

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0
2018-01-04  0.0  0.0  0.0  1.0
2019-01-04  0.0  0.0  0.0  1.0
2020-01-04  0.0  0.0  0.0  1.0
2021-01-04  0.0  0.0  0.0  1.0
2022-01-04  0.0  0.0  0.0  1.0
2023-01-04  0.0  0.0  0.0  1.0

Explanation

You can create the new dates as follows (the range starts at 0, so the first date is the existing last date):

rows_to_add = 10

new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                               for y in range(rows_to_add)])

DatetimeIndex(['2014-01-04', '2015-01-04', '2016-01-04', '2017-01-04',
               '2018-01-04', '2019-01-04', '2020-01-04', '2021-01-04',
               '2022-01-04', '2023-01-04'],
              dtype='datetime64[ns]', freq=None)
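
One detail worth keeping in mind at this step: `pd.DateOffset(years=y)` shifts the calendar year and clips the day when the target date does not exist. The sketch below uses a hypothetical leap-day start date, which is not part of the original data, to show the effect.

```python
import pandas as pd

# Hypothetical start date (not from the question): a leap day.
start = pd.Timestamp("2016-02-29")

# Each offset is applied to the same base date, as in the answer above.
shifted = pd.DatetimeIndex([start + pd.DateOffset(years=y) for y in range(5)])
print(shifted)
```

The non-leap years land on February 28, while 2020 keeps February 29, because every offset is computed from the original date rather than from the previous result.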

Then add these dates to the original dates (keeping only unique dates and sorting the index):

new_index = df.index.union(new_dates).unique().sort_values()

DatetimeIndex(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04',
               '2015-01-04', '2016-01-04', '2017-01-04', '2018-01-04',
               '2019-01-04', '2020-01-04', '2021-01-04', '2022-01-04',
               '2023-01-04'],
              dtype='datetime64[ns]', freq=None)

Then reindex the original DataFrame and fill the new rows with the values from the last row:

df.reindex(new_index).ffill()

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0
2018-01-04  0.0  0.0  0.0  1.0
2019-01-04  0.0  0.0  0.0  1.0
2020-01-04  0.0  0.0  0.0  1.0
2021-01-04  0.0  0.0  0.0  1.0
2022-01-04  0.0  0.0  0.0  1.0
2023-01-04  0.0  0.0  0.0  1.0
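
The steps above could be wrapped in a small function. This is only a sketch: `extend_last_row` and `n_rows` are hypothetical names, and the range starts at 1 so that `n_rows` is exactly the number of rows added.

```python
import numpy as np
import pandas as pd


def extend_last_row(df, n_rows):
    """Return df with n_rows extra copies of its last row, one year apart.

    Hypothetical helper for illustration; not part of pandas.
    """
    last = df.index[-1]
    # Start at 1 so the last existing date is not generated again.
    new_dates = pd.DatetimeIndex(
        [last + pd.DateOffset(years=y) for y in range(1, n_rows + 1)]
    )
    # Rows that are missing after reindexing are forward-filled from the last row.
    return df.reindex(df.index.append(new_dates)).ffill()


dates = pd.date_range("1/1/2014", periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=["A", "B", "C", "D"])
extended = extend_last_row(df, 3)
print(extended)
```

Note that `ffill()` also fills any NaN values that were already present in the original rows, so this approach fits best when the data has no gaps of its own.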

Answer 1 (score: 0)

Use:

df[-1:].index
DatetimeIndex(['2014-01-04'], dtype='datetime64[ns]', freq='D')
# set periods to the number of rows you want to add + 1 (the first date is the existing last date)
dates_new = pd.date_range(df[-1:].index.values[0], periods=4, freq=pd.DateOffset(years=1))
dates_new
DatetimeIndex(['2014-01-04', '2015-01-04', '2016-01-04', '2017-01-04'], dtype='datetime64[ns]', freq='<DateOffset: years=1>')

df_new = pd.DataFrame(index=dates_new, columns=['A', 'B', 'C', 'D'])
# pd.datetime was removed in pandas 2.0; pd.Timestamp selects the same row
df_new = df_new.apply(lambda x: df.loc[pd.Timestamp(2014, 1, 4)], axis=1)
df_new
              A    B    C    D
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([df, df_new])

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0

It kind of feels like a hack.

You can remove the duplicated index entries with:

df = df[~df.index.duplicated(keep='first')]
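
A variant of this answer that avoids creating the duplicate in the first place might look like the sketch below. It assumes a current pandas version (where `pd.concat` replaces `DataFrame.append`), and `n_rows`, `last_values` and `result` are illustrative names that do not appear in the original answer.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2014", periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=["A", "B", "C", "D"])

n_rows = 3  # assumed number of rows to add
# periods is n_rows + 1 because the range starts at the existing last date;
# [1:] then drops that first, already-present date.
new_dates = pd.date_range(df.index[-1], periods=n_rows + 1,
                          freq=pd.DateOffset(years=1))[1:]

# Repeat the values of the last row once per new date.
last_values = np.tile(df.iloc[-1].to_numpy(), (n_rows, 1))
df_new = pd.DataFrame(last_values, index=new_dates, columns=df.columns)

# Stack the original rows and the new rows; no duplicate index is produced.
result = pd.concat([df, df_new])
print(result)
```

Because the first generated date is dropped before concatenating, the later `duplicated` filter is not needed in this version.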