Extending dates with values in a DataFrame in Python

Time: 2018-11-21 23:23:40

Tags: python python-3.x pandas python-2.7 dataframe

My data looks like this:

Year      Month       Region       Value1       Value2
2016        1         west         2            3
2016        1         east         4            5
2016        1         north        5            3
2016        2         west         6            4
2016        2         east         7            3
.
.
2016        12        west         2            3
2016        12        east         3            7
2016        12        north        6            8
2017        1         west         2            3
.
.
2018        7         west         1            1
2018        7         east         9            9
2018        7         north        5            1

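I want to extend my data month by month through 2021, carrying forward the values from the last month in the set (July 2018).

The desired output would be appended to the end of the set, by region, month, and year, for example: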

2018        7         west         1            1
2018        7         east         9            9
2018        7         north        5            1
2018        8         west         1            1
2018        8         east         9            9
2018        8         north        5            1
2018        9         west         1            1
2018        9         east         9            9
2018        9         north        5            1
.
.
2019        7         west         1            1
2019        7         east         9            9
2019        7         north        5            1
.
.
2021        7         west         1            1
2021        7         east         9            9
2021        7         north        5            1

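What is the best way to approach this?

1 Answer:

Answer 0: (score: 1)

I would create a function that uses pd.date_range with a monthly frequency:

This function assumes you have three regions, and that the last three rows of df are the west, east, and north rows of the final month, but it can be modified to handle more.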

import pandas as pd

def myFunction(df, periods, freq='MS'):
    # find the last date in the df (first day of the last month)
    last = pd.to_datetime(df.Year*10000+df.Month*100+1, format='%Y%m%d').max()

    # create periods+1 month-start dates beginning at last, then drop last itself
    # ('MS' is used because the 'M' alias is deprecated in recent pandas)
    newDates = pd.date_range(start=last, periods=periods+1, freq=freq)
    newDates = newDates[newDates > last]
    new_df = pd.DataFrame({'Date': newDates})

    # convert Date to year and month columns
    new_df['Year'] = new_df['Date'].dt.year
    new_df['Month'] = new_df['Date'].dt.month
    new_df = new_df.drop(columns='Date')

    # add your three regions and ffill values
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    west = pd.concat([df[:-2], new_df], sort=False, ignore_index=True).ffill()
    east = pd.concat([df[:-1], new_df], sort=False, ignore_index=True).ffill()
    north = pd.concat([df, new_df], sort=False, ignore_index=True).ffill()

    # combine your three region dfs and drop duplicates
    new = pd.concat([west, east, north], sort=False, ignore_index=True).drop_duplicates()
    return new.sort_values(['Year', 'Month']).reset_index(drop=True)

myFunction(df, 3)

With periods set to three, this returns the data extended by the next three months:

    Year    Month   Region  Value1  Value2
0   2016    1        west   2.0      3.0
1   2016    1        east   4.0      5.0
2   2016    1        north  5.0      3.0
3   2016    2        west   6.0      4.0
4   2016    2        east   7.0      3.0
5   2016    12       west   2.0      3.0
6   2016    12       east   3.0      7.0
7   2016    12       north  6.0      8.0
8   2017    1        west   2.0      3.0
9   2018    7        west   1.0      1.0
10  2018    7        east   9.0      9.0
11  2018    7        north  5.0      1.0
12  2018    8        west   1.0      1.0
13  2018    8        east   9.0      9.0
14  2018    8        north  5.0      1.0
15  2018    9        west   1.0      1.0
16  2018    9        east   9.0      9.0
17  2018    9        north  5.0      1.0
18  2018    10       west   1.0      1.0
19  2018    10       east   9.0      9.0
20  2018    10       north  5.0      1.0
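
The function above depends on the last three rows being the west, east, and north rows of the final month. As a rough alternative that does not hard-code the regions, the sketch below repeats every row of the last month for each future month up to a target year and month. The name extend_last_month and its end_year/end_month parameters are hypothetical, not part of the original answer, and merge(how="cross") assumes pandas 1.2 or later.

```python
import pandas as pd

def extend_last_month(df, end_year, end_month):
    """Repeat the last month's rows for every month through end_year/end_month."""
    # Month-start timestamp for every row, and the latest one in the data
    dates = pd.to_datetime(
        pd.DataFrame({"year": df["Year"], "month": df["Month"], "day": 1})
    )
    last = dates.max()

    # Rows of the final month: one per region, however many regions exist
    tail = df[dates == last].drop(columns=["Year", "Month"])

    # Month starts strictly after the last month, up to the target month
    end = pd.Timestamp(year=end_year, month=end_month, day=1)
    future = pd.date_range(start=last, end=end, freq="MS")[1:]
    months = pd.DataFrame({"Year": future.year, "Month": future.month})

    # Cross join: pair every future month with every last-month row
    extra = months.merge(tail, how="cross")
    return pd.concat([df, extra], ignore_index=True)[df.columns.tolist()]

# Small made-up sample in the same shape as the question's data
df = pd.DataFrame({
    "Year": [2018] * 6,
    "Month": [6, 6, 6, 7, 7, 7],
    "Region": ["west", "east", "north"] * 2,
    "Value1": [3, 4, 5, 1, 9, 5],
    "Value2": [0, 0, 0, 1, 9, 1],
})
out = extend_last_month(df, 2021, 7)
```

Because no missing values are introduced and forward-filled, Value1 and Value2 keep their integer dtype here. With the original function, reaching July 2021 from July 2018 would correspond to myFunction(df, 36), since the two dates are 36 months apart.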