Autofill a calendar with the missing months

Posted: 2020-12-28 15:29:01

Tags: python pandas date automation autofill

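I have a table like this one, with dates and values, but as you can see, some months are missing from the list (months 5, 6, 8 and 10, plus month 1 of the following year).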

Date        Any other value
2020-01-01  value
2020-02-01  value
2020-02-04  value
2020-02-04  value
2020-03-11  value
2020-04-04  value
2020-07-04  value
2020-07-04  value
2020-09-01  value
2020-11-06  value
2020-12-02  value
2021-02-04  value
2021-03-11  value

Is there any way to automatically fill in those months so that the table becomes:

Date        Any other value
2020-01-01  value
2020-02-01  value
2020-02-04  value
2020-02-04  value
2020-03-11  value
2020-04-04  value
2020-05-01  NaN
2020-06-01  NaN
2020-07-04  value
2020-07-04  value
2020-08-01  NaN
2020-09-01  value
2020-10-01  NaN
2020-11-06  value
2020-12-02  value
2021-01-01  NaN
2021-02-04  value
2021-03-11  value

Thanks, everyone!

2 answers:

Answer 0 (score: 0)

The only approach I could come up with is this:

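import numpy as np
import pandas as pd

# Assumes df already holds a 'Date' column of 'YYYY-MM-DD' strings and an 'Any' value column
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Date1'] = df['Date']
# Index the rows by monthly period so month membership is easy to test
df = df.set_index('Date').to_period('M')

# Every month start between the earliest and latest date, plus the matching monthly periods
t2 = pd.date_range(df['Date1'].min(), df['Date1'].max(), freq='MS')
t3 = t2.to_period('M')

# Collect the month starts whose period has no row in the data
add_month = []
for i in range(len(t2)):
    if t3[i] not in df.index:
        add_month.append(t2[i])
miss_month_df = pd.DataFrame(add_month, columns=['Date1'])
miss_month_df['Any'] = np.nan
df.reset_index(inplace=True, drop=True)

# Append the placeholder rows and restore chronological order
df_new = pd.concat([df, miss_month_df], ignore_index=True).sort_values(by='Date1').reset_index(drop=True)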

df_new:

    Any     Date1
0   value   2020-01-01
1   value   2020-02-01
2   value   2020-02-04
3   value   2020-02-04
4   value   2020-03-11
5   value   2020-04-04
6   NaN     2020-05-01
7   NaN     2020-06-01
8   value   2020-07-04
9   value   2020-07-04
10  NaN     2020-08-01
11  value   2020-09-01
12  NaN     2020-10-01
13  value   2020-11-06
14  value   2020-12-02
15  NaN     2021-01-01
16  value   2021-02-04
17  value   2021-03-11
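The heart of this answer is comparing the monthly periods that occur in the data with a complete monthly range. A more compact sketch of the same idea, assuming the question's column names `Date` and `Any other value` (the answer itself uses `Any`), replaces the loop with a `PeriodIndex` set difference:

```python
import pandas as pd

# Sample data copied from the question; the column names are assumed
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01", "2020-02-01", "2020-02-04", "2020-02-04", "2020-03-11",
        "2020-04-04", "2020-07-04", "2020-07-04", "2020-09-01", "2020-11-06",
        "2020-12-02", "2021-02-04", "2021-03-11",
    ]),
    "Any other value": ["value"] * 13,
})

# Monthly periods present in the data vs. every month from the first to the last one
present = pd.PeriodIndex(df["Date"].dt.to_period("M").unique())
all_months = pd.period_range(present.min(), present.max(), freq="M")
missing = all_months.difference(present)

# One placeholder row per missing month, dated on the 1st; other columns become NaN
filler = pd.DataFrame({"Date": missing.to_timestamp()})
result = (
    pd.concat([df, filler], ignore_index=True)
    .sort_values("Date", kind="stable", ignore_index=True)
)
print(result.to_string(index=False))
```

`difference` returns the missing periods already sorted, and `kind="stable"` keeps rows that share a date in their original order.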

Answer 1 (score: 0)

Logically, this is an outer join against all the months you are interested in.

import pandas as pd

df = pd.DataFrame({"Date":["2019-12-31","2020-01-31","2020-02-03","2020-02-03","2020-03-10","2020-04-03","2020-07-03","2020-07-03","2020-08-31","2020-11-05","2020-12-01","2021-02-03","2021-03-10"],"Any other value":["value","value","value","value","value","value","value","value","value","value","value","value","value"]})
df["Date"] = pd.to_datetime(df["Date"])

# First day of each row's month; the period round trip keeps dates already on the 1st in their own month
df["month"] = df["Date"].dt.to_period("M").dt.to_timestamp()

# Outer join against every month start in the range; months with no data come back with NaN values
df = df.merge(
    pd.DataFrame({"month": pd.date_range(df["month"].min(), df["month"].max(), freq="MS")}),
    on="month", how="outer")
# Rows added by the join have no Date, so use the month start instead
df["Date"] = df["Date"].fillna(df["month"])
df = df.drop(columns="month").sort_values("Date", ignore_index=True)

print(df.to_string(index=False))

Output

      Date Any other value
2019-12-31           value
2020-01-31           value
2020-02-03           value
2020-02-03           value
2020-03-10           value
2020-04-03           value
2020-05-01             NaN
2020-06-01             NaN
2020-07-03           value
2020-07-03           value
2020-08-31           value
2020-09-01             NaN
2020-10-01             NaN
2020-11-05           value
2020-12-01           value
2021-01-01             NaN
2021-02-03           value
2021-03-10           value
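The `month` helper column in this answer depends on mapping each date to the first day of its own month. A small check, using a few dates from the sample data, compares two ways to do that: subtracting `pd.offsets.MonthBegin(1)` rolls a date that already falls on the 1st back to the previous month, while a round trip through a monthly period does not.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-11-05", "2020-12-01", "2021-02-03"]))

# Subtracting one MonthBegin rolls back to the previous month start,
# so a date that is already on the 1st lands a whole month earlier
via_offset = dates - pd.offsets.MonthBegin(1)

# Converting to a monthly period and back always gives the 1st of the same month
via_period = dates.dt.to_period("M").dt.to_timestamp()

print(pd.DataFrame({"date": dates, "via_offset": via_offset, "via_period": via_period}))
```

Only the 2020-12-01 row differs between the two columns, which is exactly the case that would otherwise make December look like a missing month.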
