How do I fill in missing dates in a DataFrame with the next consecutive date?
wtg_at1.tail(10)
| | AmbientTemperatue | Date |
---|---|---|
818 | 31.237499 | 2020-03-28 |
819 | 32.865974 | 2020-03-29 |
820 | 32.032558 | 2020-03-30 |
821 | 31.671166 | NaN |
822 | 31.389927 | NaN |
823 | 31.243660 | NaN |
824 | 31.206777 | NaN |
825 | 31.241503 | NaN |
826 | 31.309531 | NaN |
827 | 31.382531 | NaN |
I expect my output DataFrame to look like the one below. After March 30, the next date should be March 31, and so on.
| | AmbientTemperatue | Date |
---|---|---|
818 | 31.237499 | 2020-03-28 |
819 | 32.865974 | 2020-03-29 |
820 | 32.032558 | 2020-03-30 |
821 | 31.671166 | 2020-03-31 |
822 | 31.389927 | 2020-04-01 |
823 | 31.243660 | 2020-04-02 |
824 | 31.206777 | 2020-04-03 |
825 | 31.241503 | 2020-04-04 |
826 | 31.309531 | 2020-04-05 |
827 | 31.382531 | 2020-04-06 |
I tried the code below, but it did not give the desired output.
wtg_at1.append(pd.DataFrame({'Date': pd.date_range(start=wtg_at1.Date.iloc[-8], periods=7, freq='D', closed='right')}))
wtg_at1
| | AmbientTemperatue | Date |
---|---|---|
0 | 32.032558 | 2017-12-31 |
1 | 26.667757 | 2018-01-01 |
2 | 25.655754 | 2018-01-02 |
3 | 25.514013 | 2018-01-03 |
4 | 24.927652 | 2018-01-04 |
... | ... | ... |
823 | 31.243660 | NaN |
824 | 31.206777 | NaN |
825 | 31.241503 | NaN |
826 | 31.309531 | NaN |
827 | 31.382531 | NaN |
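Answer 0 (score: 1)
If there is only one group of missing values, forward-fill the dates and add a counter, built as the cumulative sum of the missing-value mask and converted to a timedelta in days: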
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].ffill() + pd.to_timedelta(df['Date'].isna().cumsum(), unit='d')
print (df)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 2020-03-31
822 31.389927 2020-04-01
823 31.243660 2020-04-02
824 31.206777 2020-04-03
825 31.241503 2020-04-04
826 31.309531 2020-04-05
827 31.382531 2020-04-06
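The same idea can be tried end to end on a small frame. This is only a sketch: the four rows are a shortened copy of the question's data, `fill_trailing_dates` is a name introduced here for illustration, and it assumes a single run of missing dates that follows at least one valid date.

```python
import pandas as pd


def fill_trailing_dates(dates: pd.Series) -> pd.Series:
    """Forward-fill dates and add one day per missing row (single gap only)."""
    dates = pd.to_datetime(dates)
    # Running count of missing rows: 0 for rows before the gap, then 1, 2, 3, ...
    offset = dates.isna().cumsum()
    # Every missing row becomes "last known date + its position in the gap".
    return dates.ffill() + pd.to_timedelta(offset, unit="D")


df = pd.DataFrame(
    {
        "AmbientTemperatue": [32.865974, 32.032558, 31.671166, 31.389927],
        "Date": ["2020-03-29", "2020-03-30", None, None],
    },
    index=[819, 820, 821, 822],
)
df["Date"] = fill_trailing_dates(df["Date"])
print(df)
```

Because the counter never resets, a valid date that appears after the gap would also be shifted, which is why this version is limited to one group of missing values.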
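Another option is to reassign the whole column from the minimum datetime and the length of the DataFrame. This only reproduces the existing dates if they are already consecutive daily values: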
df['Date'] = pd.date_range(df['Date'].min(), periods=len(df))
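Since this overwrites every value in the column, it may be worth checking that the dates already present are left unchanged. A minimal sketch of such a guard follows; `reassign_daily_dates` is a hypothetical helper, not part of pandas, and the sample frame is made up for the example.

```python
import pandas as pd


def reassign_daily_dates(dates: pd.Series) -> pd.Series:
    """Rebuild the column as consecutive days starting at the earliest date."""
    dates = pd.to_datetime(dates)
    rebuilt = pd.Series(
        pd.date_range(dates.min(), periods=len(dates), freq="D"),
        index=dates.index,
    )
    # Guard: every date that was already present must keep its value.
    known = dates.notna()
    if not (rebuilt[known] == dates[known]).all():
        raise ValueError("existing dates are not consecutive daily values")
    return rebuilt


df = pd.DataFrame({"Date": ["2020-03-29", "2020-03-30", None, None]})
df["Date"] = reassign_daily_dates(df["Date"])
print(df)
```

With a frame that has a jump between known dates (for example 2020-03-30 followed later by 2020-05-08), the guard raises instead of silently rewriting the later date.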
If there are multiple groups of missing values:
print (df)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 NaN
822 31.389927 NaN
823 31.243660 NaN
824 31.206777 2020-05-08
825 31.241503 NaN
826 31.309531 NaN
827 31.382531 NaN
df['Date'] = pd.to_datetime(df['Date'])
m = df['Date'].notna()
s = (~m).groupby(m.cumsum()).cumsum()
df['Date'] = df['Date'].ffill() + pd.to_timedelta(s, unit='d')
print (df)
AmbientTemperatue Date
818 31.237499 2020-03-28
819 32.865974 2020-03-29
820 32.032558 2020-03-30
821 31.671166 2020-03-31
822 31.389927 2020-04-01
823 31.243660 2020-04-02
824 31.206777 2020-05-08
825 31.241503 2020-05-09
826 31.309531 2020-05-10
827 31.382531 2020-05-11
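To see why the counter restarts after each valid date, it can help to print the intermediate Series. The sketch below uses a made-up six-row Series rather than the full frame, and adds an explicit `astype(int)` so the grouped cumulative sum is clearly an integer count; that cast is an addition for clarity, not part of the code above.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["2020-03-30", None, None, "2020-05-08", None, None])
)

m = dates.notna()   # True where a date is present
group = m.cumsum()  # group id that increases at every present date
# Count missing rows within each group, so the counter restarts at 0
# whenever a new valid date begins a group.
s = (~m).astype(int).groupby(group).cumsum()

steps = pd.DataFrame({"present": m, "group": group, "offset_days": s})
print(steps)

# Each missing row becomes "last valid date + its position within the gap".
filled = dates.ffill() + pd.to_timedelta(s, unit="D")
print(filled)
```

Rows that come before the first valid date stay `NaT`, because there is no earlier date to forward-fill from.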