I have a DataFrame df_train that looks like this:
X1
01-01-2020 | 1
01-02-2020 | 2
01-03-2020 | 3
01-04-2020 | 4
Now I want to build another DataFrame with a datetime index.
I generate the datetime index like this:
future_dates = pd.date_range(df_train.index.max(), periods=12, freq='M')
I want a new DataFrame whose first rows are a copy of the values in df_train, and whose remaining dates are filled with the mean of df_train.
Expected result:
X1
01-05-2020 | 1
01-06-2020 | 2
01-07-2020 | 3
01-08-2020 | 4
01-09-2020 | 2.5
01-10-2020 | 2.5
01-11-2020 | 2.5
01-12-2020 | 2.5
01-01-2021 | 2.5
01-02-2021 | 2.5
01-03-2021 | 2.5
01-04-2021 | 2.5
Answer 0 (score: 2)
If the index has not been converted yet, convert it with to_datetime:
df_train.index = pd.to_datetime(df_train.index, dayfirst=True)
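Then build the future index with the MonthBegin offset and the MS (month start) frequency, so that the range starts on the first day of the month after the last training date: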
future_dates = pd.date_range(
    df_train.index.max() + pd.tseries.offsets.MonthBegin(1),
    periods=12,
    freq='MS'
)
DatetimeIndex(['2020-05-01', '2020-06-01', '2020-07-01', '2020-08-01',
'2020-09-01', '2020-10-01', '2020-11-01', '2020-12-01',
'2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01'],
dtype='datetime64[ns]', freq='MS')
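To illustrate why the MonthBegin(1) shift is needed, here is a minimal sketch. The variable names (last, start, mid, unshifted, shifted) are made up for this example and are not part of the answer. It shows that adding MonthBegin(1) moves a date to the first day of the next month, and that a range started on the unshifted date would still include April.

```python
import pandas as pd

# Last date in the training index (taken from the question's sample data).
last = pd.Timestamp("2020-04-01")

# MonthBegin(1) moves a month-start date to the first day of the next month.
start = last + pd.tseries.offsets.MonthBegin(1)

# A mid-month date rolls forward to the next month start as well.
mid = pd.Timestamp("2020-04-15") + pd.tseries.offsets.MonthBegin(1)

# Without the shift, the range begins on the last training date itself.
unshifted = pd.date_range(last, periods=3, freq="MS")

# With the shift, the range begins in the following month.
shifted = pd.date_range(start, periods=3, freq="MS")

print(start.date(), mid.date())
print(list(unshifted.strftime("%Y-%m-%d")))
print(list(shifted.strftime("%Y-%m-%d")))
```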
Then create a new frame filled with the mean, and overwrite the first len(df_train) values with the values from df_train:
new_df = pd.DataFrame({'X1': df_train['X1'].mean()}, index=future_dates)
new_df.iloc[:df_train.shape[0], new_df.columns.get_loc('X1')] = df_train['X1'].values
new_df:
X1
2020-05-01 1.0
2020-06-01 2.0
2020-07-01 3.0
2020-08-01 4.0
2020-09-01 2.5
2020-10-01 2.5
2020-11-01 2.5
2020-12-01 2.5
2021-01-01 2.5
2021-02-01 2.5
2021-03-01 2.5
2021-04-01 2.5
Or build the column as a list, unpacking the original values followed by the repeated mean:
new_df = pd.DataFrame({
    'X1': [*df_train['X1'],
           *(len(future_dates) - len(df_train)) * [df_train['X1'].mean()]]
}, index=future_dates)
new_df:
X1
2020-05-01 1.0
2020-06-01 2.0
2020-07-01 3.0
2020-08-01 4.0
2020-09-01 2.5
2020-10-01 2.5
2020-11-01 2.5
2020-12-01 2.5
2021-01-01 2.5
2021-02-01 2.5
2021-03-01 2.5
2021-04-01 2.5
Then restore the original date format with DatetimeIndex.strftime:
new_df.index = new_df.index.strftime('%d-%m-%Y')
X1
01-05-2020 1.0
01-06-2020 2.0
01-07-2020 3.0
01-08-2020 4.0
01-09-2020 2.5
01-10-2020 2.5
01-11-2020 2.5
01-12-2020 2.5
01-01-2021 2.5
01-02-2021 2.5
01-03-2021 2.5
01-04-2021 2.5
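One thing worth keeping in mind: strftime returns an index of strings, not a DatetimeIndex, so date-based operations no longer apply after this step. The following is a small sketch of that behaviour; the names idx, formatted and restored are illustrative only.

```python
import pandas as pd

# A short month-start index standing in for future_dates.
idx = pd.date_range("2020-05-01", periods=3, freq="MS")

# strftime produces string labels in day-month-year order.
formatted = idx.strftime("%d-%m-%Y")

# The formatted index holds strings, so it is no longer a DatetimeIndex.
is_datetime_before = isinstance(idx, pd.DatetimeIndex)
is_datetime_after = isinstance(formatted, pd.DatetimeIndex)

# Parsing with the same format string recovers the original dates.
restored = pd.to_datetime(formatted, format="%d-%m-%Y")

print(list(formatted))
print(is_datetime_before, is_datetime_after)
```

If the dates are still needed for resampling or slicing, it may be simpler to keep the DatetimeIndex and format it only when displaying or exporting.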
Putting it all together:
import pandas as pd
df_train = pd.DataFrame({
    'X1': {'01-01-2020': 1, '01-02-2020': 2, '01-03-2020': 3, '01-04-2020': 4}
})
df_train.index = pd.to_datetime(df_train.index, dayfirst=True)
future_dates = pd.date_range(
    df_train.index.max() + pd.tseries.offsets.MonthBegin(1),
    periods=12,
    freq='MS'
)
new_df = pd.DataFrame({'X1': df_train['X1'].mean()}, index=future_dates)
new_df.iloc[:df_train.shape[0], new_df.columns.get_loc('X1')] = \
    df_train['X1'].values
new_df.index = new_df.index.strftime('%d-%m-%Y')
print(new_df)
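The steps above could also be wrapped in a small reusable function. This is only a sketch under a few assumptions: the helper name extend_with_mean and its parameters are hypothetical, the input index is assumed to already be a DatetimeIndex, and the function raises an error when fewer periods are requested than there are training rows.

```python
import pandas as pd


def extend_with_mean(df, column, periods):
    """Hypothetical helper: copy df[column] onto future month starts,
    then pad the remaining rows with the column mean."""
    if periods < len(df):
        raise ValueError("periods must be at least len(df)")
    # Start on the first day of the month after the last known date.
    start = df.index.max() + pd.tseries.offsets.MonthBegin(1)
    future_dates = pd.date_range(start, periods=periods, freq="MS")
    # Fill every row with the mean, then overwrite the leading rows.
    out = pd.DataFrame({column: float(df[column].mean())}, index=future_dates)
    out.iloc[:len(df), out.columns.get_loc(column)] = df[column].to_numpy()
    return out


# Sample data from the question, parsed as day-month-year.
df_train = pd.DataFrame(
    {"X1": [1, 2, 3, 4]},
    index=pd.to_datetime(
        ["01-01-2020", "01-02-2020", "01-03-2020", "01-04-2020"],
        format="%d-%m-%Y",
    ),
)
new_df = extend_with_mean(df_train, "X1", periods=12)
print(new_df)
```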
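Answer 1 (score: 0)
Use set_index() to move the existing rows onto the new dates, build the remaining rows with the mean, and concat() them:
import io
import pandas as pd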
df_train = pd.read_csv(io.StringIO(""" X1
01-01-2020 | 1
01-02-2020 | 2
01-03-2020 | 3
01-04-2020 | 4 """), sep="|")
df_train = df_train.set_index(pd.to_datetime(df_train.index, format="%d-%m-%Y "))
df_train.columns = [c.strip() for c in df_train.columns]
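# Start at the month after the last training date and use month-start ('MS')
# dates so the result matches the expected output; freq='M' gives month ends.
future_dates = pd.date_range(df_train.index.max() + pd.tseries.offsets.MonthBegin(1), periods=12, freq='MS')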
pd.concat([
    df_train.set_index(future_dates[0:len(df_train)]),
    pd.DataFrame(index=future_dates[len(df_train):]).assign(X1=df_train["X1"].mean())
])
Answer 2 (score: 0)
Here is another way:
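# df_train is the frame from the question, with a DatetimeIndex.
# Month-end dates shifted by MonthBegin() land on the first day of the next month.
# On pandas 2.2+, the 'M' alias is deprecated in favour of 'ME'.
future_dates = pd.date_range(df_train.index.max(), periods=12, freq='M') + pd.tseries.offsets.MonthBegin()
# Place the original values on the first len(df_train) dates, then fill the rest with the mean.
df2 = pd.DataFrame(index=future_dates).assign(X1=pd.Series(df_train['X1'].to_numpy(), index=future_dates[:len(df_train)])).fillna(df_train.mean())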
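The one-liner above relies on index alignment: assigning a Series that covers only the first few dates leaves NaN in the other rows, and fillna then replaces those with the mean. A minimal sketch of that mechanism, using a shortened six-month index and illustrative names (partial, aligned, filled) that do not appear in the answer:

```python
import pandas as pd

# Six month-start dates; only the first four will receive known values.
future_dates = pd.date_range("2020-05-01", periods=6, freq="MS")

# A Series indexed by the first four dates only.
partial = pd.Series([1, 2, 3, 4], index=future_dates[:4])

# assign aligns the Series on the frame's index; unmatched dates become NaN.
aligned = pd.DataFrame(index=future_dates).assign(X1=partial)
n_missing = int(aligned["X1"].isna().sum())

# fillna replaces the NaN rows with the mean of the known values.
filled = aligned.fillna(partial.mean())

print(n_missing)
print(filled)
```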