How can I insert rows for the next two months for each group of city and district in the following dataframe?
city district date price
0 a c 2019-08-01 00:00:00.000 12
1 a c 2019-09-01 00:00:00.000 13
2 a c 2019-10-01 00:00:00.000 11
3 a c 2019-11-01 00:00:00.000 15
4 b d 2019-08-01 00:00:00.000 8
5 b d 2019-09-01 00:00:00.000 6
6 b d 2019-10-01 00:00:00.000 9
7 b d 2019-11-01 00:00:00.000 15
The desired output looks like this:
city district date price
0 a c 2019-08-01 00:00:00.000 12
1 a c 2019-09-01 00:00:00.000 13
2 a c 2019-10-01 00:00:00.000 11
3 a c 2019-11-01 00:00:00.000 15
4 a c 2019-12-01 00:00:00.000
5 a c 2020-01-01 00:00:00.000
6 b d 2019-08-01 00:00:00.000 8
7 b d 2019-09-01 00:00:00.000 6
8 b d 2019-10-01 00:00:00.000 9
9 b d 2019-11-01 00:00:00.000 15
10 b d 2019-12-01 00:00:00.000
11 b d 2020-01-01 00:00:00.000
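Answer 0 (score: 1)
Use set_index to make date the index, then reindex each group at month-start frequency (MS). This assumes date is already a datetime column: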
print (df.set_index("date").groupby(["city","district"])
.apply(lambda d: d[["price"]].reindex(pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")))
.reset_index())
Alternatively, build a MultiIndex from every combination of (city, district) group and date, then reindex against it:
month_range = pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")
combos = [(*k,d) for k in df.groupby(["city","district"]).groups.keys() for d in month_range ]
m_index = pd.MultiIndex.from_tuples(combos,names=["city","district","date"])
print (df.set_index(["city","district","date"]).reindex(m_index).reset_index())
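Both give the same rows. The output below is from the first approach, where the unnamed date level comes back as level_2; the second approach keeps the column name date: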
city district level_2 price
0 a c 2019-08-01 12.0
1 a c 2019-09-01 13.0
2 a c 2019-10-01 11.0
3 a c 2019-11-01 15.0
4 a c 2019-12-01 NaN
5 a c 2020-01-01 NaN
6 b d 2019-08-01 8.0
7 b d 2019-09-01 6.0
8 b d 2019-10-01 9.0
9 b d 2019-11-01 15.0
10 b d 2019-12-01 NaN
11 b d 2020-01-01 NaN
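Both snippets above take the month range from the minimum and maximum date of the whole frame, which works here because the two groups cover the same months. If the groups could start or end at different dates, a per-group range may be closer to what is wanted. The following is a minimal sketch under that assumption, not part of the original answer; the sample frame is rebuilt from the question, and the names pieces, full and ext are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; date is parsed to datetime64.
df = pd.DataFrame({
    "city": ["a"] * 4 + ["b"] * 4,
    "district": ["c"] * 4 + ["d"] * 4,
    "date": pd.to_datetime(
        ["2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01"] * 2
    ),
    "price": [12, 13, 11, 15, 8, 6, 9, 15],
})

pieces = []
for (city, district), g in df.groupby(["city", "district"]):
    # Month-start range from this group's first date to two months
    # past this group's last date.
    full = pd.date_range(
        g["date"].min(),
        g["date"].max() + pd.DateOffset(months=2),
        freq="MS",
    )
    # Reindex on the full range; months without data get NaN prices.
    ext = (
        g.set_index("date")[["price"]]
        .reindex(full)
        .rename_axis("date")
        .reset_index()
    )
    # Put the group keys back as the leading columns.
    ext.insert(0, "district", district)
    ext.insert(0, "city", city)
    pieces.append(ext)

out = pd.concat(pieces, ignore_index=True)
print(out)
```

Naming the reindexed axis with rename_axis("date") also avoids the level_2 column name seen in the output above.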
Answer 1 (score: 1)
If you only need to add rows for specific dates, the code below should also work:
distinct_group = df[["city","district"]].drop_duplicates().values.tolist()
new_date_range = pd.date_range(start='2019-12-01', periods=2 , freq='MS')
new_df = pd.DataFrame([ i + [j] for i in distinct_group for j in new_date_range], columns=['city','district','date'])
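# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
required_df = pd.concat([df, new_df], ignore_index=True)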
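As written, the new rows end up at the bottom of the frame rather than after each group. Below is a self-contained sketch of the same idea that also sorts the result to match the desired output. It is an illustration, not the answer author's code: the sample frame is rebuilt from the question, and the hard-coded start date 2019-12-01 is carried over from the snippet above.

```python
import pandas as pd

# Sample data rebuilt from the question; date is parsed to datetime64.
df = pd.DataFrame({
    "city": ["a"] * 4 + ["b"] * 4,
    "district": ["c"] * 4 + ["d"] * 4,
    "date": pd.to_datetime(
        ["2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01"] * 2
    ),
    "price": [12, 13, 11, 15, 8, 6, 9, 15],
})

# One [city, district] pair per group.
distinct_group = df[["city", "district"]].drop_duplicates().values.tolist()

# The two month-start dates to add for every group.
new_date_range = pd.date_range(start="2019-12-01", periods=2, freq="MS")

# One new row per (group, date) combination; price is left missing.
new_df = pd.DataFrame(
    [pair + [d] for pair in distinct_group for d in new_date_range],
    columns=["city", "district", "date"],
)

# pd.concat is used here because DataFrame.append no longer exists in pandas 2.0+.
required_df = (
    pd.concat([df, new_df], ignore_index=True)
    .sort_values(["city", "district", "date"])
    .reset_index(drop=True)
)
print(required_df)
```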