Insert rows for a specific date range for each group in pandas

Time: 2019-12-24 01:59:54

Tags: python-3.x pandas dataframe datetime

How can I insert rows for the next two months for each (city, district) group in the following DataFrame?

  city district                     date  price
0    a        c  2019-08-01 00:00:00.000     12
1    a        c  2019-09-01 00:00:00.000     13
2    a        c  2019-10-01 00:00:00.000     11
3    a        c  2019-11-01 00:00:00.000     15
4    b        d  2019-08-01 00:00:00.000      8
5    b        d  2019-09-01 00:00:00.000      6
6    b        d  2019-10-01 00:00:00.000      9
7    b        d  2019-11-01 00:00:00.000     15
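For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes date should be a real datetime column rather than strings, which the table alone does not confirm.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: `date` is parsed to datetime64 instead of being kept as text.
df = pd.DataFrame({
    "city": ["a"] * 4 + ["b"] * 4,
    "district": ["c"] * 4 + ["d"] * 4,
    "date": pd.to_datetime(
        ["2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01"] * 2
    ),
    "price": [12, 13, 11, 15, 8, 6, 9, 15],
})
```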

The desired output looks like this:

  city district                     date  price
0    a        c  2019-08-01 00:00:00.000     12
1    a        c  2019-09-01 00:00:00.000     13
2    a        c  2019-10-01 00:00:00.000     11
3    a        c  2019-11-01 00:00:00.000     15
4    a        c  2019-12-01 00:00:00.000      
5    a        c  2020-01-01 00:00:00.000      
6    b        d  2019-08-01 00:00:00.000      8
7    b        d  2019-09-01 00:00:00.000      6
8    b        d  2019-10-01 00:00:00.000      9
9    b        d  2019-11-01 00:00:00.000     15
10   b        d  2019-12-01 00:00:00.000      
11   b        d  2020-01-01 00:00:00.000     

2 answers:

Answer 0: (score: 1)

Use set_index to make date the index, then reindex each group with frequency MS (month start):

# Assumes df["date"] is already datetime64 (convert with pd.to_datetime if it is not).
print (df.set_index("date").groupby(["city","district"])
       .apply(lambda d: d[["price"]].reindex(pd.date_range(df["date"].min(),df["date"].max()+pd.DateOffset(months=2),freq="MS")))
       .reset_index())

Or build a MultiIndex from every combination of city, district and date, then reindex against it:

month_range = pd.date_range(min(df["date"]),max(df["date"])+pd.DateOffset(months=2),freq="MS")

combos = [(*k,d) for k in df.groupby(["city","district"]).groups.keys() for d in month_range ]

m_index = pd.MultiIndex.from_tuples(combos,names=["city","district","date"])

print (df.set_index(["city","district","date"]).reindex(m_index).reset_index())

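Both give the same result (the first approach labels the date column level_2 because the reindexed index is unnamed; the second names it date):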

   city district    level_2  price
0     a        c 2019-08-01   12.0
1     a        c 2019-09-01   13.0
2     a        c 2019-10-01   11.0
3     a        c 2019-11-01   15.0
4     a        c 2019-12-01    NaN
5     a        c 2020-01-01    NaN
6     b        d 2019-08-01    8.0
7     b        d 2019-09-01    6.0
8     b        d 2019-10-01    9.0
9     b        d 2019-11-01   15.0
10    b        d 2019-12-01    NaN
11    b        d 2020-01-01    NaN
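Both snippets above take the minimum and maximum of date across the whole frame, which works here because the two groups cover the same months. If groups ended on different dates, one possible variation is to build the range per group. The following is a sketch and not part of the answer above; the sample data is made up so that the groups have unequal lengths, and the loop is used instead of groupby.apply purely for illustration.

```python
import pandas as pd

# Hypothetical data: group (a, c) ends in September, group (b, d) in October.
df = pd.DataFrame({
    "city": ["a", "a", "b", "b", "b"],
    "district": ["c", "c", "d", "d", "d"],
    "date": pd.to_datetime(["2019-08-01", "2019-09-01",
                            "2019-08-01", "2019-09-01", "2019-10-01"]),
    "price": [12, 13, 8, 6, 9],
})

pieces = []
for (city, district), g in df.groupby(["city", "district"]):
    # Month-start range from this group's first date to two months past its last.
    rng = pd.date_range(g["date"].min(),
                        g["date"].max() + pd.DateOffset(months=2),
                        freq="MS")
    piece = (g.set_index("date")[["price"]]
               .reindex(rng)          # missing months get NaN prices
               .rename_axis("date")
               .reset_index())
    # Put the group keys back as the leading columns.
    piece.insert(0, "district", district)
    piece.insert(0, "city", city)
    pieces.append(piece)

out = pd.concat(pieces, ignore_index=True)
print(out)
```

With this approach each group gets exactly two extra month-start rows after its own last date, rather than being padded out to the global maximum.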

Answer 1: (score: 1)

If you only need to add rows for specific dates, the code below should also work:

distinct_group = df[["city","district"]].drop_duplicates().values.tolist()

new_date_range = pd.date_range(start='2019-12-01', periods=2 , freq='MS')

new_df = pd.DataFrame([ i + [j] for i in distinct_group for j in new_date_range], columns=['city','district','date'])

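# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# The new rows land at the end; sort by city, district and date if group order matters.
required_df = pd.concat([df, new_df], ignore_index=True)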
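A variation on the same idea, sketched here rather than taken from the answer: the new rows can be built with a cross join and the result sorted so that it matches the order in the question. This assumes pandas 1.2 or later (for merge with how="cross") and a datetime date column; the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, with `date` assumed to be datetime64.
df = pd.DataFrame({
    "city": ["a"] * 4 + ["b"] * 4,
    "district": ["c"] * 4 + ["d"] * 4,
    "date": pd.to_datetime(
        ["2019-08-01", "2019-09-01", "2019-10-01", "2019-11-01"] * 2
    ),
    "price": [12, 13, 11, 15, 8, 6, 9, 15],
})

# One row per distinct (city, district) pair.
groups = df[["city", "district"]].drop_duplicates()

# The two month-start dates to add for every group.
new_dates = pd.DataFrame(
    {"date": pd.date_range(start="2019-12-01", periods=2, freq="MS")}
)

# Cross join: every group paired with every new date (price is absent, so NaN later).
new_rows = groups.merge(new_dates, how="cross")

required_df = (pd.concat([df, new_rows], ignore_index=True)
                 .sort_values(["city", "district", "date"], ignore_index=True))
print(required_df)
```

Note that price becomes a float column once the NaN rows are added.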