Create rows if missing in a pandas DataFrame

Time: 2020-07-17 01:16:49

Tags: python python-3.x pandas dataframe

Current data:

  |ID | DT     | STATE | V|
  |1  | 201901 | PA    | 1|
  |1  | 201902 | PA    | 6|
  |2  | 201902 | PA    | 3|
  |1  | 201902 | CA    | 3|
  |2  | 201901 | CA    | 1|

I want to create rows for all combinations of ID, DT, and STATE, with V set to 0 where a combination is not available:

  |ID | DT     | STATE | V|
  |1  | 201901 | PA    | 1|
  |1  | 201902 | PA    | 6|
  |1  | 201901 | CA    | 0|
  |1  | 201902 | CA    | 3|
  |2  | 201901 | PA    | 0|
  |2  | 201902 | PA    | 3|
  |2  | 201901 | CA    | 1|
  |2  | 201902 | CA    | 0|

Thanks

2 answers:

Answer 0: (score: 2)

You can build a MultiIndex first, then reindex:

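import pandas as pd

# Build every (ID, DT, STATE) combination. No names are passed here,
# so reset_index() labels the key columns level_0, level_1, level_2.
idx = pd.MultiIndex.from_product([df.ID.unique(), df.DT.unique(), df.STATE.unique()])
df = df.set_index(['ID', 'DT', 'STATE']).reindex(idx, fill_value=0).reset_index()
df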
   level_0  level_1 level_2  V
0        1   201901      PA  1
1        1   201901      CA  0
2        1   201902      PA  6
3        1   201902      CA  3
4        2   201901      PA  0
5        2   201901      CA  1
6        2   201902      PA  3
7        2   201902      CA  0
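
The output above loses the original column names because the MultiIndex was built without names. A minimal sketch of the same approach, assuming the question's sample data and passing `names=` so that `reset_index()` restores `ID`, `DT`, and `STATE` (the variable names `keys` and `full` are illustrative, not from the answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 2],
    "DT": [201901, 201902, 201902, 201902, 201901],
    "STATE": ["PA", "PA", "PA", "CA", "CA"],
    "V": [1, 6, 3, 3, 1],
})

keys = ["ID", "DT", "STATE"]

# Cartesian product of the unique values of each key column.
# Naming the levels lets reset_index() restore the column names.
idx = pd.MultiIndex.from_product([df[k].unique() for k in keys], names=keys)

# Missing combinations are added as new rows with V filled with 0.
full = df.set_index(keys).reindex(idx, fill_value=0).reset_index()
print(full)
```

The rows come back in the same order as shown above, only with `ID`, `DT`, `STATE` as the column headers.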

Answer 1: (score: -1)

Use groupby on the first three columns, followed by .reindex, then .sort_values as needed.

Input:

    ID  DT      STATE  V
0   1   201901  PA     1
1   1   201902  PA     6
2   2   201902  PA     3
3   1   201902  CA     3
4   2   201901  CA     1

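Code:

import pandas as pd

# Unique values of each key column, used to build every combination.
i = [df['ID'].unique(), df['DT'].unique(), df['STATE'].unique()]
# Aggregate per key, add the missing combinations with V=0, then sort
# so that PA comes before CA within each ID (STATE descending).
df = df.groupby(['ID', 'DT', 'STATE']).sum() \
   .reindex(index=pd.MultiIndex.from_product(i, names=['ID', 'DT', 'STATE']), fill_value=0) \
   .reset_index().sort_values(['ID', 'STATE', 'DT'], ascending=[True,False,True])
df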

Output:

    ID  DT      STATE   V
0   1   201901  PA      1
8   1   201902  PA      6
2   1   201901  CA      0
10  1   201902  CA      3
256 2   201901  PA      0
264 2   201902  PA      3
258 2   201901  CA      1
266 2   201902  CA      0
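
One situation where the groupby step in this answer plausibly matters is when the same (ID, DT, STATE) key occurs more than once, since reindexing generally expects unique index labels. The sketch below uses made-up data with one duplicated key (not from the question) and assumes that summing `V` is the right way to collapse duplicates; the names `dup`, `keys`, and `filled` are illustrative.

```python
import pandas as pd

# Hypothetical data: the key (1, 201901, "PA") appears twice.
dup = pd.DataFrame({
    "ID": [1, 1, 2],
    "DT": [201901, 201901, 201902],
    "STATE": ["PA", "PA", "CA"],
    "V": [1, 4, 3],
})

keys = ["ID", "DT", "STATE"]
idx = pd.MultiIndex.from_product([dup[k].unique() for k in keys], names=keys)

# groupby(...).sum() collapses duplicate keys into one row each, so the
# index is unique before reindex adds the missing combinations with V=0.
filled = dup.groupby(keys)["V"].sum().reindex(idx, fill_value=0).reset_index()
print(filled)
```

If the keys are already unique, as in the question's data, the groupby step leaves the values unchanged.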