Current data:
|ID | DT | STATE | V|
|1 | 201901 | PA | 1|
|1 | 201902 | PA | 6|
|2 | 201902 | PA | 3|
|1 | 201902 | CA | 3|
|2 | 201901 | CA | 1|
I want to create rows for every combination of ID, DT, and STATE, with V set to 0 wherever a combination is missing from the data, like this:
|ID | DT | STATE | V|
|1 | 201901 | PA | 1|
|1 | 201902 | PA | 6|
|1 | 201901 | CA | 0|
|1 | 201902 | CA | 3|
|2 | 201901 | PA | 0|
|2 | 201902 | PA | 3|
|2 | 201901 | CA | 1|
|2 | 201902 | CA | 0|
Thanks
Answer 0: (score: 2)
You can build a MultiIndex first and then reindex:
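# Every (ID, DT, STATE) combination, built from the unique values of each column
idx = pd.MultiIndex.from_product([df.ID.unique(), df.DT.unique(), df.STATE.unique()])
# Reindex onto the full product; combinations missing from df get V = 0
df = df.set_index(['ID', 'DT', 'STATE']).reindex(idx, fill_value=0).reset_index()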
df
level_0 level_1 level_2 V
0 1 201901 PA 1
1 1 201901 CA 0
2 1 201902 PA 6
3 1 201902 CA 3
4 2 201901 PA 0
5 2 201901 CA 1
6 2 201902 PA 3
7 2 201902 CA 0
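The output above shows level_0, level_1, and level_2 as column names because the MultiIndex was built without level names. A minimal self-contained sketch of the same approach follows, assuming the sample data from the question; the variable names keys and full are illustrative only. Passing names= to from_product should let reset_index() restore the ID, DT, and STATE column names.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 2],
    "DT": [201901, 201902, 201902, 201902, 201901],
    "STATE": ["PA", "PA", "PA", "CA", "CA"],
    "V": [1, 6, 3, 3, 1],
})

keys = ["ID", "DT", "STATE"]  # illustrative helper list, not part of the answer

# Naming the levels keeps the original column names after reset_index()
idx = pd.MultiIndex.from_product([df[k].unique() for k in keys], names=keys)

# Rows for combinations that are absent from df are created with V = 0
full = df.set_index(keys).reindex(idx, fill_value=0).reset_index()
print(full)
```

This relies on each (ID, DT, STATE) combination appearing at most once in df, which holds for the sample data.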
Answer 1: (score: -1)
Use groupby on the first three columns, followed by .reindex, and then .sort_values as needed.
Input:
ID DT STATE V
0 1 201901 PA 1
1 1 201902 PA 6
2 2 201902 PA 3
3 1 201902 CA 3
4 2 201901 CA 1
Code:
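# Unique values of each key column, in order of first appearance
i = [df['ID'].unique(), df['DT'].unique(), df['STATE'].unique()]
# Sum V per key, reindex onto the full product (missing rows get 0), then sort
df = df.groupby(['ID', 'DT', 'STATE']).sum() \
    .reindex(index=pd.MultiIndex.from_product(i, names=['ID', 'DT', 'STATE']), fill_value=0) \
    .reset_index().sort_values(['ID', 'STATE', 'DT'], ascending=[True, False, True])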
df
Output:
ID DT STATE V
0 1 201901 PA 1
8 1 201902 PA 6
2 1 201901 CA 0
10 1 201902 CA 3
256 2 201901 PA 0
264 2 201902 PA 3
258 2 201901 CA 1
266 2 201902 CA 0
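One practical difference from the set_index approach is that groupby(...).sum() collapses repeated keys into a single row before reindexing, and reindexing generally expects unique index labels. The sketch below illustrates this with hypothetical data: the question's rows plus one extra, made-up (1, 201901, PA) row. The names keys and result are illustrative, not from the answer.

```python
import pandas as pd

# Hypothetical data: the question's rows plus one repeated (ID, DT, STATE) key
df = pd.DataFrame({
    "ID": [1, 1, 2, 1, 2, 1],
    "DT": [201901, 201902, 201902, 201902, 201901, 201901],
    "STATE": ["PA", "PA", "PA", "CA", "CA", "PA"],
    "V": [1, 6, 3, 3, 1, 4],
})

keys = ["ID", "DT", "STATE"]
idx = pd.MultiIndex.from_product([df[k].unique() for k in keys], names=keys)

result = (
    df.groupby(keys)["V"].sum()      # one row per key; repeated keys are summed
      .reindex(idx, fill_value=0)    # add the missing combinations with V = 0
      .reset_index()
)
print(result)
```

Whether summing is the right way to combine repeated keys depends on the data; another aggregation such as first() or max() could be substituted if that matches the intent better.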