I have the following data:
product Sales_band Hour_id sales
prod_1 HIGH 1 200
prod_1 HIGH 3 100
prod_1 HIGH 4 300
prod_1 VERY HIGH 2 100
prod_1 VERY HIGH 5 253
prod_1 VERY HIGH 6 234
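I want to add rows based on the Hour_id value. Hour_id can take values from 1 to 10, so for each product and Sales_band the data above should be expanded wherever an hour ID is missing. The desired output is (sales = 0 for the hour IDs that were missing):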
product Sales_band Hour_id sales
prod_1 HIGH 1 200
prod_1 HIGH 2 0
prod_1 HIGH 3 100
prod_1 HIGH 4 300
prod_1 HIGH 5 0
prod_1 HIGH 6 0
prod_1 HIGH 7 0
prod_1 HIGH 8 0
prod_1 HIGH 9 0
prod_1 HIGH 10 0
prod_1 VERY HIGH 1 0
prod_1 VERY HIGH 2 100
prod_1 VERY HIGH 3 0
prod_1 VERY HIGH 4 0
prod_1 VERY HIGH 5 253
prod_1 VERY HIGH 6 234
prod_1 VERY HIGH 7 0
prod_1 VERY HIGH 8 0
prod_1 VERY HIGH 9 0
prod_1 VERY HIGH 10 0
How can I achieve this with a pandas DataFrame in Python?
Answer 0 (score: 2)
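# For each (product, Sales_band) group, index by Hour_id and reindex to 1..10,
# filling the sales of the newly created hours with 0.
# Note the double brackets: recent pandas versions require a list of columns here.
print(df.groupby(['product', 'Sales_band'])[['Hour_id', 'sales']]
        .apply(lambda x: x.set_index('Hour_id').reindex(range(1, 11), fill_value=0))
        .reset_index())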
product Sales_band Hour_id sales
0 prod_1 HIGH 1 200
1 prod_1 HIGH 2 0
2 prod_1 HIGH 3 100
3 prod_1 HIGH 4 300
4 prod_1 HIGH 5 0
5 prod_1 HIGH 6 0
6 prod_1 HIGH 7 0
7 prod_1 HIGH 8 0
8 prod_1 HIGH 9 0
9 prod_1 HIGH 10 0
10 prod_1 VERY HIGH 1 0
11 prod_1 VERY HIGH 2 100
12 prod_1 VERY HIGH 3 0
13 prod_1 VERY HIGH 4 0
14 prod_1 VERY HIGH 5 253
15 prod_1 VERY HIGH 6 234
16 prod_1 VERY HIGH 7 0
17 prod_1 VERY HIGH 8 0
18 prod_1 VERY HIGH 9 0
19 prod_1 VERY HIGH 10 0
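
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and uses a different technique from the groupby/reindex answer above: a cross join of the existing (product, Sales_band) pairs with hours 1 to 10, followed by a left merge. The variable names `groups`, `hours`, `grid` and `result` are only illustrative, and `how="cross"` assumes pandas 1.2 or later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": ["prod_1"] * 6,
    "Sales_band": ["HIGH"] * 3 + ["VERY HIGH"] * 3,
    "Hour_id": [1, 3, 4, 2, 5, 6],
    "sales": [200, 100, 300, 100, 253, 234],
})

# Existing (product, Sales_band) pairs; only these are expanded.
groups = df[["product", "Sales_band"]].drop_duplicates()

# Cross join every pair with hours 1..10 (how="cross" needs pandas >= 1.2).
hours = pd.DataFrame({"Hour_id": range(1, 11)})
grid = groups.merge(hours, how="cross")

# Left join the real sales onto the full grid and fill the gaps with 0.
result = grid.merge(df, on=["product", "Sales_band", "Hour_id"], how="left")
result["sales"] = result["sales"].fillna(0).astype(int)

print(result)
```

This should give the same 20 rows as the output above. One possible advantage of the merge form is that it avoids `groupby(...).apply`, which can be slow when there are many groups, but for data of this size either approach is fine.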