I have a pandas.DataFrame of the following kind:
sales_with_missing = pd.DataFrame({'month':[1,2,3,6,7,8,9,10,11,12],'code':[111]*10, 'sales':[np.random.randint(1500) for _ in np.arange(10)]})
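As you can see, the records for April and May (months 4 and 5) are missing, and I would like those missing records to appear with their sales set to zero: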
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
How can I implement the insert_zero_for_missing method?
Answer 0: (score: 5)
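You could set month as the index, call reindex to add rows for the missing months, call fillna to fill the missing values with zero, and then call reset_index to make month a column again:
import numpy as np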
import pandas as pd
month = list(range(1,4)) + list(range(6,13))
sales = np.array(month)*100
df = pd.DataFrame(dict(month=month, sales=sales))
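# reindex introduces NaN for months 4 and 5, which turns sales into floats; cast back to int after filling
print(df.set_index('month').reindex(range(1,13)).fillna(0).astype(int).reset_index())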
yields
    month  sales
0       1    100
1       2    200
2       3    300
3       4      0
4       5      0
5       6    600
6       7    700
7       8    800
8       9    900
9      10   1000
10     11   1100
11     12   1200
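The same chain can be wrapped into the insert_zero_for_missing function the question asks for. The following is only a minimal sketch under some assumptions that are not in the original answer: the months parameter and its default of 1 through 12 are hypothetical, only the sales column is zero-filled, and the code column is assumed to hold a single constant value so that its gaps can be filled from neighbouring rows.

```python
import numpy as np
import pandas as pd


def insert_zero_for_missing(df, months=range(1, 13)):
    """Return a frame with one row per month, using 0 sales for absent months."""
    # `months` is a hypothetical parameter; the default covers a full year.
    out = df.set_index('month').reindex(months)
    out.index.name = 'month'  # keep the column name explicit for reset_index
    # Zero-fill only the sales column; the cast restores the integer dtype lost to NaN.
    out['sales'] = out['sales'].fillna(0).astype(int)
    # Assumption: all rows share one code, so gaps are filled from neighbouring rows.
    out['code'] = out['code'].ffill().bfill().astype(int)
    return out.reset_index()


sales_with_missing = pd.DataFrame({
    'month': [1, 2, 3, 6, 7, 8, 9, 10, 11, 12],
    'code': [111] * 10,
    'sales': np.random.randint(1500, size=10),
})
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
```

Filling the columns separately avoids a side effect of calling fillna(0) on the whole frame, which would also set code to 0 in the inserted rows. If the frame holds several codes, this sketch would not be enough as written; grouping by code before reindexing would be one way to extend it.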