I have a DataFrame like the one below, with four locations:
df
Date Location Count
0 2018-11-20 loc 1 22
1 2018-11-20 loc 2 1
2 2018-11-20 loc 3 5
3 2018-11-20 loc 4 34
4 2018-11-21 loc 1 20
5 2018-11-21 loc 2 2
6 2018-11-22 loc 1 20
7 2018-11-23 loc 3 3
8 2018-11-23 loc 4 21
I would like to fill in the missing locations with 0, so that it looks like this:
df
Date Location Count
0 2018-11-20 loc 1 22
1 2018-11-20 loc 2 1
2 2018-11-20 loc 3 5
3 2018-11-20 loc 4 34
4 2018-11-21 loc 1 20
5 2018-11-21 loc 2 2
6 2018-11-21 loc 3 0
7 2018-11-21 loc 4 0
8 2018-11-22 loc 1 20
9 2018-11-22 loc 2 0
10 2018-11-22 loc 3 0
11 2018-11-22 loc 4 0
12 2018-11-23 loc 1 0
13 2018-11-23 loc 2 0
14 2018-11-23 loc 3 3
15 2018-11-23 loc 4 21
The Date column stores strings. What is the best way to do this? Should I convert the dates first and then apply a function?
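To make the answers easy to try, here is a minimal sketch that rebuilds the example frame (values copied from the table above, with Date left as a string) and shows the optional conversion. It assumes only that pandas is installed. For zero-padded YYYY-MM-DD strings, lexical order is already chronological, so the reshaping approaches work on the raw strings; converting is mainly useful if you need date arithmetic later.

```python
import pandas as pd

# Sample data copied from the question; Date is deliberately left as a string.
df = pd.DataFrame({
    'Date': ['2018-11-20'] * 4 + ['2018-11-21'] * 2 + ['2018-11-22'] + ['2018-11-23'] * 2,
    'Location': ['loc 1', 'loc 2', 'loc 3', 'loc 4',
                 'loc 1', 'loc 2', 'loc 1', 'loc 3', 'loc 4'],
    'Count': [22, 1, 5, 34, 20, 2, 20, 3, 21],
})

# Zero-padded 'YYYY-MM-DD' strings sort chronologically, so sorting or
# grouping on the raw strings already puts the dates in the right order.
dates = list(df['Date'].unique())
print(sorted(dates) == dates)  # True for this data

# Optional: convert to datetime64 if date arithmetic is needed later.
df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)
```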
Answer 0 (score: 3)
You can use pivot together with stack:
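# pass keyword arguments: recent pandas versions no longer accept positional arguments to DataFrame.pivot
df = df.pivot(index='Date', columns='Location', values='Count').fillna(0).stack().reset_index().rename(columns={0: 'Count'})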
df
Out[60]:
Date Location Count
0 2018-11-20 loc1 22.0
1 2018-11-20 loc2 1.0
2 2018-11-20 loc3 5.0
3 2018-11-20 loc4 34.0
4 2018-11-21 loc1 20.0
5 2018-11-21 loc2 2.0
6 2018-11-21 loc3 0.0
7 2018-11-21 loc4 0.0
8 2018-11-22 loc1 20.0
9 2018-11-22 loc2 0.0
10 2018-11-22 loc3 0.0
11 2018-11-22 loc4 0.0
12 2018-11-23 loc1 0.0
13 2018-11-23 loc2 0.0
14 2018-11-23 loc3 3.0
15 2018-11-23 loc4 21.0
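The Count column comes back as float because pivot introduces NaN for the missing pairs. A minimal self-contained sketch of the same pivot/stack idea, using the question's data and adding a cast back to int (the cast is an addition, not part of the answer):

```python
import pandas as pd

# Sample data from the question (Date kept as a string).
df = pd.DataFrame({
    'Date': ['2018-11-20'] * 4 + ['2018-11-21'] * 2 + ['2018-11-22'] + ['2018-11-23'] * 2,
    'Location': ['loc 1', 'loc 2', 'loc 3', 'loc 4',
                 'loc 1', 'loc 2', 'loc 1', 'loc 3', 'loc 4'],
    'Count': [22, 1, 5, 34, 20, 2, 20, 3, 21],
})

# Wide table: one row per Date, one column per Location; absent pairs are NaN.
wide = df.pivot(index='Date', columns='Location', values='Count')

# NaN forced the values to float, so fill with 0 and cast back to int.
wide = wide.fillna(0).astype(int)

# Back to long format: stack() gives a Series indexed by (Date, Location).
result = wide.stack().reset_index(name='Count')
print(result)
```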
Answer 1 (score: 2)
Use groupby, unstack and stack:
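(df.groupby(['Date', 'Location'])
   .Count
   .first()                   # one Count per (Date, Location) pair
   .unstack(1, fill_value=0)  # one column per Location; missing pairs become 0
   .stack()                   # back to long format; no NaN remains, so nothing is dropped
   .reset_index(name='Count'))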
Date Location Count
0 2018-11-20 loc 1 22
1 2018-11-20 loc 2 1
2 2018-11-20 loc 3 5
3 2018-11-20 loc 4 34
4 2018-11-21 loc 1 20
5 2018-11-21 loc 2 2
6 2018-11-21 loc 3 0
7 2018-11-21 loc 4 0
8 2018-11-22 loc 1 20
9 2018-11-22 loc 2 0
10 2018-11-22 loc 3 0
11 2018-11-22 loc 4 0
12 2018-11-23 loc 1 0
13 2018-11-23 loc 2 0
14 2018-11-23 loc 3 3
15 2018-11-23 loc 4 21
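Because unstack is given fill_value=0, no NaN is ever introduced and Count stays an integer. A side benefit of going through groupby is that it tolerates duplicate (Date, Location) rows, where pivot raises a ValueError. The sketch below wraps the answer's chain in a helper (fill_missing is a hypothetical name) and demonstrates that difference on a copy of the data with one duplicated row:

```python
import pandas as pd

# Sample data from the question (Date kept as a string).
df = pd.DataFrame({
    'Date': ['2018-11-20'] * 4 + ['2018-11-21'] * 2 + ['2018-11-22'] + ['2018-11-23'] * 2,
    'Location': ['loc 1', 'loc 2', 'loc 3', 'loc 4',
                 'loc 1', 'loc 2', 'loc 1', 'loc 3', 'loc 4'],
    'Count': [22, 1, 5, 34, 20, 2, 20, 3, 21],
})


def fill_missing(frame):
    """Long-format frame with every (Date, Location) pair; missing pairs get 0."""
    return (frame.groupby(['Date', 'Location'])['Count']
                 .first()                            # one value per pair
                 .unstack('Location', fill_value=0)  # wide; absent pairs -> 0
                 .stack()                            # back to long format
                 .reset_index(name='Count'))


result = fill_missing(df)
print(result)

# Append a duplicate of the first row: pivot refuses to reshape it...
dup = pd.concat([df, df.head(1)], ignore_index=True)
try:
    dup.pivot(index='Date', columns='Location', values='Count')
except ValueError as exc:
    print('pivot failed:', exc)

# ...while groupby().first() simply keeps the first value of each pair.
print(fill_missing(dup).equals(result))  # True
```

If duplicate rows should be added together rather than deduplicated, swapping .first() for .sum() is the natural change; which one is right depends on what the duplicates mean in your data.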
Answer 2 (score: 1)
You can use pd.MultiIndex.from_product to compute the Cartesian product of dates and locations:
import pandas as pd

# convert column types for performance
df['Date'] = pd.to_datetime(df['Date'])
df['Location'] = df['Location'].astype('category')
# calculate new index from Cartesian product
cols = ['Date', 'Location']
idx = pd.MultiIndex.from_product([df[col].unique() for col in cols], names=cols)
# set index, reindex, then reset index
df = df.set_index(cols).reindex(idx, fill_value=0).reset_index()
print(df)
Date Location Count
0 2018-11-20 loc1 22
1 2018-11-20 loc2 1
2 2018-11-20 loc3 5
3 2018-11-20 loc4 34
4 2018-11-21 loc1 20
5 2018-11-21 loc2 2
6 2018-11-21 loc3 0
7 2018-11-21 loc4 0
8 2018-11-22 loc1 20
9 2018-11-22 loc2 0
10 2018-11-22 loc3 0
11 2018-11-22 loc4 0
12 2018-11-23 loc1 0
13 2018-11-23 loc2 0
14 2018-11-23 loc3 3
15 2018-11-23 loc4 21
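One caveat with building the product from unique(): a location that never appears in df at all is never generated. A minimal variation, assuming the full list of locations is known in advance (all_locations is a hypothetical name, not part of the answer), passes that list to from_product directly:

```python
import pandas as pd

# Sample data from the question, with Date converted as in the answer.
df = pd.DataFrame({
    'Date': ['2018-11-20'] * 4 + ['2018-11-21'] * 2 + ['2018-11-22'] + ['2018-11-23'] * 2,
    'Location': ['loc 1', 'loc 2', 'loc 3', 'loc 4',
                 'loc 1', 'loc 2', 'loc 1', 'loc 3', 'loc 4'],
    'Count': [22, 1, 5, 34, 20, 2, 20, 3, 21],
})
df['Date'] = pd.to_datetime(df['Date'])

# Hypothetical: the complete set of locations, listed explicitly so that a
# location with no rows at all would still appear in the output.
all_locations = ['loc 1', 'loc 2', 'loc 3', 'loc 4']

cols = ['Date', 'Location']
idx = pd.MultiIndex.from_product([df['Date'].unique(), all_locations], names=cols)

# Existing (Date, Location) rows keep their Count; new rows get 0.
result = df.set_index(cols).reindex(idx, fill_value=0).reset_index()
print(result)
```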