Pandas: fill in missing locations with a count of 0

Time: 2018-12-20 16:54:24

Tags: python pandas

I have a DataFrame that looks like the one below, with four locations:

df
    Date        Location Count
0   2018-11-20  loc 1    22
1   2018-11-20  loc 2    1 
2   2018-11-20  loc 3    5
3   2018-11-20  loc 4    34
4   2018-11-21  loc 1    20
5   2018-11-21  loc 2    2
6   2018-11-22  loc 1    20
7   2018-11-23  loc 3    3
8   2018-11-23  loc 4    21

I would like to fill in the missing locations with a count of 0 for every date, so that it looks like this:

df
    Date        Location Count
0   2018-11-20  loc 1    22
1   2018-11-20  loc 2    1 
2   2018-11-20  loc 3    5
3   2018-11-20  loc 4    34
4   2018-11-21  loc 1    20
5   2018-11-21  loc 2    2
6   2018-11-21  loc 3    0
7   2018-11-21  loc 4    0
8   2018-11-22  loc 1    20
9   2018-11-22  loc 2    0
10  2018-11-22  loc 3    0
11  2018-11-22  loc 4    0
12  2018-11-23  loc 1    0
13  2018-11-23  loc 2    0
14  2018-11-23  loc 3    3
15  2018-11-23  loc 4    21

The dates are stored as strings. What is the best way to do this? Should I convert the dates first and then apply a function?

3 answers:

Answer 0 (score: 3)

You can use pivot together with stack:

df = df.pivot(index='Date', columns='Location', values='Count').fillna(0).stack().reset_index().rename(columns={0:'Count'})
df
Out[60]: 
          Date Location  Count
0   2018-11-20     loc1   22.0
1   2018-11-20     loc2    1.0
2   2018-11-20     loc3    5.0
3   2018-11-20     loc4   34.0
4   2018-11-21     loc1   20.0
5   2018-11-21     loc2    2.0
6   2018-11-21     loc3    0.0
7   2018-11-21     loc4    0.0
8   2018-11-22     loc1   20.0
9   2018-11-22     loc2    0.0
10  2018-11-22     loc3    0.0
11  2018-11-22     loc4    0.0
12  2018-11-23     loc1    0.0
13  2018-11-23     loc2    0.0
14  2018-11-23     loc3    3.0
15  2018-11-23     loc4   21.0
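
For anyone who wants to run this end to end, here is a minimal sketch of the same pivot/stack idea. It assumes the sample data from the question (rebuilt by hand below), passes the pivot arguments by keyword, and casts the filled counts back to integers. The names `wide` and `filled` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Date": ["2018-11-20"] * 4 + ["2018-11-21"] * 2
            + ["2018-11-22"] + ["2018-11-23"] * 2,
    "Location": ["loc 1", "loc 2", "loc 3", "loc 4",
                 "loc 1", "loc 2", "loc 1", "loc 3", "loc 4"],
    "Count": [22, 1, 5, 34, 20, 2, 20, 3, 21],
})

# Wide table: one row per date, one column per location.
# Missing (Date, Location) pairs show up as NaN here.
wide = df.pivot(index="Date", columns="Location", values="Count")

# Fill the gaps, restore the integer dtype, and go back to long format.
filled = (
    wide.fillna(0)
        .astype(int)
        .stack()
        .rename("Count")
        .reset_index()
)

print(filled)
```

Pivoting to a wide table is what introduces NaN for the missing pairs, which is why the counts come out as floats unless they are cast back.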

Answer 1 (score: 2)

Use groupby, unstack, and stack:

(df.groupby(['Date', 'Location'])
   .Count
   .first()
   .unstack(1, fill_value=0)
   .stack(dropna=False)
   .reset_index(name='Count'))

          Date Location  Count
0   2018-11-20    loc 1     22
1   2018-11-20    loc 2      1
2   2018-11-20    loc 3      5
3   2018-11-20    loc 4     34
4   2018-11-21    loc 1     20
5   2018-11-21    loc 2      2
6   2018-11-21    loc 3      0
7   2018-11-21    loc 4      0
8   2018-11-22    loc 1     20
9   2018-11-22    loc 2      0
10  2018-11-22    loc 3      0
11  2018-11-22    loc 4      0
12  2018-11-23    loc 1      0
13  2018-11-23    loc 2      0
14  2018-11-23    loc 3      3
15  2018-11-23    loc 4     21
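
One practical difference from pivot is that pivot raises a ValueError when a (Date, Location) pair occurs more than once, while groupby aggregates the duplicates first. The sketch below uses hypothetical data (not from the question) with one duplicated pair and sums it. Whether `.sum()` or `.first()` is the right aggregation depends on what duplicates mean in your data.

```python
import pandas as pd

# Hypothetical data: ("2018-11-20", "loc 1") appears twice.
df = pd.DataFrame({
    "Date": ["2018-11-20", "2018-11-20", "2018-11-20", "2018-11-21"],
    "Location": ["loc 1", "loc 1", "loc 2", "loc 1"],
    "Count": [10, 12, 1, 20],
})

result = (
    df.groupby(["Date", "Location"])["Count"]
      .sum()                              # collapse duplicate pairs
      .unstack("Location", fill_value=0)  # missing pairs become 0
      .stack()                            # back to long format
      .reset_index(name="Count")
)

print(result)
```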

Answer 2 (score: 1)

You can use pd.MultiIndex.from_product to compute the Cartesian product of dates and locations:

# convert series types for performance
df['Date'] = pd.to_datetime(df['Date'])
df['Location'] = df['Location'].astype('category')

# calculate new index from Cartesian product
cols = ['Date', 'Location']
idx = pd.MultiIndex.from_product([df[col].unique() for col in cols], names=cols)

# set index, reindex, then reset index
df = df.set_index(cols).reindex(idx, fill_value=0).reset_index()

print(df)

         Date Location  Count
0  2018-11-20     loc1     22
1  2018-11-20     loc2      1
2  2018-11-20     loc3      5
3  2018-11-20     loc4     34
4  2018-11-21     loc1     20
5  2018-11-21     loc2      2
6  2018-11-21     loc3      0
7  2018-11-21     loc4      0
8  2018-11-22     loc1     20
9  2018-11-22     loc2      0
10 2018-11-22     loc3      0
11 2018-11-22     loc4      0
12 2018-11-23     loc1      0
13 2018-11-23     loc2      0
14 2018-11-23     loc3      3
15 2018-11-23     loc4     21
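
On the question of whether the dates have to be converted first: as far as I can tell, the reindex approach does not depend on the dtype. Here is a minimal sketch that leaves Date as strings, assuming the sample data from the question (rebuilt by hand below). The name `out` is illustrative only.

```python
import pandas as pd

# Sample data rebuilt by hand from the question; Date stays a string column.
df = pd.DataFrame({
    "Date": ["2018-11-20"] * 4 + ["2018-11-21"] * 2
            + ["2018-11-22"] + ["2018-11-23"] * 2,
    "Location": ["loc 1", "loc 2", "loc 3", "loc 4",
                 "loc 1", "loc 2", "loc 1", "loc 3", "loc 4"],
    "Count": [22, 1, 5, 34, 20, 2, 20, 3, 21],
})

cols = ["Date", "Location"]

# Every combination of the dates and locations that occur in the data.
idx = pd.MultiIndex.from_product([df[c].unique() for c in cols], names=cols)

# Pairs missing from the original index get Count = 0.
out = df.set_index(cols).reindex(idx, fill_value=0).reset_index()

print(out)
```

Note that this only fills combinations of dates that already appear somewhere in the data. Converting with pd.to_datetime is still worthwhile if you need date arithmetic or sorting by actual date later on.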