Counter function grouped by date and category

Time: 2019-04-23 00:33:41

Tags: python pandas

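I have a line-by-line list of items, each with a start date and an end date. I want to count an item on every date that falls between its start date and its end date (end date included), grouped by category.

Here is my input dataset: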

>> df
Key     Count Start     Count End     Category
A       Jan 1 2019      Jan 5 2019    Red
B       Jan 1 2019      Jan 7 2019    Blue
C       Jan 3 2019      Jan 5 2019    Red
D       Jan 2 2019      Jan 8 2019    Red
E       Jan 4 2019      Jan 10 2019   Yellow
F       Jan 3 2019      Jan 6 2019    Blue
G       Jan 5 2019      Jan 8 2019    Red
H       Jan 6 2019      Jan 10 2019   Yellow
I       Jan 1 2019      Jan 4 2019    Yellow
J       Jan 2 2019      Jan 7 2019    Red

I would like my output dataset to look like this:

>> DailyCount
Date          Category          Count
Jan 1 2019    Red               1
Jan 1 2019    Blue              1
Jan 1 2019    Yellow            1
Jan 2 2019    Red               3
Jan 2 2019    Blue              1
Jan 2 2019    Yellow            1
Jan 3 2019    Red               4
Jan 3 2019    Blue              2
Jan 3 2019    Yellow            1
Jan 4 2019    Red               4
Jan 4 2019    Blue              2
Jan 4 2019    Yellow            2
Jan 5 2019    Red               5
Jan 5 2019    Blue              2
Jan 5 2019    Yellow            1
Jan 6 2019    Red               3
Jan 6 2019    Blue              2
Jan 6 2019    Yellow            2
Jan 7 2019    Red               3
Jan 7 2019    Blue              1
Jan 7 2019    Yellow            2
Jan 8 2019    Red               2
Jan 8 2019    Blue              0
Jan 8 2019    Yellow            2
Jan 9 2019    Red               0
Jan 9 2019    Blue              0
Jan 9 2019    Yellow            2
Jan 10 2019   Red               0
Jan 10 2019   Blue              0
Jan 10 2019   Yellow            2

I used Counter() to count the occurrences per day, but I am not sure how to incorporate grouping by category:

from collections import Counter
from datetime import timedelta

from pandas import DataFrame

# Assumes 'Count Start' and 'Count End' already hold datetime values
Count = Counter()

for index, row in df.iterrows():
  delta = row['Count End'] - row['Count Start']
  for i in range(delta.days + 1):
    time = row['Count Start'] + timedelta(i)
    Count[str(time.date())] += 1

DailyCount = DataFrame.from_dict(Count,orient='index').reset_index().rename(columns={'index':'Date', 0:'Count'}).sort_values(by=['Date'])

>> DailyCount
Date          Count
Jan 1 2019    3
Jan 2 2019    5
Jan 3 2019    7
Jan 4 2019    8
Jan 5 2019    8
Jan 6 2019    7
Jan 7 2019    6
Jan 8 2019    4
Jan 9 2019    2
Jan 10 2019   2

Any ideas on how to partition the code by category?

2 Answers:

Answer 0: (score: 0)

Build a list of dates for each row, then use unnesting to expand it into one row per date.

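import numpy as np
import pandas as pd

# unnesting() is the helper defined at the end of this answer; define it before running this part
df['Count Start']=pd.to_datetime(df['Count Start'])
df['Count End']=pd.to_datetime(df['Count End'])

# Build the full range of dates covered by each row
df['Date']=[pd.date_range(x,y) for x , y in zip(df['Count Start'],df['Count End'])]
# Combine `groupby` with `size`, then use `unstack` and `stack` so missing combinations become 0;
# `reset_index` turns the result into the flat table shown below
Yourdf=unnesting(df,['Date']).groupby(['Date','Category']).size().unstack(fill_value=0).stack().reset_index()
Yourdf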

         Date Category  0
0  2019-01-01     Blue  1
1  2019-01-01      Red  1
2  2019-01-01   Yellow  1
3  2019-01-02     Blue  1
4  2019-01-02      Red  3
5  2019-01-02   Yellow  1
6  2019-01-03     Blue  2
7  2019-01-03      Red  4
8  2019-01-03   Yellow  1
9  2019-01-04     Blue  2
10 2019-01-04      Red  4
11 2019-01-04   Yellow  2
12 2019-01-05     Blue  2
13 2019-01-05      Red  5
14 2019-01-05   Yellow  1
15 2019-01-06     Blue  2
16 2019-01-06      Red  3
17 2019-01-06   Yellow  2
18 2019-01-07     Blue  1
19 2019-01-07      Red  3
20 2019-01-07   Yellow  2
21 2019-01-08     Blue  0
22 2019-01-08      Red  2
23 2019-01-08   Yellow  2
24 2019-01-09     Blue  0
25 2019-01-09      Red  0
26 2019-01-09   Yellow  2
27 2019-01-10     Blue  0
28 2019-01-10      Red  0
29 2019-01-10   Yellow  2

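# Helper: expand the list-like columns named in `explode` into one row per element
def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    # Keyword form of drop(); newer pandas versions no longer accept a positional axis
    return df1.join(df.drop(columns=explode), how='left')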
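
On pandas 0.25 or later, the built-in DataFrame.explode can probably stand in for the custom unnesting helper. The following is only a sketch under that assumption, not part of the original answer. It rebuilds the question's input with ISO date strings, and the names long_df and daily_count are illustrative.

```python
import pandas as pd

# Input rows from the question, with dates written as ISO strings
starts = [1, 1, 3, 2, 4, 3, 5, 6, 1, 2]
ends = [5, 7, 5, 8, 10, 6, 8, 10, 4, 7]
df = pd.DataFrame({
    "Key": list("ABCDEFGHIJ"),
    "Count Start": pd.to_datetime([f"2019-01-{d:02d}" for d in starts]),
    "Count End": pd.to_datetime([f"2019-01-{d:02d}" for d in ends]),
    "Category": ["Red", "Blue", "Red", "Red", "Yellow",
                 "Blue", "Red", "Yellow", "Yellow", "Red"],
})

# One list of dates per row, then one row per date via explode
df["Date"] = [pd.date_range(s, e).tolist()
              for s, e in zip(df["Count Start"], df["Count End"])]
long_df = df.explode("Date")
long_df["Date"] = pd.to_datetime(long_df["Date"])

# Count rows per (Date, Category); unstack/stack fills missing pairs with 0
daily_count = (
    long_df.groupby(["Date", "Category"]).size()
    .unstack(fill_value=0)
    .stack()
    .reset_index(name="Count")
)
```

Passing name="Count" to reset_index labels the count column directly, instead of leaving it as 0.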

Answer 1: (score: 0)

You can use the

pandas.DataFrame.groupby()

function, which should work for you.

You can find more information about this function here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
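
As a rough illustration of how groupby might be applied here: it counts existing rows, so each date range presumably has to be expanded to one row per day first. The sketch below assumes that reading of the answer. It uses only the first three rows of the question's input to stay short, and the names rows, long_df, full_index and daily_count are illustrative.

```python
import pandas as pd

# First three rows (A, B, C) of the question's input
df = pd.DataFrame({
    "Key": ["A", "B", "C"],
    "Count Start": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-03"]),
    "Count End": pd.to_datetime(["2019-01-05", "2019-01-07", "2019-01-05"]),
    "Category": ["Red", "Blue", "Red"],
})

# Expand each item into one record per day it covers (both ends included)
rows = [
    {"Date": day, "Category": row["Category"]}
    for _, row in df.iterrows()
    for day in pd.date_range(row["Count Start"], row["Count End"])
]
long_df = pd.DataFrame(rows)

# Count records per (Date, Category)
counts = long_df.groupby(["Date", "Category"]).size()

# Reindex over every (date, category) pair so absent pairs appear with 0
full_index = pd.MultiIndex.from_product(
    [pd.date_range(df["Count Start"].min(), df["Count End"].max()),
     sorted(df["Category"].unique())],
    names=["Date", "Category"],
)
daily_count = counts.reindex(full_index, fill_value=0).reset_index(name="Count")
```

The reindex step is what produces the zero rows in the desired output; groupby with size alone omits combinations that never occur.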