我有一个逐行项目列表,其中包含开始日期和结束日期。我想在日期介于开始日期和结束日期(包括结束日期)之间时对该项进行计数,按类别分组。
这是我的输入数据集:
>> df
Key Count Start Count End Category
A Jan 1 2019 Jan 5 2019 Red
B Jan 1 2019 Jan 7 2019 Blue
C Jan 3 2019 Jan 5 2019 Red
D Jan 2 2019 Jan 8 2019 Red
E Jan 4 2019 Jan 10 2019 Yellow
F Jan 3 2019 Jan 6 2019 Blue
G Jan 5 2019 Jan 8 2019 Red
H Jan 6 2019 Jan 10 2019 Yellow
I Jan 1 2019 Jan 4 2019 Yellow
J Jan 2 2019 Jan 7 2019 Red
我希望我的输出数据集是这样的:
>> DailyCount
Date Category Count
Jan 1 2019 Red 1
Jan 1 2019 Blue 1
Jan 1 2019 Yellow 1
Jan 2 2019 Red 3
Jan 2 2019 Blue 1
Jan 2 2019 Yellow 1
Jan 3 2019 Red 4
Jan 3 2019 Blue 2
Jan 3 2019 Yellow 1
Jan 4 2019 Red 4
Jan 4 2019 Blue 2
Jan 4 2019 Yellow 2
Jan 5 2019 Red 5
Jan 5 2019 Blue 2
Jan 5 2019 Yellow 1
Jan 6 2019 Red 3
Jan 6 2019 Blue 2
Jan 6 2019 Yellow 2
Jan 7 2019 Red 3
Jan 7 2019 Blue 1
Jan 7 2019 Yellow 2
Jan 8 2019 Red 2
Jan 8 2019 Blue 0
Jan 8 2019 Yellow 2
Jan 9 2019 Red 0
Jan 9 2019 Blue 0
Jan 9 2019 Yellow 2
Jan 10 2019 Red 0
Jan 10 2019 Blue 0
Jan 10 2019 Yellow 2
我使用Counter()
来计算每天的发生次数,但是我不确定如何合并按类别进行分组:
Count = Counter()
for index, row in df.iterrows():
delta = row['Count End'] - row['Count Start']
for i in range(delta.days + 1):
time = row['Count Start'] + timedelta(i)
Count[str(time.date())] += 1
DailyCount = DataFrame.from_dict(Count,orient='index').reset_index().rename(columns={'index':'Date', 0:'Count'}).sort_values(by=['Date'])
>> DailyCount
Date Count
Jan 1 2019 3
Jan 2 2019 5
Jan 3 2019 7
Jan 4 2019 8
Jan 5 2019 8
Jan 6 2019 7
Jan 7 2019 6
Jan 8 2019 4
Jan 9 2019 2
Jan 10 2019 2
有什么想法可以按类别对代码进行分区吗?
答案 0 :(得分:0)
创建日期列表后使用unnesting
df['Count Start']=pd.to_datetime(df['Count Start'])
df['Count End']=pd.to_datetime(df['Count End'])
df['Date']=[pd.date_range(x,y) for x , y in zip(df['Count Start'],df['Count End'])]
#Here we just need combine with `groupby` with `size` and adjust by using `unstack` and `stack`
Yourdf=unnesting(df,['Date']).groupby(['Date','Category']).size().unstack(fill_value=0).stack()
Yourdf
Date Category 0
0 2019-01-01 Blue 1
1 2019-01-01 Red 1
2 2019-01-01 Yellow 1
3 2019-01-02 Blue 1
4 2019-01-02 Red 3
5 2019-01-02 Yellow 1
6 2019-01-03 Blue 2
7 2019-01-03 Red 4
8 2019-01-03 Yellow 1
9 2019-01-04 Blue 2
10 2019-01-04 Red 4
11 2019-01-04 Yellow 2
12 2019-01-05 Blue 2
13 2019-01-05 Red 5
14 2019-01-05 Yellow 1
15 2019-01-06 Blue 2
16 2019-01-06 Red 3
17 2019-01-06 Yellow 2
18 2019-01-07 Blue 1
19 2019-01-07 Red 3
20 2019-01-07 Yellow 2
21 2019-01-08 Blue 0
22 2019-01-08 Red 2
23 2019-01-08 Yellow 2
24 2019-01-09 Blue 0
25 2019-01-09 Red 0
26 2019-01-09 Yellow 2
27 2019-01-10 Blue 0
28 2019-01-10 Red 0
29 2019-01-10 Yellow 2
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
答案 1 :(得分:0)
您可以使用:
pandas.DataFrame.groupby()
功能,它应该为您工作。
您可以查看有关此功能的更多信息: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html