Question

I have a df as shown below, which lists the accidents registered in each sector.

Sector   RaisedDate   Inspector_ID    Priority  
SE1      02-Aug-2019  ID1             High
SE2      04-Aug-2019  ID1             Low
SE2      06-Aug-2019  ID2             Medium
SE1      12-Aug-2019  ID1             High
SE2      11-Aug-2019  ID1             Low
SE1      13-Aug-2019  ID2             High
SE1      18-Aug-2019  ID1             Medium
SE2      21-Aug-2019  ID1             Medium
SE2      20-Aug-2019  ID2             High
SE1      23-Aug-2019  ID1             High
SE1      25-Aug-2019  ID1             Low
SE2      29-Aug-2019  ID2             High
SE1      25-Aug-2019  ID1             Low
SE1      25-Aug-2019  ID2             High

From the above I would like to prepare the dataframe below.

Sector  #_Week1  #_Week2  #_Week3  #_Week4   #_Week5   No_of_High   No_of_low
SE1     1        2        1        4         0         5            2
SE2     2        1        2        0         1         2            2

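where #_Week1 = number of accidents recorded in week 1 (01-Aug-2019 to 07-Aug-2019, both inclusive)

#_Week2 = number of accidents recorded in week 2 (08-Aug-2019 to 14-Aug-2019, both inclusive)

#_Week3 = number of accidents recorded in week 3 (15-Aug-2019 to 21-Aug-2019, both inclusive)

#_Week4 = number of accidents recorded in week 4 (22-Aug-2019 to 28-Aug-2019, both inclusive)

#_Week5 = number of accidents recorded in week 5 (29-Aug-2019 to 31-Aug-2019, both inclusive)

No_of_High = total number of High priority accidents in that sector across all the data.

No_of_low = total number of Low priority accidents in that sector across all the data.

For this I tried the following code, but it did not work.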

df.set_index('RaisedDate').groupby(pd.Grouper(freq='Weekly')).Sector.count()

Answer 1

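Using what @Parth said, and adding 'Sector' to the groupby() (this assumes RaisedDate has already been converted to datetime, e.g. with pd.to_datetime):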

print(df.set_index('RaisedDate').groupby([
    'Sector',
    pd.Grouper(freq='7D'),
]).Sector.count().unstack())

RaisedDate  2019-08-02  2019-08-09  2019-08-16  2019-08-23
Sector                                                    
SE1                  1           2           1           4
SE2                  2           1           2           1

gets you closer to what you need. You can then rename the columns to match your output.

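I also noticed that my fourth week holds the values 4 and 1, and there is no week 5. The 7-day bins start at the first date in the data (2019-08-02) rather than at 2019-08-01, so 29-Aug-2019 lands in the fourth bin. Not sure whether that is a problem for you?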
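If the bins need to line up with the weeks defined in the question, one option is to anchor the 7-day bins at 1 August. This is a minimal sketch, assuming pandas 1.1 or later (where pd.Grouper accepts an origin argument) and that the dates are parsed with pd.to_datetime; the sample data is copied from the question.

```python
import io
import pandas as pd

# Sample data copied from the question.
raw = """Sector RaisedDate Inspector_ID Priority
SE1 02-Aug-2019 ID1 High
SE2 04-Aug-2019 ID1 Low
SE2 06-Aug-2019 ID2 Medium
SE1 12-Aug-2019 ID1 High
SE2 11-Aug-2019 ID1 Low
SE1 13-Aug-2019 ID2 High
SE1 18-Aug-2019 ID1 Medium
SE2 21-Aug-2019 ID1 Medium
SE2 20-Aug-2019 ID2 High
SE1 23-Aug-2019 ID1 High
SE1 25-Aug-2019 ID1 Low
SE2 29-Aug-2019 ID2 High
SE1 25-Aug-2019 ID1 Low
SE1 25-Aug-2019 ID2 High
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
# pd.Grouper needs real datetimes, not strings.
df["RaisedDate"] = pd.to_datetime(df["RaisedDate"], format="%d-%b-%Y")

# Anchor the 7-day bins at 1 August instead of the first date in the data.
bins = pd.Grouper(key="RaisedDate", freq="7D", origin=pd.Timestamp("2019-08-01"))

weekly = (
    df.groupby(["Sector", bins])
    .size()                   # rows per (Sector, week bin)
    .unstack(fill_value=0)    # week bins become columns, missing combinations become 0
)
print(weekly)
```

With this anchoring the 29-Aug-2019 row should get its own fifth column, and SE1 shows 0 there.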

To add the High/Low priority columns, you can join a new dataframe that uses a different grouping.

# store the weekly groups
date = df.groupby([
    'Sector',
    pd.Grouper(key='RaisedDate', freq='7D')
]).Sector.count().unstack()


# rename columns
date.columns = [f'week{i}' for i in range(1, len(date.columns)+1)]

# store the priority groups
prio = (df.groupby([
    'Sector',
    'Priority'
]).Priority.count().unstack().drop(columns=[
    'Medium',
]))

# join them
print(date.join(prio))

        week1  week2  week3  week4  High  Low
Sector                                       
SE1         1      2      1      4     5    2
SE2         2      1      2      1     2    2
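To get the exact layout requested in the question (five week columns plus No_of_High and No_of_low), the same join idea can be combined with a week number derived from the day of the month. This is only a sketch under the assumption that all rows fall within a single month; the helper name week is made up for illustration, and the column names are taken from the desired output in the question.

```python
import io
import pandas as pd

# Sample data copied from the question.
raw = """Sector RaisedDate Inspector_ID Priority
SE1 02-Aug-2019 ID1 High
SE2 04-Aug-2019 ID1 Low
SE2 06-Aug-2019 ID2 Medium
SE1 12-Aug-2019 ID1 High
SE2 11-Aug-2019 ID1 Low
SE1 13-Aug-2019 ID2 High
SE1 18-Aug-2019 ID1 Medium
SE2 21-Aug-2019 ID1 Medium
SE2 20-Aug-2019 ID2 High
SE1 23-Aug-2019 ID1 High
SE1 25-Aug-2019 ID1 Low
SE2 29-Aug-2019 ID2 High
SE1 25-Aug-2019 ID1 Low
SE1 25-Aug-2019 ID2 High
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["RaisedDate"] = pd.to_datetime(df["RaisedDate"], format="%d-%b-%Y")

# Week of the month as defined in the question:
# days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4, 29-31 -> 5.
week = ((df["RaisedDate"].dt.day - 1) // 7 + 1).rename("week")

# Accidents per sector and week; reindex keeps weeks with no rows as 0.
weekly = (
    pd.crosstab(df["Sector"], week)
    .reindex(columns=range(1, 6), fill_value=0)
    .add_prefix("#_Week")
)

# Accidents per sector and priority; keep only High and Low.
prio = (
    pd.crosstab(df["Sector"], df["Priority"])
    .reindex(columns=["High", "Low"], fill_value=0)
    .rename(columns={"High": "No_of_High", "Low": "No_of_low"})
)

result = weekly.join(prio).reset_index()
result.columns.name = None
print(result)
```

For data that spans several months, the month (or an explicit week start date) would have to be part of the grouping as well, since the day-of-month week number repeats every month.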
