Question

I have a dataframe that looks like this:

ID  Date       Category  Parameter  Color
1a1 2020-03-02    1          1       Red
1a1 2020-03-02    1          2       Green 
1a1 2020-03-02    2          1       Red
1a1 2020-03-03    2          2       Green
1a1 2020-03-03    3          1       Red
1a1 2020-03-03    3          2       Green   
1a2 2020-03-02    1          1       Red
1a2 2020-03-02
1a2 2020-03-02

For a given date, I want to know how many categories and parameters are flagged red per ID, so that it becomes:

ID  Date       Category  Parameter  Color   count_red_category   count_red_parameter
1a1 2020-03-02    1          1       Red          1                     1
1a1 2020-03-02    1          2       Green        1                     1
1a1 2020-03-02    1          2       Red          1                     2
1a1 2020-03-02    2          1       Red          2                     2
1a1 2020-03-03    2          2       Green        0                     0
1a1 2020-03-03    3          1       Red          1                     1
1a1 2020-03-03    3          2       Green        1                     1   
1a2 2020-03-02    1          1       Red          1                     1
1a2 2020-03-02    1          1       Red          1                     1

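Basically:

At each datetime, categories and parameters are flagged red/green.
Each category can have multiple parameters.
For each datetime, I want the number of distinct categories up to that time (how many distinct categories are flagged red for that ID and date).
Same for the parameters.

Any ideas?

Thanks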

Answer 1

I may be misunderstanding, but first, you only care about the red values:

 tmpdf = df[df.Color=="Red"]

Then you want to group by ID and Date and count the distinct categories:

 tmpdf.groupby(['ID', 'Date']).Category.nunique()

Of course, you can combine these two lines:

 newdf=df[df.Color=="Red"].groupby(['ID', 'Date']).Category.nunique()

If you want to keep the date/ID pairs that have no red rows (giving them 0), then:

 finaldf = newdf.reindex(df.groupby(['ID', 'Date']).Category.count().index).fillna(value=0)
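
To get both counts as columns on the original rows, as in the desired output, the same idea can be extended roughly as follows. This is only a sketch: it reads the question as asking for per-(ID, Date) counts of distinct red categories and parameters, the sample frame is rebuilt from the complete rows of the question's table, and the last row (a date with no red entries) is a made-up addition to show the zero fill.

```python
import pandas as pd

# Sample data modelled on the complete rows of the question's table.
# The last row is hypothetical, added only to show a group with no red rows.
df = pd.DataFrame({
    "ID": ["1a1"] * 6 + ["1a2", "1a2"],
    "Date": ["2020-03-02"] * 3 + ["2020-03-03"] * 3 + ["2020-03-02", "2020-03-03"],
    "Category": [1, 1, 2, 2, 3, 3, 1, 1],
    "Parameter": [1, 2, 1, 2, 1, 2, 1, 1],
    "Color": ["Red", "Green", "Red", "Green", "Red", "Green", "Red", "Green"],
})

# Count distinct red categories and parameters per (ID, Date).
red_counts = (
    df[df.Color == "Red"]
    .groupby(["ID", "Date"])
    .agg(count_red_category=("Category", "nunique"),
         count_red_parameter=("Parameter", "nunique"))
    .reset_index()
)

# Left-join the counts back onto every original row;
# (ID, Date) groups without any red row end up as NaN and are set to 0.
result = df.merge(red_counts, on=["ID", "Date"], how="left")
count_cols = ["count_red_category", "count_red_parameter"]
result[count_cols] = result[count_cols].fillna(0).astype(int)

print(result)
```

The left merge keeps every row of `df`, so green rows also carry the counts for their ID and date. If "up to that time" is meant cumulatively across dates, this per-date grouping would need a different approach.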
