Question

I have the following pandas DataFrame df:

           attribute
2017-01-01         a
2017-01-01         a
2017-01-05         b
2017-02-01         a
2017-02-10         a

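The first column is a non-unique datetime index, and I want to count the number of a's and b's per week. If I try df.attribute.resample('W').count(), I get an error because of the duplicate entries.

How can I do this?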

Answer 1

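df = df.reset_index()  # move the datetime index into a column named 'index'
# .dt.week was removed in pandas 2.0; use .dt.isocalendar().week on newer versions
df.groupby([df['index'].dt.week, 'attribute']).count()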
Out[292]: 
                 index
index attribute       
1     b              1
5     a              1
6     a              1
52    a              2

Or:

# Applied to the original df (before reset_index); DatetimeIndex.week was removed in pandas 2.0
df.groupby([df.index.get_level_values(0).week, 'attribute'])['attribute'].count()

Out[303]: 
    attribute
1   b            1
5   a            1
6   a            1
52  a            2
Name: attribute, dtype: int64
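
On recent pandas versions, where the .week accessor is no longer available, the same idea can be sketched with isocalendar(). This is only a minimal sketch, not part of the original answer: the column names date, iso_year and iso_week are made up for illustration, and grouping by ISO year as well as ISO week is an added assumption so that weeks from different years are not merged.

```python
import pandas as pd

# Rebuild the sample frame from the question (non-unique datetime index)
idx = pd.to_datetime(
    ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
)
df = pd.DataFrame({"attribute": list("aabaa")}, index=idx)

# Move the index into a column; 'date' is a hypothetical column name
flat = df.rename_axis("date").reset_index()

# isocalendar() returns a frame with 'year', 'week' and 'day' columns
iso = flat["date"].dt.isocalendar()
flat["iso_year"] = iso["year"].astype("int64")
flat["iso_week"] = iso["week"].astype("int64")

# Count rows per (ISO year, ISO week, attribute)
counts = flat.groupby(["iso_year", "iso_week", "attribute"]).size()
print(counts)
```

Note that 2017-01-01 is a Sunday and belongs to ISO week 52 of 2016, which is why it appears under week 52 in the output above.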

Answer 2

You might be interested in a two-step process: a groupby followed by a resample.

df.groupby(level=0).count().resample('W').sum()
            attribute
2017-01-01        2.0
2017-01-08        1.0
2017-01-15        NaN
2017-01-22        NaN
2017-01-29        NaN
2017-02-05        1.0
2017-02-12        1.0
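
The NaN rows above appear to come from an older pandas release. The following is a minimal sketch of the same two steps, assuming a recent pandas version in which the sum of an empty bin is 0 by default. It rebuilds the sample frame from the question, and the names daily, weekly and weekly_nan are illustrative only.

```python
import pandas as pd

# Rebuild the sample frame from the question (non-unique datetime index)
idx = pd.to_datetime(
    ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
)
df = pd.DataFrame({"attribute": list("aabaa")}, index=idx)

# Step 1: collapse duplicate timestamps into one row per date
daily = df.groupby(level=0).count()

# Step 2: resample the now-unique index to weekly bins (weeks ending on Sunday)
weekly = daily.resample("W").sum()
print(weekly)

# Passing min_count=1 leaves empty weeks as NaN instead of 0
weekly_nan = daily.resample("W").sum(min_count=1)
print(weekly_nan)
```

This counts all rows per week regardless of the value in attribute, so it does not separate a from b.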

Answer 3

You can use pd.Grouper to group the index at a weekly frequency:

In [83]: df.groupby(pd.Grouper(freq='W')).count()
Out[83]: 
            attribute
2017-01-01          2
2017-01-08          1
2017-01-15          0
2017-01-22          0
2017-01-29          0
2017-02-05          1
2017-02-12          1

To group by both the weekly frequency and the attribute column, you can use:

In [87]: df.groupby([pd.Grouper(freq='W'), 'attribute']).size()
Out[87]: 
            attribute
2017-01-01  a            2
2017-01-08  b            1
2017-02-05  a            1
2017-02-12  a            1
dtype: int64

pd.Grouper also has a key parameter, which lets you group by the datetimes in a column rather than in the index.
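
As a rough sketch of the key parameter, assume the dates live in an ordinary column instead of the index. The column name when is hypothetical and was chosen only for this example.

```python
import pandas as pd

# Same data as in the question, but with the dates in a regular column.
# 'when' is a made-up column name for illustration.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(
            ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
        ),
        "attribute": list("aabaa"),
    }
)

# Group by weekly bins of the 'when' column and by attribute
counts = df.groupby([pd.Grouper(key="when", freq="W"), "attribute"]).size()
print(counts)

# Optional: one column per attribute value, with 0 where a value did not occur
wide = counts.unstack(fill_value=0)
print(wide)
```

The unstack step is just one way to get a table with a and b side by side; only weeks that contain at least one row show up in it.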

Resampling when the DateTime index is not unique and the corresponding values are identical
