Question

I have some consumer purchase data that looks like this:

CustomerID  InvoiceDate
13654.0     2011-07-17 13:29:00
14841.0     2010-12-16 10:28:00
19543.0     2011-10-18 16:58:00
12877.0     2011-06-15 13:34:00
15073.0     2011-06-06 12:33:00

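I am interested in how frequently customers make purchases. I would like to group by customer and then determine how many purchases each customer made in each quarter (assume quarters are consecutive 3-month periods starting in January).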

I could define the start and end of each quarter myself and create another column. I am wondering whether I can achieve the same thing with groupby.
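For reference, the "extra column" idea mentioned above could be sketched roughly as follows. This is only an illustration built from the five sample rows; the `Quarter` column name and the `counts` variable are made up for the example, and `dt.to_period("Q")` is one possible way to label calendar quarters starting in January.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": pd.to_datetime([
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ]),
})

# Hypothetical helper column: the calendar quarter each invoice falls in.
data["Quarter"] = data["InvoiceDate"].dt.to_period("Q")

# Count purchases per customer and quarter.
counts = (
    data.groupby(["CustomerID", "Quarter"])
        .size()
        .reset_index(name="Count")
)
print(counts)
```

With the sample data, every customer ends up with exactly one row, since each made a single purchase.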

Currently, this is how I do it:

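import pandas as pd

# Assumes data['InvoiceDate'] is already a datetime64 column.
r = data.groupby('CustomerID')

frames = []
for name, frame in r:
    # Count the rows that fall in each quarter (quarter-start frequency).
    f = frame.set_index('InvoiceDate').resample("QS").count()
    # Label the result with the customer id (this overwrites the count
    # that count() stored in the CustomerID column).
    f['CustomerID'] = name
    frames.append(f)

g = pd.concat(frames)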

Answer 1

UPDATE:

In [43]: df.groupby(['CustomerID', pd.Grouper(key='InvoiceDate', freq='QS')]) \
           .size() \
           .reset_index(name='Count')
Out[43]:
   CustomerID InvoiceDate  Count
0     12877.0  2011-04-01      1
1     13654.0  2011-07-01      1
2     14841.0  2010-10-01      1
3     15073.0  2011-04-01      1
4     19543.0  2011-10-01      1

Is this what you want?

In [39]: df.groupby(pd.Grouper(key='InvoiceDate', freq='QS')).count()
Out[39]:
             CustomerID
InvoiceDate
2010-10-01            1
2011-01-01            0
2011-04-01            2
2011-07-01            1
2011-10-01            1
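If a customer-by-quarter table is easier to read, the grouped counts can be pivoted. The following is a minimal sketch under the assumption that `df` holds the five sample rows from the question; `per_customer` is just an illustrative name. Customers with no purchases in a listed quarter show up as 0, while quarters in which nobody purchased may not appear as columns at all.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": pd.to_datetime([
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ]),
})

# One row per customer, one column per quarter start, 0 where a customer
# has no purchases in that quarter.
per_customer = (
    df.groupby(["CustomerID", pd.Grouper(key="InvoiceDate", freq="QS")])
      .size()
      .unstack(fill_value=0)
)
print(per_customer)
```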

Answer 2

I think this is the best I can do:

data.groupby('CustomerID').apply(lambda x: x.set_index('InvoiceDate').resample('QS').count())

Answer 3

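Use pd.TimeGrouper (note: it was deprecated in pandas 0.21 and removed in later releases; pd.Grouper(freq='QS') is the replacement):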

df = df.set_index('InvoiceDate')
df.index = pd.to_datetime(df.index)
df.groupby(['CustomerID',pd.TimeGrouper(freq='QS')]).size().reset_index().rename(columns={0:'Num_Invoices'})

   CustomerID InvoiceDate  Num_Invoices
0     12877.0  2011-04-01             1
1     13654.0  2011-07-01             1
2     14841.0  2010-10-01             1
3     15073.0  2011-04-01             1
4     19543.0  2011-10-01             1
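Since pd.TimeGrouper is not available in current pandas releases, the same idea can be written with pd.Grouper. This is a sketch rather than the answerer's original code: it assumes `df` holds the five sample rows from the question, and it relies on pd.Grouper grouping on the DatetimeIndex when no `key` is given.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "CustomerID": [13654.0, 14841.0, 19543.0, 12877.0, 15073.0],
    "InvoiceDate": pd.to_datetime([
        "2011-07-17 13:29:00",
        "2010-12-16 10:28:00",
        "2011-10-18 16:58:00",
        "2011-06-15 13:34:00",
        "2011-06-06 12:33:00",
    ]),
})

# Move the dates into the index, then group by customer and by
# quarter-start bins of that index.
df = df.set_index("InvoiceDate")
result = (
    df.groupby(["CustomerID", pd.Grouper(freq="QS")])
      .size()
      .reset_index(name="Num_Invoices")
)
print(result)
```

Passing `name="Num_Invoices"` to `reset_index` avoids the separate `rename(columns={0: ...})` step.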

Can I group by a column and resample by date?
