Question

Let's look at some one-minute data:

In [513]: rng = pd.date_range('1/1/2000', periods=12, freq='T')
In [514]: ts = pd.Series(np.arange(12), index=rng)
In [515]: ts
Out[515]:
2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T

Suppose you want to aggregate this data into five-minute chunks or bars by taking the sum of each group:

In [516]: ts.resample('5min', how='sum')
Out[516]:
2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5T

However, I don't want to use the resample method, and I still need the same input and output. How can I do this with groupby or any other such method?

Answer 1

You can use a custom pd.Grouper like this:

In [78]: ts.groupby(pd.Grouper(freq='5min', closed='right')).sum()
Out[78]:
1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5T, dtype: int64

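closed='right' makes each bin include its right edge, so the group sums match the resample output in the question. The labels shown above are the left edges of the bins; pass label='right' as well if the labels should also match.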
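As a self-contained check, here is a minimal sketch of the same idea. It assumes a recent pandas, where the 'min' alias is used instead of the older 'T', and it adds label='right' (my addition, not part of the snippet above) so that the bin labels line up with the output in the question:

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations, as in the question
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Right-closed five-minute bins, each labeled by its right edge
grouper = pd.Grouper(freq="5min", closed="right", label="right")
result = ts.groupby(grouper).sum()

# Group sums are 0, 15, 40, 11, labeled 00:00, 00:05, 00:10, 00:15
print(result)
```

Dropping label='right' leaves the sums unchanged and only shifts each label back to the left edge of its bin.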

However, if your goal is more customized grouping, you can use .groupby with your own vector of bucket labels:

In [78]: buckets = (ts.index - ts.index[0]) / pd.Timedelta('5min')
In [79]: grp = ts.groupby(np.ceil(buckets.values))

In [80]: grp.sum()
Out[80]:
0     0
1    15
2    40
3    11
dtype: int64

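The output is not exactly the same (the result is indexed by bucket number rather than by timestamp), but the approach is more flexible: for example, you can create buckets of uneven width.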
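To illustrate the uneven-bucket case, here is a minimal sketch. The bucket edges (2 and 7 minutes) and the use of np.digitize are my own choices for illustration and are not part of the original question:

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations, as in the question
rng = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=rng)

# Minutes elapsed since the first observation: 0.0, 1.0, ..., 11.0
elapsed = (ts.index - ts.index[0]) / pd.Timedelta("1min")

# Hypothetical uneven bucket edges, in minutes (illustration only):
# bucket 0 covers [0, 2), bucket 1 covers [2, 7), bucket 2 covers [7, 12)
edges = np.array([2, 7])
labels = np.digitize(elapsed.values, edges)

# Group by the integer bucket labels and sum each bucket
uneven = ts.groupby(labels).sum()
print(uneven)
```

Any array of labels with the same length as the series works as the grouping key, so the buckets can follow whatever rule the data calls for.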
