Iterating over chunks of a DataFrame by time period

Time: 2015-08-18 00:42:01

Tags: python pandas

I have a pandas DataFrame indexed by time:

>>> df
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
2000-01-06 -0.646513 -0.170194  0.365317  0.121467
2000-01-07  0.461580  0.259200  0.734326  1.885612
2000-01-10 -1.277500  0.840206 -0.570010  0.155367
...

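I would like to partition this DataFrame efficiently by datetime period, taking advantage of the sorted index. I want an iterator of smaller DataFrames: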

seq = partition_all(df, freq='1M')

>>> next(seq)
               A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
...
>>> next(seq)
               A         B         C         D
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
...

1 Answer:

Answer 0: (score: 2)

You can group by month with pd.Grouper(freq="M") (older pandas versions spelled this pd.TimeGrouper("M"), which has since been removed):

In [11]: df
Out[11]:
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062

In [12]: g = df.groupby(pd.Grouper(freq="M"))

Now you can iterate over the GroupBy, one group per month:

In [13]: for (month_end, sub_df) in g:
   ....:     print(sub_df)
   ....:
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
                   A         B         C         D
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
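
To get the iterator the question asks for, the groupby approach above can be wrapped in a generator. The following is only a sketch: partition_all is the hypothetical name taken from the question, not a pandas function, and it assumes the frame has a DatetimeIndex. It groups on df.index.to_period(...) rather than on a time grouper, so freq here is a period alias ("M" for month), and periods that contain no rows are never yielded.

```python
import numpy as np
import pandas as pd


def partition_all(df, freq="M"):
    """Yield sub-frames of df, one per calendar period, in period order.

    partition_all is the hypothetical helper named in the question.
    freq is a pandas period alias ("M" = calendar month).
    """
    # Map every timestamp to the period that contains it, then group on that.
    periods = df.index.to_period(freq)
    for _, chunk in df.groupby(periods):
        yield chunk


# Small frame spanning two months (the values are random placeholders).
idx = pd.to_datetime(
    ["2000-01-03", "2000-01-04", "2000-01-05",
     "2000-02-01", "2000-02-02", "2000-02-03"]
)
df = pd.DataFrame(np.random.randn(6, 4), index=idx, columns=list("ABCD"))

seq = partition_all(df)
first = next(seq)   # the three January rows
second = next(seq)  # the three February rows
```

Because groupby is lazy about materialising groups, each call to next(seq) builds only one sub-frame at a time.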