I have a pandas DataFrame indexed by time:
>>> df
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
2000-01-06 -0.646513 -0.170194  0.365317  0.121467
2000-01-07  0.461580  0.259200  0.734326  1.885612
2000-01-10 -1.277500  0.840206 -0.570010  0.155367
...
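I would like to partition this DataFrame efficiently by datetime period, taking advantage of the sorted index. What I want is an iterator of smaller DataFrames:
>>> seq = partition_all(df, freq='1M')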
>>> next(seq)
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
...
>>> next(seq)
                   A         B         C         D
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
...
Answer 0 (score: 2)
You can use TimeGrouper to group by month:
In [11]: df
Out[11]:
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
In [12]: g = df.groupby(pd.TimeGrouper("M"))
Now you can iterate over the GroupBy, one month at a time:
In [13]: for (month_start, sub_df) in g:
   ....:     print(sub_df)
   ....:
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
                   A         B         C         D
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
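Note that pd.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; pd.Grouper(freq=...) is its replacement. With that, the grouping above can be wrapped into the partition_all iterator the question asks for. This is only a sketch: the function body, the default freq="MS" (month-start bins, picked here because that alias is accepted across pandas versions), and the choice to skip months with no rows are my assumptions, not part of the question or of pandas itself.

```python
import pandas as pd


def partition_all(df, freq="MS"):
    """Yield one sub-DataFrame per time bin of width `freq`.

    Hypothetical helper named after the one in the question; it assumes
    `df` has a DatetimeIndex.
    """
    for _, sub_df in df.groupby(pd.Grouper(freq=freq)):
        # Bins that contain no rows can show up as empty frames; skip them.
        if not sub_df.empty:
            yield sub_df


# Sample rows copied from the question.
index = pd.to_datetime(
    ["2000-01-03", "2000-01-04", "2000-01-05",
     "2000-02-01", "2000-02-02", "2000-02-03"]
)
df = pd.DataFrame(
    {
        "A": [1.991135, 0.690848, 0.602610, -0.108412, 0.855690, -0.266090],
        "B": [0.045306, 1.862244, -0.205035, 0.188484, -0.283225, 0.684080],
        "C": [-0.657898, 0.709432, 1.248848, -0.568542, 1.471867, 0.187856],
        "D": [0.311375, -2.080355, 0.192274, 0.335969, 0.309235, 1.734062],
    },
    index=index,
)

seq = partition_all(df)
january = next(seq)   # the three January rows
february = next(seq)  # the three February rows
print(january)
print(february)
```

Because it is a generator, each sub-frame is only produced when you call next(seq), so you never hold more than one partition at a time unless you choose to.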