Pandas rolling window over multiple columns with a datetime index

Asked: 2020-06-29 09:03:16

Tags: python pandas datetime data-science rolling-computation

I have a DataFrame with a DateTime index and multiple columns. It looks like this:

                     data_1   data_2
time                                      
2020-01-01 00:23:40  330.98      NaN
2020-01-01 00:23:50  734.52      NaN
2020-01-03 00:00:00  388.06     23.9
2020-01-03 00:00:10  341.60     25.1
2020-01-03 00:00:20  395.14     24.9
...
2020-01-03 00:01:10  341.60     25.1
2020-01-03 00:01:20  395.14     24.9

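I want to apply a function over a rolling window (the window has to be time-based, because some rows may be missing, and this is not my case) and collect some features. The features depend on multiple columns. I wrote my own class: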

class FeatureCollector:
    def __init__(self):
        self.feature_dicts = []

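    def collect(self, window):
        # Store the features of this window as a side effect.
        self.feature_dicts.append(extract_features(window))
        # rolling.apply needs a numeric return value; it is not used here.
        return 1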

def extract_features(window):
    ans = {}
    # do something with the window and compute ans
    return ans

I then run it as follows:

from datetime import timedelta

collector = FeatureCollector()
# 100-second time-based window with at least 10 observations
my_df.rolling(timedelta(seconds=100), min_periods=10).apply(collector.collect)
features = collector.feature_dicts

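The problem is that, as far as I understand, extract_features only ever receives a Series object. My columns data_1 and data_2 are passed to it one after the other, as if the DataFrame looked like this: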

                       data
time                                      
2020-01-01 00:23:40  330.98
2020-01-01 00:23:50  734.52
2020-01-03 00:00:00  388.06
2020-01-03 00:00:10  341.60
2020-01-03 00:00:20  395.14
...
2020-01-03 00:01:10  341.60
2020-01-03 00:01:20  395.14                                 
2020-01-01 00:23:40     NaN
2020-01-01 00:23:50     NaN
2020-01-03 00:00:00    23.9
2020-01-03 00:00:10    25.1
2020-01-03 00:00:20    24.9
...
2020-01-03 00:01:10    25.1
2020-01-03 00:01:20    24.9
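
The column-by-column behaviour described above can be checked with a small probe. This is only a minimal sketch: the frame `demo`, its values, and the function `probe` are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this check.
idx = pd.date_range("2020-01-03", periods=4, freq=pd.Timedelta(seconds=10))
demo = pd.DataFrame(
    {"data_1": [1.0, 2.0, 3.0, 4.0], "data_2": [10.0, 20.0, 30.0, 40.0]},
    index=idx,
)

seen = []

def probe(window):
    # Record the type and length of whatever rolling.apply hands over.
    seen.append((type(window).__name__, len(window)))
    return 0.0  # rolling.apply expects a number back

demo.rolling(pd.Timedelta(seconds=20)).apply(probe)

# Every recorded call received a single-column Series, never a DataFrame.
print(set(kind for kind, _ in seen))
```

The probe is called once per window for data_1 and then again once per window for data_2, so a function that needs both columns at the same time never sees them together.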

How can I organize this so that each window passed to extract_features is a DataFrame with both columns?
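
One possible workaround, sketched under stated assumptions rather than offered as a definitive answer: roll over a helper Series of row positions that shares the DataFrame's datetime index, and use the positions inside each window to slice the full frame. The names `collect_window_features` and `row_positions`, and the placeholder body of `extract_features`, are hypothetical. The sketch assumes the index is sorted, as time-based rolling requires.

```python
import numpy as np
import pandas as pd


def extract_features(window_df):
    # Placeholder feature function (hypothetical): it receives a DataFrame
    # holding every column of the current window.
    return {
        "n_rows": len(window_df),
        "mean_data_1": window_df["data_1"].mean(),
        "mean_data_2": window_df["data_2"].mean(),
    }


def collect_window_features(df, window, min_periods):
    feature_dicts = []

    def collect(positions):
        # positions: float array with the row numbers inside the current window.
        start, stop = int(positions[0]), int(positions[-1]) + 1
        feature_dicts.append(extract_features(df.iloc[start:stop]))
        return 0.0  # rolling.apply expects a number back; the value is unused

    # Helper Series of row positions sharing the DataFrame's datetime index.
    row_positions = pd.Series(np.arange(len(df), dtype="float64"), index=df.index)
    row_positions.rolling(window, min_periods=min_periods).apply(collect, raw=True)
    return feature_dicts


# Usage with a few rows taken from the example data above.
idx = pd.date_range("2020-01-03", periods=5, freq=pd.Timedelta(seconds=10))
my_df = pd.DataFrame(
    {"data_1": [388.06, 341.60, 395.14, 341.60, 395.14],
     "data_2": [23.9, 25.1, 24.9, 25.1, 24.9]},
    index=idx,
)
features = collect_window_features(my_df, pd.Timedelta(seconds=100), min_periods=2)
```

Because the helper Series contains no NaN values, min_periods in this sketch counts rows in the window, not non-NaN observations per column, which differs from applying rolling to the original columns directly.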

0 answers:

No answers