I have a DataFrame with a DateTime index and multiple columns, as shown:
                     data_1  data_2
time
2020-01-01 00:23:40  330.98     NaN
2020-01-01 00:23:50  734.52     NaN
2020-01-03 00:00:00  388.06    23.9
2020-01-03 00:00:10  341.60    25.1
2020-01-03 00:00:20  395.14    24.9
...
2020-01-03 00:01:10  341.60    25.1
2020-01-03 00:01:20  395.14    24.9
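I want to apply a function over a rolling window and collect some features. The window must be time-based rather than a fixed number of rows, because some data may be missing, so this is not my case. The features depend on multiple columns. I wrote my own class: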
class FeatureCollector:
    def __init__(self):
        self.feature_dicts = []

    def collect(self, window):
        self.feature_dicts.append(extract_features(window))
        # rolling.apply expects a numeric result, so return a dummy value
        return 1


def extract_features(window):
    ans = {}
    # do_smth_on_window and calculate ans
    return ans
Then I run it as follows:
from datetime import timedelta

collector = FeatureCollector()
my_df.rolling(timedelta(seconds=100), min_periods=10).apply(collector.collect)
features = collector.feature_dicts
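The problem is that, as far as I understand, extract_features only ever receives a Series object. My columns data_1 and data_2 are passed to it one after the other, as if the DataFrame looked like this: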
                       data
time
2020-01-01 00:23:40  330.98
2020-01-01 00:23:50  734.52
2020-01-03 00:00:00  388.06
2020-01-03 00:00:10  341.60
2020-01-03 00:00:20  395.14
...
2020-01-03 00:01:10  341.60
2020-01-03 00:01:20  395.14
2020-01-01 00:23:40     NaN
2020-01-01 00:23:50     NaN
2020-01-03 00:00:00    23.9
2020-01-03 00:00:10    25.1
2020-01-03 00:00:20    24.9
...
2020-01-03 00:01:10    25.1
2020-01-03 00:01:20    24.9
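To make the per-column behavior concrete, here is a minimal sketch. The toy data and the `probe` helper are made up for illustration and are not part of my real code. It records what type of object `rolling(...).apply` hands to the callback:

```python
import pandas as pd

# Toy data (hypothetical): two columns on a 10-second DateTime index
idx = pd.date_range("2020-01-03", periods=5, freq="10s")
df = pd.DataFrame(
    {"data_1": [1.0, 2.0, 3.0, 4.0, 5.0],
     "data_2": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=idx,
)

seen = []  # (type name, number of dimensions) for every callback invocation


def probe(window):
    # Record what kind of object arrives; apply needs a numeric return value
    seen.append((type(window).__name__, window.ndim))
    return 1.0


df.rolling("30s").apply(probe)
print(set(seen))
```

Every invocation receives a one-dimensional Series taken from a single column, never a two-column DataFrame.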
How can I organize this so that each window passed to extract_features is a DataFrame with both columns?
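One possible direction, offered only as a sketch: assuming pandas 1.1 or later, a Rolling object can be iterated directly, and each iteration yields the window as a DataFrame with all columns. The toy data, the `min_rows` threshold, and the body of `extract_features` below are hypothetical stand-ins. As far as I can tell, iteration does not enforce `min_periods`, so the sketch filters on the row count itself.

```python
import pandas as pd

# Toy data (hypothetical), shaped like the frame in the question
idx = pd.date_range("2020-01-03", periods=5, freq="10s")
my_df = pd.DataFrame(
    {"data_1": [388.06, 341.60, 395.14, 341.60, 395.14],
     "data_2": [23.9, 25.1, 24.9, 25.1, 24.9]},
    index=idx,
)


def extract_features(window):
    # Stand-in feature extraction that needs both columns at once
    return {
        "end": window.index[-1],
        "n_rows": len(window),
        "mean_ratio": (window["data_1"] / window["data_2"]).mean(),
    }


min_rows = 2  # manual replacement for min_periods (assumption)

# Each `window` is a DataFrame slice covering the trailing 30 seconds
feature_dicts = [
    extract_features(window)
    for window in my_df.rolling("30s")
    if len(window) >= min_rows
]
features = pd.DataFrame(feature_dicts).set_index("end")
print(features)
```

This avoids `apply` and its numeric-return requirement entirely, at the cost of running the loop in Python rather than in pandas' compiled rolling code.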