Featuretools already supports handling multiple cutoff times: https://docs.featuretools.com/automated_feature_engineering/handling_time.html
In [20]: temporal_cutoffs = ft.make_temporal_cutoffs(cutoffs['customer_id'],
   ....:                                             cutoffs['cutoff_time'],
   ....:                                             window_size='3d',
   ....:                                             num_windows=2)
   ....:
In [21]: temporal_cutoffs
Out[21]:
        time  instance_id
0 2011-12-12        13458
1 2011-12-15        13458
2 2012-10-02        13602
3 2012-10-05        13602
4 2012-01-22        15222
5 2012-01-25        15222
In [22]: entityset = ft.demo.load_retail()
In [23]: feature_tensor, feature_defs = ft.dfs(entityset=entityset,
   ....:                                       target_entity='customers',
   ....:                                       cutoff_time=temporal_cutoffs,
   ....:                                       cutoff_time_in_index=True,
   ....:                                       max_features=4)
   ....:
In [24]: feature_tensor
Out[24]:
                        MAX(order_products.total)  MIN(order_products.unit_price)  STD(order_products.quantity)  COUNT(order_products)
customer_id time
13458.0     2011-12-12                    201.960                          0.3135                     10.053804                    394
            2011-12-15                    201.960                          0.3135                     10.053804                    394
15222.0     2012-01-22                    272.250                          1.1880                     26.832816                      5
            2012-01-25                    272.250                          1.1880                     26.832816                      5
13602.0     2012-10-02                     49.896                          1.0395                      8.732068                     23
            2012-10-05                     49.896                          1.0395                      8.732068                     23
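However, as you can see, this produces a pandas MultiIndex in which each ID appears at several points in time. How can I instead (perhaps with a pivot operation) get the generated MIN / MAX / ... columns as columns prefixed with last / x_days_MIN / MAX / ..., so that each cutoff window contributes its own set of features?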
initial feature 1,initial feature 2, time_frame_1_<AGGTYPE2>_Feature,time_frame_1_<AGGTYPE1>_Feature,time_frame_2_<AGGTYPE1>_Feature,time_frame_2_<AGGTYPE2>_Feature,time_frame_2_<AGGTYPE1>_Feature,time_frame_2_<AGGTYPE1>_Feature
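A pandas-only sketch of the pivot described above, offered as one possible reading of the desired layout rather than a Featuretools feature. It rebuilds two columns of the feature tensor from the values shown in the output, numbers the cutoff times within each customer (the `window` column and the `time_frame_<n>_` prefix are illustrative names, not part of any API), and unstacks that number into the columns.

```python
import pandas as pd

# Toy stand-in for the feature tensor above: two of its columns, with the
# values copied from the printed output.
index = pd.MultiIndex.from_tuples(
    [
        (13458.0, pd.Timestamp("2011-12-12")),
        (13458.0, pd.Timestamp("2011-12-15")),
        (15222.0, pd.Timestamp("2012-01-22")),
        (15222.0, pd.Timestamp("2012-01-25")),
        (13602.0, pd.Timestamp("2012-10-02")),
        (13602.0, pd.Timestamp("2012-10-05")),
    ],
    names=["customer_id", "time"],
)
feature_tensor = pd.DataFrame(
    {
        "MAX(order_products.total)": [201.960, 201.960, 272.250, 272.250, 49.896, 49.896],
        "COUNT(order_products)": [394, 394, 5, 5, 23, 23],
    },
    index=index,
)

# The timestamps differ per customer, so they cannot be shared column labels.
# Number the cutoff times within each customer instead (1 = earliest).
flat = feature_tensor.reset_index().sort_values(["customer_id", "time"])
flat["window"] = flat.groupby("customer_id").cumcount() + 1

# Pivot: one row per customer, one column per (feature, window) pair.
wide = (
    flat.drop(columns="time")
    .set_index(["customer_id", "window"])
    .unstack("window")
)

# Flatten the two-level column index into single names.
wide.columns = [f"time_frame_{window}_{feature}" for feature, window in wide.columns]
print(wide)
```

Note that this only reshapes values already computed at each cutoff time; it does not change how much history each value is computed from.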
Answer 0 (score: 3)
You can do this by calling ft.calculate_feature_matrix twice with different training_window values and merging the resulting feature matrices together. For example:
import featuretools as ft
import pandas as pd

entityset = ft.demo.load_retail()

# One cutoff time per customer; the first column holds the instance IDs.
cutoffs = pd.DataFrame({
    'customer_name': ["Micheal Nicholson", "Krista Maddox"],
    'cutoff_time': [pd.Timestamp('2011-10-14'), pd.Timestamp('2011-08-18')]
})

# Build the feature definitions only, without computing any values yet.
feature_defs = ft.dfs(entityset=entityset,
                      target_entity='customers',
                      agg_primitives=["sum"],
                      trans_primitives=[],
                      max_features=1,
                      features_only=True)

# Compute the same feature twice, each time with a different training window.
fm_60_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="60 days")
fm_30_days = ft.calculate_feature_matrix(entityset=entityset,
                                         features=feature_defs,
                                         cutoff_time=cutoffs,
                                         training_window="30 days")

# Join on the customer index; the suffixes keep the two windows apart.
fm_60_days.merge(fm_30_days, left_index=True, right_index=True, suffixes=("__60_days", "__30_days"))
The code above returns this DataFrame, in which the same feature is computed from the last 60 days and from the last 30 days of data.
                   SUM(order_products.quantity)__60_days  SUM(order_products.quantity)__30_days
customer_name
Krista Maddox                                        466                                    306
Micheal Nicholson                                    710                                    539
Note: This example runs on the latest version of Featuretools (v0.3.1), in which we updated the demo retail dataset to use interpretable names as customer IDs.
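With more than two windows, chaining pairwise merges gets awkward. A minimal sketch of one way to generalize the step, assuming each feature matrix is indexed by the same customers: suffix every matrix with its window label and concatenate column-wise. `combine_windows` is a hypothetical helper, not part of Featuretools, and the two small frames stand in for the results of ft.calculate_feature_matrix using the values printed above.

```python
import pandas as pd


def combine_windows(matrices):
    """Join per-window feature matrices side by side.

    `matrices` maps a window label (e.g. "60_days") to a feature matrix
    indexed by the same instances. Hypothetical helper for illustration.
    """
    suffixed = [fm.add_suffix(f"__{label}") for label, fm in matrices.items()]
    # axis=1 aligns rows on the index, as the merge on the index does.
    return pd.concat(suffixed, axis=1)


# Stand-ins for the two feature matrices computed above.
idx = pd.Index(["Krista Maddox", "Micheal Nicholson"], name="customer_name")
fm_60_days = pd.DataFrame({"SUM(order_products.quantity)": [466, 710]}, index=idx)
fm_30_days = pd.DataFrame({"SUM(order_products.quantity)": [306, 539]}, index=idx)

combined = combine_windows({"60_days": fm_60_days, "30_days": fm_30_days})
print(combined)
```

In real use the dictionary would be filled by calling ft.calculate_feature_matrix once per training_window value, for example in a loop over the window strings.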