Computing features over multiple training windows in Featuretools

Time: 2018-08-15 19:35:46

Tags: feature-extraction featuretools

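I have a table of customers and a table of transactions. Is there a way to get features that are filtered to the last 3/6/9/12 months? I would like to generate features like these automatically: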

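  • Number of transactions in the last 3 months
  • ...
  • Number of transactions in the last 12 months
  • Average of transactions in the last 3 months
  • ...
  • Average of transactions in the last 12 months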

I tried using training_window=["1 month", "3 months"], but it does not seem to return a separate set of features for each window.

Example:

import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)

window_features = ft.dfs(entityset=es,
   target_entity="customers",
   training_window=["1 hour", "1 day"],
   features_only = True)

window_features

Do I have to build each window separately and then merge the results?

1 Answer:

Answer 0: (score: 2)

As you mention, in Featuretools 0.2.1 you have to build the feature matrix for each training window separately and then merge the results. With your example, you would do that as follows; the merged result ends up in the dataframe new_df:

import pandas as pd
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)

# One cutoff time per customer; each training window is measured back from it.
cutoff_times = pd.DataFrame({"customer_id": [1, 2, 3, 4, 5],
                             "time": pd.date_range('2014-01-01 01:41:50', periods=5, freq='25min')})

# Define the features once so that both windows use the same definitions.
features = ft.dfs(entityset=es,
                  target_entity="customers",
                  agg_primitives=['count'],
                  trans_primitives=[],
                  features_only=True)

# Feature matrix computed with a 1-hour training window.
fm_1 = ft.calculate_feature_matrix(features,
                                   entityset=es,
                                   cutoff_time=cutoff_times,
                                   training_window='1h',
                                   verbose=True)

# Same features computed with a 1-day training window.
fm_2 = ft.calculate_feature_matrix(features,
                                   entityset=es,
                                   cutoff_time=cutoff_times,
                                   training_window='1d',
                                   verbose=True)

# Merge on customer_id; overlapping feature columns get a _1h or _1d suffix.
new_df = fm_1.reset_index()
new_df = new_df.merge(fm_2.reset_index(), on="customer_id", suffixes=("_1h", "_1d"))
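
The two-window merge above can be generalized to any number of windows (for instance the 3/6/9/12 months from the question). The following is only a sketch under stated assumptions, not part of Featuretools: `combine_window_matrices` and `fake_matrix` are hypothetical names introduced here, and the toy numbers are made up so the block runs with pandas alone.

```python
import pandas as pd


def combine_window_matrices(compute_matrix, windows, index_name="customer_id"):
    """Build one feature matrix per training window and join them column-wise.

    compute_matrix: hypothetical callable that takes a window string and
        returns a DataFrame indexed by `index_name`.
    windows: iterable of window strings such as "1h" or "1d".
    """
    pieces = []
    for window in windows:
        fm = compute_matrix(window)
        # Tag every feature column with the window it was computed over.
        pieces.append(fm.add_suffix(f"_{window}"))
    # Rows are aligned on the shared index; the outer join keeps every row.
    combined = pd.concat(pieces, axis=1, join="outer")
    combined.index.name = index_name
    return combined.reset_index()


def fake_matrix(window):
    # Toy stand-in for a real feature matrix; the counts are invented.
    counts = {"1h": [1, 0], "1d": [3, 2]}[window]
    return pd.DataFrame({"COUNT(sessions)": counts},
                        index=pd.Index([1, 2], name="customer_id"))


result = combine_window_matrices(fake_matrix, ["1h", "1d"])
print(result)
```

With the objects from the answer, `compute_matrix` could be something like `lambda w: ft.calculate_feature_matrix(features, entityset=es, cutoff_time=cutoff_times, training_window=w)`. Suffixing every column by its window, instead of relying on the `suffixes` argument of `merge`, keeps the column names unambiguous once there are more than two windows.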