How to use MultiplyNumeric with dates converted to weights and values from the same table?

Time: 2019-06-28 12:49:24

Tags: python featuretools

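My main goal is to build features that give more weight to more recent information.

The idea is to compute a weighting factor with a new transform primitive, "WeightTimeUntil", which the transform primitive "MultiplyNumeric" can then use to obtain weighted values.

I used Will Koehrsen's walkthrough as the starting point for the data and entity setup.

In doing so, I ran into the following problems: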

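  1. featuretools does not select the combination I want (see below).
  2. featuretools does not seem to select the combination because of a type mismatch?!
  3. By changing the type of the value I want to multiply by the weighting factor, I managed to get the right combination, but not for the right target.
  4. With target_entity = 'clients', featuretools does not select the combination I want at all. Only when I use target_entity = 'loans', where the date and the value are columns, does featuretools use the right combination.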

Here is the code for the "WeightTimeUntil" primitive:

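# Imports assumed for the featuretools 0.x API in use at the time of the question
import math

import featuretools as ft
import numpy as np
import pandas as pd
from featuretools.primitives import make_trans_primitive, MultiplyNumeric
from featuretools.variable_types import Datetime, Numeric


def weight_time_until(array, time):
    # Signed difference between each date and the cutoff time
    diff = pd.DatetimeIndex(array) - time
    # Number of half-year steps between the date and the cutoff
    s = np.floor(diff.days / 365 / 0.5)
    aWidth = 9
    # Rate chosen so the weight changes by a factor of 10 over (aWidth - 1) steps
    a = math.log(0.1) / (-(aWidth - 1))

    w = np.exp(-a * s)

    return w


# Defined at module level, not inside the function body
WeightTimeUntil = make_trans_primitive(function=weight_time_until,
                                       input_types=[Datetime],
                                       return_type=Numeric,
                                       uses_calc_time=True,
                                       description="Calculates weight time until the cutoff time",
                                       name="weight_time_until")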
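
To make the weighting formula easier to inspect, here is a minimal pure-Python sketch of the same arithmetic for a single date. The function name `half_year_weight` and the sample dates are hypothetical and only serve as illustration; the formula itself is copied from the primitive above.

```python
import math
from datetime import date


def half_year_weight(d, cutoff, a_width=9):
    """Apply the weighting formula from the question to one date."""
    days = (d - cutoff).days              # negative when d is before the cutoff
    s = math.floor(days / 365 / 0.5)      # signed number of half-year steps
    a = math.log(0.1) / (-(a_width - 1))  # same rate constant as in the question
    return math.exp(-a * s)


cutoff = date(2019, 1, 1)                 # hypothetical cutoff time
for d in (date(2019, 1, 1), date(2018, 1, 1), date(2015, 1, 2)):
    print(d, round(half_year_weight(d, cutoff), 4))
```

With the subtraction in this order, a date before the cutoff yields a negative step count and therefore a weight above 1 (about 10 for a date 1460 days before the cutoff). Whether that direction is intended is not stated in the question; if older records are meant to count less, the sign of the difference may need to be flipped.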

Here is the code that runs DFS:

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 trans_primitives = [WeightTimeUntil, MultiplyNumeric]) 

And here is the resulting feature list:

 <Feature: income>,
 <Feature: credit_score>,
 <Feature: join_month>,
 <Feature: log_income>,
 <Feature: SUM(loans.loan_amount)>,
 <Feature: SUM(loans.rate)>,
 <Feature: SUM(payments.payment_amount)>,
 <Feature: WEIGHT_TIME_UNTIL(joined)>,
 <Feature: join_month * log_income>,
 <Feature: income * log_income>,
 <Feature: income * join_month>,
 <Feature: credit_score * join_month>,
 <Feature: credit_score * log_income>,
 <Feature: credit_score * income>,
 <Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_start))>,
 <Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_end))>,
 <Feature: SUM(loans.loan_amount * rate)>,
 <Feature: income * SUM(loans.loan_amount)>,
 <Feature: credit_score * SUM(loans.loan_amount)>,
 <Feature: log_income * SUM(payments.payment_amount)>,
 <Feature: log_income * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: income * SUM(payments.payment_amount)>,
 <Feature: join_month * SUM(loans.rate)>,
 <Feature: income * SUM(loans.rate)>,
 <Feature: join_month * SUM(loans.loan_amount)>,
 <Feature: SUM(loans.rate) * SUM(payments.payment_amount)>,
 <Feature: credit_score * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.rate) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: income * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: log_income * SUM(loans.loan_amount)>,
 <Feature: SUM(loans.loan_amount) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.loan_amount) * SUM(payments.payment_amount)>,
 <Feature: credit_score * SUM(loans.rate)>,
 <Feature: log_income * SUM(loans.rate)>,
 <Feature: credit_score * SUM(payments.payment_amount)>,
 <Feature: SUM(payments.payment_amount) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: join_month * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.loan_amount) * SUM(loans.rate)>,
 <Feature: join_month * SUM(payments.payment_amount)>

I expected something like this:

SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))

1 answer:

Answer 0: (score: 1)

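The issue here is that SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start)) is a depth-3 feature, because you are stacking Sum, MultiplyNumeric, and WeightTimeUntil, while dfs uses max_depth=2 by default. You can read more about depth in the featuretools documentation.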
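
As an illustration of how stacking primitives adds up to a depth, here is a toy model that represents features as nested tuples and counts the primitives along the longest chain. This is only a sketch of the idea; the tuple representation and the `depth` helper are hypothetical and are not how featuretools computes depth internally.

```python
def depth(feature):
    """Count stacked primitives in a nested (primitive, *inputs) tuple.

    Raw columns are plain strings and have depth 0.
    """
    if isinstance(feature, str):
        return 0
    _primitive, *inputs = feature
    return 1 + max(depth(i) for i in inputs)


weight = ("weight_time_until", "loan_start")
product = ("multiply_numeric", "loan_amount", weight)
total = ("sum", product)

# One primitive, two stacked primitives, three stacked primitives
print(depth(weight), depth(product), depth(total))

# A feature that does appear in the question's output stays within two levels
print(depth(("sum", ("multiply_numeric", "loan_amount", "rate"))))
```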

You can fix this by increasing the allowed depth in the call to dfs:

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 max_depth=3,
                                 trans_primitives = [WeightTimeUntil, MultiplyNumeric]) 

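Another way to do this is to provide the feature as a seed feature, which does not count toward the max depth. You can do that like this: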

seed_features=[ft.Feature(es["loans"]["loan_start"], primitive=WeightTimeUntil)]

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 seed_features=seed_features,
                                 trans_primitives = [MultiplyNumeric])

The second approach is preferable because it creates the feature you want while generating fewer features overall.
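
To check whether either approach produced the expected feature, one option is to search the generated feature names for the relevant fragments. The sketch below works on plain strings; the helper `find_features` is hypothetical, and the third sample name is an assumed spelling of the desired feature, not output copied from featuretools. With real DFS output, the names could be collected by calling `get_name()` on each returned feature definition.

```python
def find_features(names, *fragments):
    """Return the names that contain every given fragment."""
    return [n for n in names if all(frag in n for frag in fragments)]


# Sample names; the last one is an assumed name for the desired feature
names = [
    "SUM(loans.loan_amount)",
    "SUM(loans.WEIGHT_TIME_UNTIL(loan_start))",
    "SUM(loans.loan_amount * WEIGHT_TIME_UNTIL(loan_start))",
]

matches = find_features(names, "loan_amount", "WEIGHT_TIME_UNTIL(loan_start)")
print(matches)
```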