Question

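I am using Featuretools to create monthly aggregations.

I have toy data containing loan applications (1,000 ID_APPLICATION; 1,000 TIME_APPLICATION) and 200,000 transactions (about 200 per person; each transaction has AMOUNT, TIME, and other fields that are not needed for this example). For each person, the TIME column contains about 200 different timestamps spread over the previous year or more.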

constants.py
____________
ID_APPLICATION_COLUMN = "ID_APPLICATION"
ID_TRANSACTIONS_COLUMN = "ID_TRANSACTION"
TIME_COLUMN = "TIME"
TIME_APPLICATION_COLUMN = "TIME_APPLICATION"
ENTITY_SET_NAME = "clients"
TRANSACTIONS_ENTITY_NAME = "transactions"
APPLICATIONS_ENTITY_NAME = "applications"

creation
____________
# Assumed setup (not shown in the original snippet): imports and an empty entity set
import featuretools as ft
import constants as cnst

entity_set = ft.EntitySet(id=cnst.ENTITY_SET_NAME)

# Fill the entity set with the dataframes and declare the index and time index of each
entity_set.entity_from_dataframe(entity_id=cnst.TRANSACTIONS_ENTITY_NAME,
                                 dataframe=transactions,
                                 index=cnst.ID_TRANSACTIONS_COLUMN,
                                 time_index=cnst.TIME_COLUMN)
entity_set.entity_from_dataframe(entity_id=cnst.APPLICATIONS_ENTITY_NAME,
                                 dataframe=applications,
                                 index=cnst.ID_APPLICATION_COLUMN,
                                 time_index=cnst.TIME_APPLICATION_COLUMN)

# Specification of the relationship between entities
r_transactions_applications = ft.Relationship(
    parent_variable=entity_set[cnst.APPLICATIONS_ENTITY_NAME][cnst.ID_APPLICATION_COLUMN],
    child_variable=entity_set[cnst.TRANSACTIONS_ENTITY_NAME][cnst.ID_APPLICATION_COLUMN])
entity_set.add_relationship(r_transactions_applications)

However, I have a problem with temporal cutoffs.

When I create and apply them:

default_agg_primitives = ["count", "sum", "std", "max", "mode", "mean"]
default_trans_primitives = ["month", "day", "time_since_previous"]
temporal_cutoffs = ft.make_temporal_cutoffs(
    instance_ids=applications[cnst.ID_APPLICATION_COLUMN],
    cutoffs=applications[cnst.TIME_APPLICATION_COLUMN],
    window_size='1m',
    num_windows=6)
transformed_data = ft.dfs(entityset=entity_set,
                          target_entity=cnst.APPLICATIONS_ENTITY_NAME,
                          cutoff_time=temporal_cutoffs,
                          cutoff_time_in_index=True,
                          trans_primitives=default_trans_primitives,
                          agg_primitives=default_agg_primitives,
                          max_depth=2)

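When I compute the aggregations at the application level without cutoff times, I get 1,000 rows. When I apply the temporal cutoffs, I get 6,000 rows, but 5,000 of them (every month except the last one) are all 0 or NaN, and the remaining 1,000 are identical to what I get without using cutoff times at all.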
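The expected behaviour can be sketched in plain Python. This is not Featuretools code and not how `make_temporal_cutoffs` is implemented; it only illustrates the result one would hope for. The helpers `cutoffs_for` and `count_until`, the fixed 30-day window standing in for a month, and the toy timestamps are all assumptions made for the illustration.

```python
from datetime import datetime, timedelta

# Assumption: a fixed 30-day window as a simple stand-in for "one month".
WINDOW = timedelta(days=30)
NUM_WINDOWS = 6


def cutoffs_for(application_time, window=WINDOW, num_windows=NUM_WINDOWS):
    """Return num_windows cutoff times, oldest first, ending at application_time."""
    return [application_time - window * i for i in reversed(range(num_windows))]


def count_until(transaction_times, cutoff):
    """Count the transactions that happened at or before the cutoff."""
    return sum(1 for t in transaction_times if t <= cutoff)


# Toy data: one application, with one transaction every 10 days
# during the 200 days before the application time.
application_time = datetime(2019, 7, 1)
transaction_times = [application_time - timedelta(days=10 * k) for k in range(1, 21)]

cutoffs = cutoffs_for(application_time)
counts = [count_until(transaction_times, c) for c in cutoffs]

for cutoff, n in zip(cutoffs, counts):
    print(cutoff.date(), n)
```

With this toy data the count grows from one cutoff to the next, rather than being 0 or NaN for every cutoff except the last.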

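It seems to me that the TIME column is not being registered and the dataset is not being split by the cutoff times.

Where can I set this?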

Featuretools - temporal cutoffs do not register the time index variable
