以下定义了EntitySet。我在事务表did
上将Index
声明为tx
,但它注册为Id
,而不是Index
。为什么会这样?
目标是删除以下警告。
在什么情况下,Index
的分配将被覆盖为Id
(主键还是外部键?),并且did
注册为Id
的事实与警告?
一个uid
在did
表中可以有多个tx
。
es = ft.EntitySet(id="the_entity_set")
# hse
es = es.entity_from_dataframe(entity_id="hse",
dataframe=hse,
index="uid",
variable_types={"Gender": ft.variable_types.Categorical,
"Income": ft.variable_types.Numeric,
"dob" : ft.variable_types.Datetime})
# types
es = es.entity_from_dataframe(entity_id="types",
dataframe=types,
index="type_id",
variable_types={"type": ft.variable_types.Categorical})
# files
es = es.entity_from_dataframe(entity_id="files",
dataframe=files,
index="file_id",
variable_types={"file": ft.variable_types.Categorical})
# uid_donations
es = es.entity_from_dataframe(entity_id="uid_txlup",
dataframe=uid_txlup,
index="did",
variable_types={"uid": ft.variable_types.Categorical})
# transactions
es = es.entity_from_dataframe(entity_id="tx",
dataframe=tx,
index="did",
time_index="dt",
variable_types={"file_id": ft.variable_types.Categorical,
"type_id": ft.variable_types.Categorical,
"amt": ft.variable_types.Numeric})
rels = [
ft.Relationship(es["files"]["file_id"],es["tx"]["file_id"]),
ft.Relationship(es["types"]["type_id"],es["tx"]["type_id"]),
ft.Relationship(es["hse"]["uid"], es["uid_txlup"]["uid"]),
ft.Relationship(es["uid_txlup"]["did"],es["tx"]["did"])
]
es.add_relationships( rels )
这是EntitySet的样子
Entityset: the_entity_set
Entities:
hse [Rows: 100, Columns: 4]
types [Rows: 8, Columns: 2]
files [Rows: 2, Columns: 2]
uid_txlup [Rows: 336, Columns: 2]
tx [Rows: 336, Columns: 5]
Relationships:
tx.file_id -> files.file_id
tx.type_id -> types.type_id
uid_txlup.uid -> hse.uid
tx.did -> uid_txlup.did
es.entities
[Entity: hse
Variables:
uid (dtype: index)
Gender (dtype: categorical)
Income (dtype: numeric)
dob (dtype: datetime)
Shape:
(Rows: 100, Columns: 4), Entity: types
Variables:
type_id (dtype: index)
type (dtype: categorical)
Shape:
(Rows: 8, Columns: 2), Entity: files
Variables:
file_id (dtype: index)
file (dtype: categorical)
Shape:
(Rows: 2, Columns: 2), Entity: uid_txlup
Variables:
did (dtype: index)
uid (dtype: categorical)
Shape:
(Rows: 336, Columns: 2), Entity: tx
Variables:
did (dtype: id) ### <<< external key ???
dt (dtype: datetime)
file_id (dtype: categorical)
type_id (dtype: categorical)
amt (dtype: numeric)
Shape:
(Rows: 336, Columns: 5)]
为什么当我打电话给did
时Id
显示为Index
而不显示为fts
?
这是警告:
feature_matrix, feature_defs = ft.dfs(entityset=es,
target_entity="hse",
agg_primitives=["sum","mode","percent_true"],
where_primitives=["count", "avg_time_between"],
max_depth=2)
feature_defs
.../anaconda3/lib/python3.6/site-packages/featuretools-0.2.1-py3.6.egg/featuretools/entityset/entityset.py:432: FutureWarning: 'did' is both an index level and a column label.
Defaulting to column, but this will raise an ambiguity error in a future version
end_entity_id=child_eid)
答案 0 :(得分:2)
您的实体集中的关系将始终在父实体的Id
变量和子实体的Index
变量之间。因此,添加关系时,无论您指定什么内容,featuretools都会自动将变量从子实体转换为Index
类型。
如果实体之间存在一对一的关系,则变量可能既是Index
又是Id
。在这种情况下,您应将两个实体合并为一个。