I am using the Featuretools documentation to learn about entity sets, and I am currently getting the error KeyError: 'Variable: device not found in entity' for the following code:
import featuretools as ft
data = ft.demo.load_mock_customer()
customers_df = data["customers"]
customers_df
sessions_df = data["sessions"]
sessions_df.sample(5)
transactions_df = data["transactions"]
transactions_df.sample(10)
products_df = data["products"]
products_df
### Creating an entity set
es = ft.EntitySet(id="transactions")
### Adding entities
es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions_df, index="transaction_id", time_index="transaction_time", variable_types={"product_id": ft.variable_types.Categorical})
es
es["transactions"].variables
es = es.entity_from_dataframe(entity_id="products",dataframe=products_df,index="product_id")
es
### Adding new relationship
new_relationship = ft.Relationship(es["products"]["product_id"],
es["transactions"]["product_id"])
es = es.add_relationship(new_relationship)
es
### Creating entity from existing table
es = es.normalize_entity(base_entity_id="transactions",
new_entity_id="sessions",
index = "session_id",
additional_variables=["device","customer_id","zip_code"])
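This follows the documentation at https://docs.featuretools.com/loading_data/using_entitysets.html
From the API description of es.normalize_entity, it looks like the function should create a new entity 'sessions' with the index 'session_id' plus the other three variables, but the error is:
C:\Users\s_belvi\AppData\Local\Continuum\Anaconda2\lib\site-packages\featuretools\entityset\entity.pyc in _get_variable(self, variable_id)
    250                 return v
    251
--> 252         raise KeyError("Variable: %s not found in entity" % (variable_id))
    253
    254     @property
KeyError: 'Variable: device not found in entity'
Do we need to create the "sessions" entity separately before using es.normalize_entity? It looks like I am making some minor mistake somewhere in the flow.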
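Answer 0: (score: 0)
The error here arises because device is not a column in your transactions_df. The "transactions" table referenced on that page of the documentation has more columns than the transactions table in the dictionary returned by demo.load_mock_customer. You can get the remaining columns by using the return_single_table argument. Here is a complete example of normalize_entity, only slightly modified from the code you tried: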
import featuretools as ft
data = ft.demo.load_mock_customer(return_single_table=True)
es = ft.EntitySet(id="Mock Customer")
es = es.entity_from_dataframe(entity_id="transactions",
dataframe=data,
index="transaction_id",
time_index="transaction_time",
variable_types={"product_id": ft.variable_types.Categorical})
es = es.normalize_entity(base_entity_id="transactions",
new_entity_id="sessions",
index = "session_id",
additional_variables=["device","customer_id","zip_code"])
This returns an EntitySet with two entities and one relationship:
Entityset: Mock Customer
Entities:
transactions [Rows: 500, Columns: 8]
sessions [Rows: 35, Columns: 5]
Relationships:
transactions.session_id -> sessions.session_id
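To catch this kind of KeyError before calling normalize_entity, one option is to compare the names you plan to pass as additional_variables against the dataframe's columns first. The following is only a minimal sketch: missing_columns is a hypothetical helper, not part of Featuretools, and the column list is an illustrative assumption about the separate transactions table rather than something read from the library.

```python
def missing_columns(available, requested):
    """Return the requested names that are absent from `available`, in order."""
    available = set(available)
    return [name for name in requested if name not in available]


# Illustrative column names only (an assumption, not read from Featuretools).
transactions_columns = ["transaction_id", "session_id", "transaction_time",
                        "product_id", "amount"]

# The index and additional variables intended for normalize_entity.
wanted = ["session_id", "device", "customer_id", "zip_code"]

missing = missing_columns(transactions_columns, wanted)
if missing:
    # These names would trigger "Variable: ... not found in entity".
    print("Not in dataframe:", missing)
```

With a real dataframe, the first argument could be transactions_df.columns. An empty result means every requested variable exists in the base table.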