I want to keep one column of my dataframe in its original state, without applying any primitives to it. Is this possible?
Answer 0 (score: 0)
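Yes, you can do this with the ignore_variables parameter of ft.dfs. Here is an example using a demo entity set.
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
es.plot()
If we want to build features for the sessions entity while ignoring the device variable, we can run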
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      ignore_variables={"sessions": ["device"]},
                      features_only=True)
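The keyword names in the call above (target_entity, ignore_variables) match the Featuretools release this answer was written against. As far as I know, Featuretools 1.0 renamed them to target_dataframe_name and ignore_columns, so the same call may need different keywords on a newer install. The sketch below is only an illustration of that mapping: dfs_kwargs is a hypothetical helper (not part of Featuretools), and the assumption that the rename happened exactly at 1.0 is mine.

```python
def dfs_kwargs(ft_version, target, ignored):
    """Build keyword arguments for ft.dfs based on a Featuretools version string.

    Hypothetical helper. Assumption: releases before 1.0 use
    target_entity / ignore_variables, and 1.0 or later use
    target_dataframe_name / ignore_columns.
    """
    major = int(ft_version.split(".")[0])
    if major >= 1:
        target_key, ignore_key = "target_dataframe_name", "ignore_columns"
    else:
        target_key, ignore_key = "target_entity", "ignore_variables"
    return {
        target_key: target,
        ignore_key: ignored,
        # Same primitives as in the answer's example.
        "agg_primitives": ["count", "mode"],
        "trans_primitives": [],
        "features_only": True,
    }


# Version strings here are examples only.
old_style = dfs_kwargs("0.13.4", "sessions", {"sessions": ["device"]})
new_style = dfs_kwargs("1.0.0", "sessions", {"sessions": ["device"]})
print(sorted(old_style))
print(sorted(new_style))

# With Featuretools installed, the call would then look like:
#   feature_defs = ft.dfs(entityset=es,
#                         **dfs_kwargs(ft.__version__, "sessions",
#                                      {"sessions": ["device"]}))
```

On 1.0 or later, I believe the raw column feature is also written differently, as ft.Feature(es["sessions"].ww["device"]), but check that against the documentation for your installed version.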
This produces the following features:
feature_defs
[<Feature: customer_id>,
 <Feature: COUNT(transactions)>,
 <Feature: MODE(transactions.product_id)>,
 <Feature: customers.zip_code>,
 <Feature: MODE(transactions.products.brand)>,
 <Feature: customers.COUNT(sessions)>,
 <Feature: customers.COUNT(transactions)>,
 <Feature: customers.MODE(transactions.product_id)>]
This uses the count and mode primitives to create features, but ignores the device variable in the sessions entity. If we want to include the device variable in its original state, we can append it like this:
feature_defs += [ft.Feature(es["sessions"]["device"])]
Now we can calculate the feature matrix.
fm = ft.calculate_feature_matrix(features=feature_defs, entityset=es)
fm
customer_id COUNT(transactions) MODE(transactions.product_id) customers.zip_code ... customers.COUNT(sessions) customers.COUNT(transactions) customers.MODE(transactions.product_id) device
session_id ...
1 2 16 3 13244 ... 7 93 4 desktop
2 5 10 5 60091 ... 6 79 5 mobile
3 4 15 1 60091 ... 8 109 2 mobile
4 1 25 5 60091 ... 8 126 4 mobile
5 4 11 5 60091 ... 8 109 2 mobile
6 1 15 4 60091 ... 8 126 4 tablet
7 3 15 1 13244 ... 6 93 1 tablet
8 4 18 1 60091 ... 8 109 2 tablet
9 1 15 1 60091 ... 8 126 4 desktop
10 2 15 2 13244 ... 7 93 4 tablet
11 4 15 3 60091 ... 8 109 2 mobile
12 4 10 4 60091 ... 8 109 2 desktop
13 4 12 2 60091 ... 8 109 2 mobile
14 1 12 4 60091 ... 8 126 4 tablet
15 2 8 2 13244 ... 7 93 4 desktop
16 2 10 4 13244 ... 7 93 4 desktop
17 2 13 1 13244 ... 7 93 4 tablet
18 1 12 2 60091 ... 8 126 4 desktop
19 3 17 1 13244 ... 6 93 1 desktop
20 5 15 1 60091 ... 6 79 5 desktop
21 4 18 5 60091 ... 8 109 2 desktop
22 4 10 2 60091 ... 8 109 2 desktop
23 3 11 3 13244 ... 6 93 1 desktop
24 5 14 4 60091 ... 6 79 5 tablet
25 3 16 1 13244 ... 6 93 1 desktop
26 1 16 1 60091 ... 8 126 4 tablet
27 1 15 5 60091 ... 8 126 4 mobile
28 5 18 2 60091 ... 6 79 5 mobile
29 1 16 4 60091 ... 8 126 4 mobile
30 5 14 3 60091 ... 6 79 5 desktop
31 2 18 3 13244 ... 7 93 4 mobile
32 5 8 3 60091 ... 6 79 5 mobile
33 2 13 3 13244 ... 7 93 4 mobile
34 3 18 4 13244 ... 6 93 1 desktop
35 3 16 5 13244 ... 6 93 1 mobile
The device column is now at the end of the feature matrix.
As a sanity check, if we do not use ignore_variables:
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      features_only=True)
you can see that the following feature is created:
<Feature: customers.MODE(sessions.device)>
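If you would rather not scan the feature list by eye, the same sanity check can be automated on the feature names. The snippet below is a minimal sketch using only the standard library: derived_from is a hypothetical helper name, and the two name lists are typed in by hand from the features shown in this answer rather than produced by running Featuretools. In real use the names could come from something like [f.get_name() for f in feature_defs].

```python
import re


def derived_from(feature_names, column):
    """Return names that reference `column` but are not the raw column itself."""
    # \b keeps "device" from matching a longer identifier such as "device_id".
    pattern = re.compile(r"\b" + re.escape(column) + r"\b")
    return [name for name in feature_names
            if name != column and pattern.search(name)]


# Names from the run with ignore_variables, plus the raw column appended by hand.
with_ignore = [
    "customer_id",
    "COUNT(transactions)",
    "MODE(transactions.product_id)",
    "customers.zip_code",
    "MODE(transactions.products.brand)",
    "customers.COUNT(sessions)",
    "customers.COUNT(transactions)",
    "customers.MODE(transactions.product_id)",
    "device",
]

# Illustrative list for the run without ignore_variables: same names plus
# the extra feature reported in the sanity check.
without_ignore = with_ignore + ["customers.MODE(sessions.device)"]

print(derived_from(with_ignore, "device"))
print(derived_from(without_ignore, "device"))
```

An empty result for the first list means no primitive was applied to device, while the second list surfaces the MODE feature that ignore_variables suppresses.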