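From reading the documentation, I understood that increasing max_depth should lead to more complex, "stacked" features.
However, I find that increasing max_depth past 2 makes no difference to the features produced.
What am I doing wrong?
max_depth = 1: original features
feature_matrix, features = ft.dfs(entityset=es,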
target_entity='fish',
max_depth=1)
features
>>>[<Feature: sex>,
<Feature: length>,
<Feature: diameter>,
<Feature: height>,
<Feature: whole_weight>,
<Feature: shucked_weight>,
<Feature: viscera_weight>,
<Feature: shell_weight>]
max_depth = 2: basic primitives
feature_matrix, features = ft.dfs(entityset=es,
target_entity='fish',
max_depth=2)
features
>>>[<Feature: sex>,
<Feature: length>,
<Feature: diameter>,
<Feature: height>,
<Feature: whole_weight>,
<Feature: shucked_weight>,
<Feature: viscera_weight>,
<Feature: shell_weight>,
<Feature: sex_adult.SUM(fish.shell_weight)>,
<Feature: sex_adult.SUM(fish.viscera_weight)>,
<Feature: sex_adult.SUM(fish.shucked_weight)>,
<Feature: sex_adult.SUM(fish.length)>,
<Feature: sex_adult.SUM(fish.diameter)>,
<Feature: sex_adult.SUM(fish.whole_weight)>,
<Feature: sex_adult.SUM(fish.height)>,
<Feature: sex_adult.STD(fish.shell_weight)>,
<Feature: sex_adult.STD(fish.viscera_weight)>,
<Feature: sex_adult.STD(fish.shucked_weight)>,
<Feature: sex_adult.STD(fish.length)>,
<Feature: sex_adult.STD(fish.diameter)>,
<Feature: sex_adult.STD(fish.whole_weight)>,
<Feature: sex_adult.STD(fish.height)>,
<Feature: sex_adult.MAX(fish.shell_weight)>,
<Feature: sex_adult.MAX(fish.viscera_weight)>,
<Feature: sex_adult.MAX(fish.shucked_weight)>,
<Feature: sex_adult.MAX(fish.length)>,
<Feature: sex_adult.MAX(fish.diameter)>,
<Feature: sex_adult.MAX(fish.whole_weight)>,
<Feature: sex_adult.MAX(fish.height)>,
<Feature: sex_adult.SKEW(fish.shell_weight)>,
<Feature: sex_adult.SKEW(fish.viscera_weight)>,
<Feature: sex_adult.SKEW(fish.shucked_weight)>,
<Feature: sex_adult.SKEW(fish.length)>,
<Feature: sex_adult.SKEW(fish.diameter)>,
<Feature: sex_adult.SKEW(fish.whole_weight)>,
<Feature: sex_adult.SKEW(fish.height)>,
<Feature: sex_adult.MIN(fish.shell_weight)>,
<Feature: sex_adult.MIN(fish.viscera_weight)>,
<Feature: sex_adult.MIN(fish.shucked_weight)>,
<Feature: sex_adult.MIN(fish.length)>,
<Feature: sex_adult.MIN(fish.diameter)>,
<Feature: sex_adult.MIN(fish.whole_weight)>,
<Feature: sex_adult.MIN(fish.height)>,
<Feature: sex_adult.MEAN(fish.shell_weight)>,
<Feature: sex_adult.MEAN(fish.viscera_weight)>,
<Feature: sex_adult.MEAN(fish.shucked_weight)>,
<Feature: sex_adult.MEAN(fish.length)>,
<Feature: sex_adult.MEAN(fish.diameter)>,
<Feature: sex_adult.MEAN(fish.whole_weight)>,
<Feature: sex_adult.MEAN(fish.height)>,
<Feature: sex_adult.COUNT(fish)>]
max_depth = 3: same features as max_depth = 2
feature_matrix, features = ft.dfs(entityset=es,
target_entity='fish',
max_depth=3)
features
>>>[<Feature: sex>,
<Feature: length>,
<Feature: diameter>,
<Feature: height>,
<Feature: whole_weight>,
<Feature: shucked_weight>,
<Feature: viscera_weight>,
<Feature: shell_weight>,
<Feature: sex_adult.SUM(fish.shell_weight)>,
<Feature: sex_adult.SUM(fish.viscera_weight)>,
<Feature: sex_adult.SUM(fish.shucked_weight)>,
<Feature: sex_adult.SUM(fish.length)>,
<Feature: sex_adult.SUM(fish.diameter)>,
<Feature: sex_adult.SUM(fish.whole_weight)>,
<Feature: sex_adult.SUM(fish.height)>,
<Feature: sex_adult.STD(fish.shell_weight)>,
<Feature: sex_adult.STD(fish.viscera_weight)>,
<Feature: sex_adult.STD(fish.shucked_weight)>,
<Feature: sex_adult.STD(fish.length)>,
<Feature: sex_adult.STD(fish.diameter)>,
<Feature: sex_adult.STD(fish.whole_weight)>,
<Feature: sex_adult.STD(fish.height)>,
<Feature: sex_adult.MAX(fish.shell_weight)>,
<Feature: sex_adult.MAX(fish.viscera_weight)>,
<Feature: sex_adult.MAX(fish.shucked_weight)>,
<Feature: sex_adult.MAX(fish.length)>,
<Feature: sex_adult.MAX(fish.diameter)>,
<Feature: sex_adult.MAX(fish.whole_weight)>,
<Feature: sex_adult.MAX(fish.height)>,
<Feature: sex_adult.SKEW(fish.shell_weight)>,
<Feature: sex_adult.SKEW(fish.viscera_weight)>,
<Feature: sex_adult.SKEW(fish.shucked_weight)>,
<Feature: sex_adult.SKEW(fish.length)>,
<Feature: sex_adult.SKEW(fish.diameter)>,
<Feature: sex_adult.SKEW(fish.whole_weight)>,
<Feature: sex_adult.SKEW(fish.height)>,
<Feature: sex_adult.MIN(fish.shell_weight)>,
<Feature: sex_adult.MIN(fish.viscera_weight)>,
<Feature: sex_adult.MIN(fish.shucked_weight)>,
<Feature: sex_adult.MIN(fish.length)>,
<Feature: sex_adult.MIN(fish.diameter)>,
<Feature: sex_adult.MIN(fish.whole_weight)>,
<Feature: sex_adult.MIN(fish.height)>,
<Feature: sex_adult.MEAN(fish.shell_weight)>,
<Feature: sex_adult.MEAN(fish.viscera_weight)>,
<Feature: sex_adult.MEAN(fish.shucked_weight)>,
<Feature: sex_adult.MEAN(fish.length)>,
<Feature: sex_adult.MEAN(fish.diameter)>,
<Feature: sex_adult.MEAN(fish.whole_weight)>,
<Feature: sex_adult.MEAN(fish.height)>,
<Feature: sex_adult.COUNT(fish)>]
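Answer 0 (score: 2)
Why doesn't increasing max_depth increase the number of features created?
One thing that stands out from the feature lists is that every new feature was created with an aggregation primitive (max, mean, etc.). No new features were created with transform primitives.
Without seeing the schema of the entity set I can only guess, but all of the variables on the fish entity appear to be either numeric (length, diameter, height, weight, etc.) or categorical (sex). The dfs call that was made,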
feature_matrix, features = ft.dfs(entityset=es,
target_entity='fish',
max_depth=2)
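does not use the trans_primitives option, so DFS falls back to the default set of transform primitives when trying to create new features. The default set does not contain any primitives that can be applied to numeric or categorical variables, so no new transform features are created.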
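The type-matching argument can be illustrated with a small toy model. This is not featuretools code. The default transform primitives and their input types listed here are an assumption based on my recollection of featuretools releases from around this time, and the helper name `applicable` is hypothetical. To see what your installed version actually provides, inspect the table returned by ft.list_primitives().

```python
# Toy model of input-type matching; NOT featuretools' real implementation.
# ASSUMPTION: default transform primitives and the input type each expects,
# as recalled for featuretools releases of this era.
DEFAULT_TRANS_PRIMITIVES = {
    "day": "datetime",
    "year": "datetime",
    "month": "datetime",
    "weekday": "datetime",
    "haversine": "latlong",
    "num_words": "text",
    "num_characters": "text",
}

# Variable types on the mock fish entity, as guessed in the answer.
FISH_COLUMNS = {"sex": "categorical", "length": "numeric", "weight": "numeric"}


def applicable(primitives, columns):
    """Return (primitive, column) pairs whose input type matches the column type."""
    return [
        (prim, col)
        for prim, prim_type in primitives.items()
        for col, col_type in columns.items()
        if prim_type == col_type
    ]


# With the assumed defaults, nothing matches numeric or categorical columns.
defaults_only = applicable(DEFAULT_TRANS_PRIMITIVES, FISH_COLUMNS)

# Passing trans_primitives replaces the defaults; Percentile takes numeric input.
with_percentile = applicable({"percentile": "numeric"}, FISH_COLUMNS)

print(defaults_only)
print(with_percentile)
```

Under these assumptions the first list is empty, which matches the observation that only aggregation features appear, and the second list pairs percentile with the two numeric columns.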
I created a mock entity set to try to reproduce the situation:
import featuretools as ft
import numpy as np
import pandas as pd
fish = pd.DataFrame({
"sex": np.random.choice(['F', 'M'], size=10),
"length": np.random.sample(size=10),
"weight": np.random.sample(size=10)
})
es = ft.EntitySet("fish")
es.entity_from_dataframe(entity_id="fish",
make_index=True,
index="id",
dataframe=fish)
es.normalize_entity(base_entity_id='fish',
new_entity_id='sex_adult',
index='sex')
Here, too, the only new features are created with aggregation primitives.
ft.dfs(entityset=es,
target_entity='fish',
max_depth=2,
features_only=True)
>>>[<Feature: sex>,
<Feature: length>,
<Feature: weight>,
<Feature: sex_adult.SUM(fish.length)>,
<Feature: sex_adult.SUM(fish.weight)>,
<Feature: sex_adult.STD(fish.length)>,
<Feature: sex_adult.STD(fish.weight)>,
<Feature: sex_adult.MAX(fish.length)>,
<Feature: sex_adult.MAX(fish.weight)>,
<Feature: sex_adult.SKEW(fish.length)>,
<Feature: sex_adult.SKEW(fish.weight)>,
<Feature: sex_adult.MIN(fish.length)>,
<Feature: sex_adult.MIN(fish.weight)>,
<Feature: sex_adult.MEAN(fish.length)>,
<Feature: sex_adult.MEAN(fish.weight)>,
<Feature: sex_adult.COUNT(fish)>]
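Increasing max_depth to 3 or more does not create any more features. However, once I add the Percentile transform primitive (which can be applied to numeric values) using the trans_primitives option, I get a different result.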
ft.dfs(entityset=es,
target_entity='fish',
max_depth=2,
trans_primitives=[ft.primitives.Percentile],
features_only=True)
>>>[<Feature: sex>,
<Feature: length>,
<Feature: weight>,
<Feature: PERCENTILE(length)>,
<Feature: PERCENTILE(weight)>,
<Feature: sex_adult.SUM(fish.length)>,
<Feature: sex_adult.SUM(fish.weight)>,
<Feature: sex_adult.STD(fish.length)>,
<Feature: sex_adult.STD(fish.weight)>,
<Feature: sex_adult.MAX(fish.length)>,
<Feature: sex_adult.MAX(fish.weight)>,
<Feature: sex_adult.SKEW(fish.length)>,
<Feature: sex_adult.SKEW(fish.weight)>,
<Feature: sex_adult.MIN(fish.length)>,
<Feature: sex_adult.MIN(fish.weight)>,
<Feature: sex_adult.MEAN(fish.length)>,
<Feature: sex_adult.MEAN(fish.weight)>,
<Feature: sex_adult.COUNT(fish)>]
There are two new features, PERCENTILE(length) and PERCENTILE(weight). Increasing max_depth to 3 adds even more features.
ft.dfs(entityset=es,
target_entity='fish',
max_depth=3,
trans_primitives=[ft.primitives.Percentile],
features_only=True)
>>>[<Feature: sex>,
<Feature: length>,
<Feature: weight>,
<Feature: PERCENTILE(length)>,
<Feature: PERCENTILE(weight)>,
<Feature: sex_adult.SUM(fish.length)>,
<Feature: sex_adult.SUM(fish.weight)>,
<Feature: sex_adult.STD(fish.length)>,
<Feature: sex_adult.STD(fish.weight)>,
<Feature: sex_adult.MAX(fish.length)>,
<Feature: sex_adult.MAX(fish.weight)>,
<Feature: sex_adult.SKEW(fish.length)>,
<Feature: sex_adult.SKEW(fish.weight)>,
<Feature: sex_adult.MIN(fish.length)>,
<Feature: sex_adult.MIN(fish.weight)>,
<Feature: sex_adult.MEAN(fish.length)>,
<Feature: sex_adult.MEAN(fish.weight)>,
<Feature: sex_adult.COUNT(fish)>,
<Feature: sex_adult.SUM(fish.PERCENTILE(length))>,
<Feature: sex_adult.SUM(fish.PERCENTILE(weight))>,
<Feature: sex_adult.STD(fish.PERCENTILE(length))>,
<Feature: sex_adult.STD(fish.PERCENTILE(weight))>,
<Feature: sex_adult.MAX(fish.PERCENTILE(length))>,
<Feature: sex_adult.MAX(fish.PERCENTILE(weight))>,
<Feature: sex_adult.SKEW(fish.PERCENTILE(length))>,
<Feature: sex_adult.SKEW(fish.PERCENTILE(weight))>,
<Feature: sex_adult.MIN(fish.PERCENTILE(length))>,
<Feature: sex_adult.MIN(fish.PERCENTILE(weight))>,
<Feature: sex_adult.MEAN(fish.PERCENTILE(length))>,
<Feature: sex_adult.MEAN(fish.PERCENTILE(weight))>,
<Feature: sex_adult.PERCENTILE(MAX(fish.length))>,
<Feature: sex_adult.PERCENTILE(SUM(fish.length))>,
<Feature: sex_adult.PERCENTILE(MAX(fish.weight))>,
<Feature: sex_adult.PERCENTILE(SKEW(fish.length))>,
<Feature: sex_adult.PERCENTILE(MIN(fish.length))>,
<Feature: sex_adult.PERCENTILE(MIN(fish.weight))>,
<Feature: sex_adult.PERCENTILE(MEAN(fish.weight))>,
<Feature: sex_adult.PERCENTILE(STD(fish.weight))>,
<Feature: sex_adult.PERCENTILE(COUNT(fish))>,
<Feature: sex_adult.PERCENTILE(STD(fish.length))>,
<Feature: sex_adult.PERCENTILE(SUM(fish.weight))>,
<Feature: sex_adult.PERCENTILE(SKEW(fish.weight))>,
<Feature: sex_adult.PERCENTILE(MEAN(fish.length))>]
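However, increasing max_depth to 4 or more does not create any additional features. The rules that DFS follows prevent it from creating more. In general, though, adding more primitives, entities, and data types makes more combinations possible, which can lead to more of these "stacked" features.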
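To see why each feature shows up at the max_depth it does, here is a rough sketch that estimates a feature's depth from its printed name. The counting rule is an assumption inferred from the feature lists above, not taken from the featuretools source: each applied primitive adds one level, and the leading "sex_adult." prefix (pulling a value from the parent entity back onto fish) adds one more. The function name `feature_depth` is hypothetical.

```python
import re


def feature_depth(name):
    """Estimate the depth of a feature from its name as printed by DFS.

    ASSUMPTION: one level per primitive (each "(" in the name), plus one
    level when the name starts with a parent-entity prefix such as
    "sex_adult.".
    """
    primitives = name.count("(")
    direct_hop = 1 if re.match(r"\w+\.", name) else 0
    return primitives + direct_hop


examples = [
    "length",
    "PERCENTILE(length)",
    "sex_adult.SUM(fish.length)",
    "sex_adult.COUNT(fish)",
    "sex_adult.SUM(fish.PERCENTILE(length))",
    "sex_adult.PERCENTILE(MAX(fish.length))",
]

for name in examples:
    print(feature_depth(name), name)
```

With this rule, the raw columns come out at depth 0, PERCENTILE(length) at depth 1, the plain sex_adult aggregations at depth 2, and the features combining Percentile with an aggregation at depth 3. That agrees with the max_depth at which each group first appears in the outputs above.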