My environment is:
Operating system version.... Windows-10-10.0.17134-SP0
Python version is........... 3.6.5
pandas version is........... 0.23.0
numpy version is............ 1.14.3
Featuretools................ 0.3.0
My pandas DataFrame looks like this:
df
index BoxRatio Thrust Velocity OnBalRun vwapGain
0 1 0.324000 0.615000 1.525000 3.618000 0.416000
1 2 0.938249 0.366377 2.402230 6.393223 2.667106
2 3 0.317000 -0.281000 0.979000 1.489000 0.506000
3 4 0.289000 -0.433000 0.796000 2.081000 0.536000
4 5 1.551115 -0.103734 0.731682 1.752156 0.667016
I tried the following:
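# Imports inferred from the names used below (featuretools 0.3.0 layout assumed)
import numpy as np
import featuretools as ft
from featuretools.primitives import (Count, Sum, Mean, Median, Std, Min, Max,
                                     Multiply, make_trans_primitive)
from featuretools.variable_types import Numeric

# Build an entity set containing a single entity made from the DataFrame
es = ft.EntitySet('Pattern')
es.entity_from_dataframe(dataframe=df,
                         entity_id='my_id',
                         index='index')

# Custom transform primitive: element-wise base-10 logarithm
def log10(column):
    return np.log10(column)

Log10 = make_trans_primitive(function=log10,
                             input_types=[Numeric],
                             return_type=Numeric)

feature_matrix, feature_names = ft.dfs(entityset=es,
                                       target_entity='my_id',
                                       trans_primitives=[Log10])
print('feature_names:\n')
for item in feature_names:
    print(' ' + item)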
which gives the following:
feature_names:
<Feature: + BoxRatio>
<Feature: + Thrust>
<Feature: + Velocity>
<Feature: + OnBalRun>
<Feature: + vwapGain>
<Feature: + LOG10(BoxRatio)>
<Feature: + LOG10(Thrust)>
<Feature: + LOG10(Velocity)>
<Feature: + LOG10(OnBalRun)>
<Feature: + LOG10(vwapGain)>
So far so good... Now, if I add the Min primitive, I get:
Traceback (most recent call last):
File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 112, in <module>
Main()
File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 95, in Main
trans_primitives=[Log10, Min])
File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\dfs.py", line 184, in dfs
features = dfs_object.build_features(verbose=verbose)
File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 218, in build_features
all_features, max_depth=self.max_depth)
File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 365, in _run_dfs
all_features, entity, max_depth=max_depth)
File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 514, in _build_transform_features
new_f = trans_prim(*matching_input)
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'
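I was hoping to see the minimum value of each column feature (just as with the Log10 primitive). Of course, I could define my own Min primitive, but I was hoping there was a simpler solution.
Charles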
Answer 0 (score: 0)
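The problem here is that Min is an aggregation primitive, while Log is a transform primitive. That is why passing Min in trans_primitives fails with the missing 'parent_entity' argument shown in the traceback.
Aggregation primitives take related instances as input and output a single value. They are applied across a parent-child relationship in an entity set. For example, Min takes a list of values and returns the minimum of that list.
Transform primitives take one or more variables from an entity as input and output a new variable for that entity. They are applied to a single entity. For example, log takes a list of values and returns a list of the same length containing the log of each item.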
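To make the distinction concrete, here is a minimal sketch in plain Python. It does not use Featuretools at all; the rows, the parent ids, and the function names transform_log10 and aggregate_min are made up for illustration. It only shows the shape of each operation: a transform returns one value per input row, while an aggregation collapses the child rows of each parent into a single value.

```python
import math
from collections import defaultdict

# Hypothetical child rows as (parent_id, value); not taken from the question's data.
rows = [("a", 1.0), ("a", 10.0), ("b", 100.0), ("b", 1000.0)]


def transform_log10(values):
    """Transform-style: one output per input value, same length as the input."""
    return [math.log10(v) for v in values]


def aggregate_min(child_rows):
    """Aggregation-style: collapse each parent's child values into one value."""
    groups = defaultdict(list)
    for parent_id, value in child_rows:
        groups[parent_id].append(value)
    return {parent_id: min(vals) for parent_id, vals in groups.items()}


logs = transform_log10([value for _, value in rows])
mins = aggregate_min(rows)

print(len(logs))  # as many outputs as input rows
print(mins)       # one minimum per parent
```

With only one entity and no parent to group by, there is nothing for an aggregation such as Min to collapse, which matches the behavior described above.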
You can learn more about primitives in the documentation: https://docs.featuretools.com/automated_feature_engineering/primitives.html