Featuretools with a single table and the Min primitive gives an error

Time: 2018-09-01 15:10:52

Tags: python-3.x featuretools

My environment is:

Operating system version.... Windows-10-10.0.17134-SP0
Python version is........... 3.6.5
pandas version is........... 0.23.0
numpy version is............ 1.14.3
Featuretools................ 0.3.0

My pandas DataFrame is as follows:

df
    index  BoxRatio    Thrust  Velocity  OnBalRun  vwapGain
0      1  0.324000  0.615000  1.525000  3.618000  0.416000
1      2  0.938249  0.366377  2.402230  6.393223  2.667106
2      3  0.317000 -0.281000  0.979000  1.489000  0.506000
3      4  0.289000 -0.433000  0.796000  2.081000  0.536000
4      5  1.551115 -0.103734  0.731682  1.752156  0.667016

I tried the following:

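  import numpy as np
  import featuretools as ft
  from featuretools.primitives import make_trans_primitive
  from featuretools.variable_types import Numeric

  # Single-entity EntitySet built from the DataFrame above
  es = ft.EntitySet('Pattern')
  es.entity_from_dataframe(dataframe=df,
                           entity_id='my_id',
                           index='index')

  # Custom transform primitive: element-wise base-10 logarithm
  def log10(column):
    return np.log10(column)

  Log10 = make_trans_primitive(function=log10,
                               input_types=[Numeric],
                               return_type=Numeric)

  from featuretools.primitives import (Count, Sum, Mean, Median, Std, Min, Max, Multiply)

  feature_matrix, feature_names = ft.dfs(entityset=es,
                                         target_entity='my_id',
                                         trans_primitives=[Log10])
  print('feature_names:\n')
  for item in feature_names:
    # The items are Feature objects, so convert to str before concatenating
    print('  ' + str(item))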

which gives the following:

feature_names:
<Feature:    + BoxRatio>
<Feature:    + Thrust>
<Feature:    + Velocity>
<Feature:    + OnBalRun>
<Feature:    + vwapGain>
<Feature:    + LOG10(BoxRatio)>
<Feature:    + LOG10(Thrust)>
<Feature:    + LOG10(Velocity)>
<Feature:    + LOG10(OnBalRun)>
<Feature:    + LOG10(vwapGain)>

So far so good... Now, if I add the "Min" primitive, I get:

Traceback (most recent call last):
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 112, in <module>
    Main()
  File "H:\ML\BlogExperiments\Python\SKLearn\FeaturetoolsTest\FeaturetoolsTest\FeaturetoolsTest.py", line 95, in Main
    trans_primitives=[Log10, Min])
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\dfs.py", line 184, in dfs
    features = dfs_object.build_features(verbose=verbose)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 218, in build_features
    all_features, max_depth=self.max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 365, in _run_dfs
    all_features, entity, max_depth=max_depth)
  File "C:\Users\Charles\Anaconda3\lib\site-packages\featuretools\synthesis\deep_feature_synthesis.py", line 514, in _build_transform_features
    new_f = trans_prim(*matching_input)
TypeError: new_class_init() missing 1 required positional argument: 'parent_entity'

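I was hoping to see the minimum of each column feature (just as with the Log10 primitive). Of course, I could define my own Min primitive, but I was hoping for a simple solution.

Charles

1 Answer:

Answer 0: (score: 0)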

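The problem here is that Min is an aggregation primitive, whereas Log is a transform primitive. Because Min was passed in trans_primitives, DFS tried to construct it like a transform primitive, without the parent_entity argument that an aggregation primitive requires, which is what the TypeError in the traceback reports.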

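Aggregation primitives take related instances as input and output a single value. They are applied across a parent-child relationship in an entity set. For example, Min takes a list of values and returns the minimum of the list.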
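To illustrate what "many related instances in, one value out" means, here is a minimal sketch in plain pandas rather than the Featuretools API. The child table, the `pattern_id` key, and the grouping of rows are hypothetical and are not part of the question's data.

```python
import pandas as pd

# Hypothetical child table: several rows belong to each parent "pattern".
# The pattern_id column is an assumption made for this illustration only.
children = pd.DataFrame({
    "pattern_id": [1, 1, 2, 2, 2],
    "BoxRatio": [0.324, 0.938, 0.317, 0.289, 1.551],
})

# Aggregation: all child rows of a parent collapse into a single value.
min_per_pattern = children.groupby("pattern_id")["BoxRatio"].min()
print(min_per_pattern)
```

The result has one row per parent, not one row per child, which is why an aggregation of this kind has nothing to group over when the entity set contains only a single table.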

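Transform primitives take one or more variables from an entity as input and output a new variable for that entity. They are applied to a single entity. For example, log takes a list of values and returns a list of the same length containing the log of each item in the input.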
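The difference in output shape can be seen with plain NumPy and pandas. This is only a sketch of the idea, not Featuretools code; it reuses the BoxRatio values from the question.

```python
import numpy as np
import pandas as pd

# BoxRatio values copied from the question's DataFrame.
df = pd.DataFrame({"BoxRatio": [0.324, 0.938249, 0.317, 0.289, 1.551115]})

# Transform-style operation: one output value per input row.
log_box = np.log10(df["BoxRatio"])

# Reduction: the whole column collapses to a single number.
col_min = df["BoxRatio"].min()

print(len(log_box), col_min)
```

If all that is needed is the minimum of each column in a single table, calling `df.min()` directly in pandas may be simpler than going through a primitive.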

You can read more about primitives in the documentation: https://docs.featuretools.com/automated_feature_engineering/primitives.html