List of built-in aggregation and transform primitives

Time: 2019-06-12 17:46:57

Tags: featuretools

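First off, I love Featuretools. It has made my work much easier and more efficient. A quick question: I'm looking for the complete list of non-custom aggregation and transform primitives, but I can't seem to find it. Do I just take the list of methods in the API and replace the uppercase letters with lowercase ones (and underscores)?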

1 Answer:

Answer 0 (score: 1)

If you run featuretools.list_primitives(), it returns a DataFrame containing the names of all primitives. The strings in the "name" column can be passed to ft.dfs.

>>> import featuretools as ft   
>>> ft.list_primitives()
                               name         type                                        description
0                      percent_true  aggregation           Determines the percent of `True` values.
1                              last  aggregation               Determines the last value in a list.
2                          num_true  aggregation                Counts the number of `True` values.
3                               std  aggregation  Computes the dispersion relative to the mean v...
4                        num_unique  aggregation  Determines the number of distinct values, igno...
5                               sum  aggregation     Calculates the total addition, ignoring `NaN`.
6                              skew  aggregation  Computes the extent to which a distribution di...
7                              mode  aggregation       Determines the most commonly repeated value.
8                  time_since_first  aggregation  Calculates the time elapsed since the first da...
9                               max  aggregation  Calculates the highest value, ignoring `NaN` v...
10                           median  aggregation  Determines the middlemost number in a list of ...
11                             mean  aggregation         Computes the average for a list of values.
12                  time_since_last  aggregation  Calculates the time elapsed since the last dat...
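
To get separate lists of aggregation and transform names from that table, the "name" and "type" columns can be grouped. This is a minimal sketch only: `group_names_by_type` is a hypothetical helper, not part of Featuretools. The first three sample rows are copied from the output above, and the last is an assumed transform row added for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_names_by_type(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (name, type) pairs into {type: sorted list of names}."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, kind in rows:
        grouped[kind].append(name)
    return {kind: sorted(names) for kind, names in grouped.items()}


# Sample (name, type) pairs. The first three come from the output above;
# the last is an assumed transform row added for illustration.
sample_rows = [
    ("percent_true", "aggregation"),
    ("last", "aggregation"),
    ("num_true", "aggregation"),
    ("time_since_previous", "transform"),
]

names_by_type = group_names_by_type(sample_rows)
print(names_by_type["aggregation"])
print(names_by_type["transform"])
```

With Featuretools installed, the real table could be passed in as `zip(df["name"], df["type"])`, where `df = ft.list_primitives()`.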

Additionally, you can import the primitive classes and pass them in directly. For example, these two calls are equivalent.

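>>> from featuretools.primitives import Max, TimeSincePrevious
>>> # Max is an aggregation primitive; TimeSincePrevious is a transform primitive
>>> ft.dfs(agg_primitives=[Max], trans_primitives=[TimeSincePrevious], ...)
>>> ft.dfs(agg_primitives=["max"], trans_primitives=["time_since_previous"], ...)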
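
On the question of deriving the string names from the class names: for the examples in this thread, the string is the snake_case form of the class name. Here is a rough sketch of that conversion. It is an assumption, not how Featuretools itself assigns names, and a class with an acronym in its name would not convert cleanly. The "name" column of ft.list_primitives() remains the reliable source.

```python
import re


def camel_to_snake(class_name: str) -> str:
    """Insert "_" before each interior uppercase letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


# Class names taken from this thread; the conversion itself is a naive guess.
for class_name in ["Max", "TimeSincePrevious", "NumUnique"]:
    print(class_name, "->", camel_to_snake(class_name))
```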

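Importing the primitive class is useful when you need to change one of its configurable parameters. For example, to make TimeSincePrevious return its result in hours (the default is seconds):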

>>> ft.dfs(agg_primitives=[Max], trans_primitives=[TimeSincePrevious(unit="hours")], ...)