How to get the group mean for an item while excluding the item itself?

Posted: 2019-07-30 22:35:22

Tags: featuretools

How can I use featuretools to get the mean value of the group an item belongs to, excluding the item itself? For example:

Input:

item     group    value1
I1       C1       1
I2       C2       5
I3       C2       3
I4       C2       8
I5       C1       4
I6       C1       5
I7       C1       6
I8       C2       4
I9       C3       2
I10      C3       3

Expected output:

item     mean_value1_peergroup
I1       5    # mean([4, 5, 6]) rather than mean([1, 4, 5, 6])
I2       5    # mean([3, 8, 4])
...
I10      2    # mean([2])
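The expected column is a leave-one-out group mean. In the notation below (introduced here for illustration), S_g is the sum of value1 over group g, n_g is the number of items in g, and x_i is the item's own value; the worked lines reproduce three of the expected values above. Note the formula is undefined for a group containing a single item (n_g = 1).

```latex
\begin{aligned}
\bar{x}_{-i} &= \frac{S_g - x_i}{n_g - 1}, \qquad S_g = \sum_{j \in g} x_j \\
\text{I1 (C1)} &: \quad \frac{(1 + 4 + 5 + 6) - 1}{4 - 1} = \frac{15}{3} = 5 \\
\text{I2 (C2)} &: \quad \frac{(5 + 3 + 8 + 4) - 5}{4 - 1} = \frac{15}{3} = 5 \\
\text{I10 (C3)} &: \quad \frac{(2 + 3) - 3}{2 - 1} = \frac{2}{1} = 2
\end{aligned}
```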

1 Answer

Answer 0 (score: 0)

This can be done with a custom transform primitive. Define the primitive like this:

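import pandas as pd
import featuretools as ft  # pre-1.0 API: featuretools.variable_types was removed in featuretools 1.0
from featuretools.primitives import TransformPrimitive
from featuretools.variable_types import Numeric

class MeanExcludingValue(TransformPrimitive):
    name = "mean_excluding_value"
    input_types = [Numeric]
    return_type = Numeric
    stack_on_self = False

    def get_function(self):
        def mean_excluding_value(s):
            """Mean of the other elements of s, excluding the current element."""
            # Subtract each element from the total, then divide by the number of
            # remaining elements. A one-row group gives 0 / 0, i.e. NaN.
            return (s.sum() - s) / (len(s) - 1)
        return mean_excluding_value

Now create sample data matching the example in the question and load it into an entity set.

df = pd.DataFrame({
    "item": ["I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9", "I10"],
    "group": ["C1", "C2", "C2", "C2", "C1", "C1", "C1", "C2", "C3", "C3"],
    "value1": [1, 5, 3, 8, 4, 5, 6, 4, 2, 3]
})

es = ft.EntitySet()
es.entity_from_dataframe(entity_id="example",
                         dataframe=df,
                         index="item",
                         variable_types={
                             "group": ft.variable_types.Id  # important: lets DFS group by "group" later
                         })

Finally, call dfs with the new primitive, both as a regular transform primitive and as a groupby transform primitive.

fm, fl = ft.dfs(target_entity="example",
                entityset=es,
                trans_primitives=[MeanExcludingValue],          # applied to the whole column
                groupby_trans_primitives=[MeanExcludingValue],  # applied within each group
                max_depth=1)

fm

This returns:

      value1 group  MEAN_EXCLUDING_VALUE(value1)  MEAN_EXCLUDING_VALUE(value1) by group
item
I1         1    C1                      4.444444                               5.000000
I10        3    C3                      4.222222                               2.000000
I2         5    C2                      4.000000                               5.000000
I3         3    C2                      4.222222                               5.666667
I4         8    C2                      3.666667                               4.000000
I5         4    C1                      4.111111                               4.000000
I6         5    C1                      4.000000                               3.666667
I7         6    C1                      3.888889                               3.333333
I8         4    C2                      4.111111                               5.333333
I9         2    C3                      4.333333                               3.000000

The "MEAN_EXCLUDING_VALUE(value1) by group" column is the peer-group mean the question asks for: each item's own value is left out of its group's average. The "MEAN_EXCLUDING_VALUE(value1)" column applies the same calculation to the whole table, ignoring groups.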

You can read more about the difference between trans_primitives and groupby_trans_primitives here.
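As a rough illustration of that difference, here is a plain-pandas sketch (an analogy, not featuretools' own implementation): a transform primitive behaves like one call of the function on the whole column, while a groupby transform primitive behaves like pandas groupby(...).transform(...), calling the function once per group and aligning the results back to the rows. The column names mean_excl_all and mean_value1_peergroup are chosen here for illustration.

```python
import pandas as pd


def mean_excluding_value(s: pd.Series) -> pd.Series:
    """Leave-one-out mean: average of the other elements of s."""
    # A one-element Series gives 0 / 0, which pandas returns as NaN.
    return (s.sum() - s) / (len(s) - 1)


df = pd.DataFrame({
    "item": ["I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9", "I10"],
    "group": ["C1", "C2", "C2", "C2", "C1", "C1", "C1", "C2", "C3", "C3"],
    "value1": [1, 5, 3, 8, 4, 5, 6, 4, 2, 3],
}).set_index("item")

# Transform analogue: a single call on the whole column, groups ignored.
df["mean_excl_all"] = mean_excluding_value(df["value1"])

# Groupby-transform analogue: one call per group, results aligned back to rows.
df["mean_value1_peergroup"] = (
    df.groupby("group")["value1"].transform(mean_excluding_value)
)

print(df)
```

In this sketch the mean_value1_peergroup column comes out as 5.0 for I1 and I2 and 2.0 for I10, matching the expected output in the question.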