How can I use Featuretools to get the mean of the group an item belongs to, excluding the item itself? For example,
Input:
item  group  value1
I1    C1     1
I2    C2     5
I3    C2     3
I4    C2     8
I5    C1     4
I6    C1     5
I7    C1     6
I8    C2     4
I9    C3     2
I10   C3     3
Expected output:
item  mean_value1_peergroup
I1    5  # mean([4, 5, 6]) rather than mean([1, 4, 5, 6])
I2    5  # mean([3, 8, 4])
...
I10   2  # mean([2])
Answer 0 (score: 0)
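This can be done with a custom transform primitive. You can define the primitive like this (the code uses the pre-1.0 Featuretools API, e.g. featuretools.variable_types):
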
import pandas as pd
import featuretools as ft
from featuretools.primitives import TransformPrimitive
from featuretools.variable_types import Numeric


class MeanExcludingValue(TransformPrimitive):
    name = "mean_excluding_value"
    input_types = [Numeric]
    return_type = Numeric
    stack_on_self = False

    def get_function(self):
        def mean_excluding_value(s):
            """Calculate the mean of the series excluding the current element."""
            # Divide by len(s) - 1, the number of remaining elements;
            # a single-element series yields NaN (0 / 0).
            return (s.sum() - s) / (len(s) - 1)

        return mean_excluding_value

Now let's create a data sample similar to the one in the question (the group assignments here differ from it) and load it into an entity set.

df = pd.DataFrame({
    "item": ["I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9", "I10"],
    "group": ["C1", "C1", "C1", "C1", "C1", "C1", "C1", "C2", "C2", "C3"],
    "value1": [1, 5, 3, 8, 4, 5, 6, 4, 2, 3]
})

es = ft.EntitySet()
es.entity_from_dataframe(entity_id="example",
                         dataframe=df,
                         index="item",
                         variable_types={
                             "group": ft.variable_types.Id  # this is important for grouping later
                         })

Finally, we call dfs with the new primitive.

fm, fl = ft.dfs(target_entity="example",
                entityset=es,
                trans_primitives=[MeanExcludingValue],
                groupby_trans_primitives=[MeanExcludingValue],
                max_depth=1)
fm

This returns

      value1 group  MEAN_EXCLUDING_VALUE(value1)  MEAN_EXCLUDING_VALUE(value1) by group
item
I1         1    C1                      4.444444                               5.166667
I10        3    C3                      4.222222                                    NaN
I2         5    C1                      4.000000                               4.500000
I3         3    C1                      4.222222                               4.833333
I4         8    C1                      3.666667                               4.000000
I5         4    C1                      4.111111                               4.666667
I6         5    C1                      4.000000                               4.500000
I7         6    C1                      3.888889                               4.333333
I8         4    C2                      4.111111                               2.000000
I9         2    C2                      4.333333                               4.000000

The first feature column excludes the row's value from the mean of the whole column; the "by group" column does the same within each group. I10 is the only member of C3, so it has no peers and its group value is NaN.

You can read more about the difference between trans_primitives and groupby_trans_primitives here.
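
As a cross-check outside Featuretools, the same leave-one-out mean can be sketched in plain pandas. This is only an illustration under stated assumptions: the helper name `peer_mean` and the column name `peer_mean_all` are hypothetical, and the data is the table from the question rather than the sample above. Applying the helper to the whole column mirrors what a plain transform primitive sees, while applying it through `groupby(...).transform` mirrors the grouped variant. Dividing by `len(s) - 1`, the number of remaining values, is what makes the result the mean of the peers.

```python
import pandas as pd

# Data from the question (item, group, value1).
df = pd.DataFrame({
    "item": ["I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9", "I10"],
    "group": ["C1", "C2", "C2", "C2", "C1", "C1", "C1", "C2", "C3", "C3"],
    "value1": [1, 5, 3, 8, 4, 5, 6, 4, 2, 3],
})


def peer_mean(s: pd.Series) -> pd.Series:
    """Mean of the other values in s (hypothetical helper).

    A single-element series has no peers and yields NaN.
    """
    return (s.sum() - s) / (len(s) - 1)


# Whole column: every other row counts as a peer.
df["peer_mean_all"] = peer_mean(df["value1"])

# Per group: only rows that share the same group count as peers.
df["mean_value1_peergroup"] = df.groupby("group")["value1"].transform(peer_mean)

print(df.set_index("item")[["mean_value1_peergroup"]])
```

With the question's data this gives 5.0 for I1 (mean of 4, 5, 6), 5.0 for I2 (mean of 3, 8, 4) and 2.0 for I10 (mean of 2), matching the expected output.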