Question

from pyspark.ml.fpm import FPGrowth

fpGrowth = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
model = fpGrowth.fit(df)
model.associationRules.show()

With the code above, I can only get the confidence of each association rule. How do I get the lift of each association rule using Spark FP-growth in PySpark?

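In this case I only have the two DataFrames below. How can I automatically add a lift column after the confidence column in the first DataFrame, rather than adding it by hand?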

+----------+----------+------------------+
|antecedent|consequent|        confidence|
+----------+----------+------------------+
|    [2, 1]|       [5]|0.6666666666666666|
|    [5, 1]|       [2]|               1.0|
|       [2]|       [1]|               1.0|
|       [2]|       [5]|0.6666666666666666|
|       [5]|       [2]|               1.0|
|       [5]|       [1]|               1.0|
|    [5, 2]|       [1]|               1.0|
|       [1]|       [2]|               1.0|
|       [1]|       [5]|0.6666666666666666|
+----------+----------+------------------+

+---------+----+------------------+
|    items|freq|           support|
+---------+----+------------------+
|      [1]|   3|               1.0|
|      [2]|   3|               1.0|
|   [2, 1]|   3|               1.0|
|      [5]|   2|0.6666666666666666|
|   [5, 2]|   2|0.6666666666666666|
|[5, 2, 1]|   2|0.6666666666666666|
|   [5, 1]|   2|0.6666666666666666|
+---------+----+------------------+

Answer 1

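This is easy to compute: the lift of a rule is its confidence divided by the support of its consequent, lift(a -> b) = confidence(a -> b) / support(b). Here support(b) is the same thing as the confidence of the rule with an empty antecedent, {} -> b. For example, in the question's tables the rule [2, 1] -> [5] has confidence 0.6666666666666666 and the itemset [5] has support 0.6666666666666666, so the lift is 1.0.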
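To check the formula against the question's own numbers, here is a minimal pure-Python sketch. The `support` dictionary and `rules` list are hand-copied from a few rows of the two tables in the question, and the helper name `lift` is only illustrative; none of this is part of the Spark API.

```python
# Support of single-item itemsets, copied from the question's second table.
# 2 / 3 is the value printed there as 0.6666666666666666.
support = {
    frozenset([1]): 1.0,
    frozenset([2]): 1.0,
    frozenset([5]): 2 / 3,
}

# (antecedent, consequent, confidence) rows copied from the first table.
rules = [
    ([2, 1], [5], 2 / 3),
    ([5, 1], [2], 1.0),
    ([2], [1], 1.0),
]


def lift(confidence, consequent):
    """lift(a -> b) = confidence(a -> b) / support(b)."""
    return confidence / support[frozenset(consequent)]


lifts = [(a, c, lift(conf, c)) for a, c, conf in rules]
for antecedent, consequent, value in lifts:
    print(antecedent, "->", consequent, "lift =", value)
```

For these rows every lift comes out as 1.0, meaning the antecedent does not make the consequent any more or less likely than its baseline frequency.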

Answer 2

df1.join(df2,df2.items==df1.consequent, 'left').select("antecedent","consequent","confidence","support").show()

+----------+----------+------------------+------------------+
|antecedent|consequent|        confidence|           support|
+----------+----------+------------------+------------------+
|    [2, 1]|       [5]|0.6666666666666666|0.6666666666666666|
|       [2]|       [5]|0.6666666666666666|0.6666666666666666|
|       [1]|       [5]|0.6666666666666666|0.6666666666666666|
|       [2]|       [1]|               1.0|               1.0|
|       [5]|       [1]|               1.0|               1.0|
|    [5, 2]|       [1]|               1.0|               1.0|
|    [5, 1]|       [2]|               1.0|               1.0|
|       [5]|       [2]|               1.0|               1.0|
|       [1]|       [2]|               1.0|               1.0|
+----------+----------+------------------+------------------+

How do I get the lift of association rules using Spark FP-growth in PySpark?
