How to limit FPGrowth itemsets to 2 or 3

Asked: 2019-06-30 04:42:38

Tags: pyspark python-3.6

I am running the FPGrowth algorithm with PySpark in Python 3.6 inside a Jupyter notebook. When I try to save the association rules, the generated output is enormous, so I want to limit the number of results. Here is the code I tried; I have also changed the Spark context parameters.

Maximum Pattern Length fpGrowth (Apache) PySpark

from pyspark.sql.functions import col, size
from pyspark.ml.fpm import FPGrowth
from pyspark.sql import Row
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark import SparkConf

conf = SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        .set('spark.executor.memory', '100G')
        .set('spark.driver.memory', '400G')
        .set('spark.driver.maxResultSize', '200G'))
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)
R = Row('ID', 'items')
# lol is the list of transactions, e.g. [['a', 'b'], ['c', 'a', 'e'], ...]
df = spark.createDataFrame([R(i, x) for i, x in enumerate(lol)])
fpGrowth = FPGrowth(itemsCol="items", minSupport=0.7, minConfidence=0.9)

model = fpGrowth.fit(df)
ar = model.associationRules.where(size(col('antecedent')) == 2).where(size(col('consequent')) == 1)

ar.cache()
ar.toPandas().to_csv('output.csv')
This gives an error:


    TypeError                                 Traceback (most recent call last)
    <ipython-input-1-f90c7a9f11ae> in <module>
    ---> 73 ar=model.associationRules.where(size(col('antecedent')) == 2).where(size(col('consequent')) == 1)
    TypeError: 'str' object is not callable

Can someone help me solve this problem?

Here is the transaction list: [['a','b'], ['c','a','e'], ...]

Python 3.6.5, PySpark, Windows 10

1 Answer:

Answer 0 (score: 0)

The discussion above and the following link helped me solve the problem.

'str' object is not callable TypeError
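As the linked question explains, the error comes from name shadowing: somewhere earlier in the notebook, the bare name `size` (or `col`) was rebound to a string, so calling it afterwards fails. A minimal sketch of the same mechanism, without Spark (the `size` stand-in below is hypothetical, not the PySpark function):

```python
# Reproducing the error mechanism without Spark: importing or defining
# a function and later rebinding its name to a string breaks the call.
def size(x):
    """Stand-in for pyspark.sql.functions.size."""
    return len(x)

size = 'antecedent'   # accidental rebinding in a later notebook cell

try:
    size(['a', 'b'])  # the name is now a str, not a function
except TypeError as e:
    print(e)          # 'str' object is not callable
```

In a long-lived notebook session, any cell that assigns to `size` or `col` will poison later calls until the kernel is restarted or the import is re-run.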

    import pyspark.sql.functions as func

    model.associationRules.where(func.size(func.col('antecedent')) == 1) \
        .where(func.size(func.col('consequent')) == 1).show()
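Accessing the functions through a module alias (`func.size`, `func.col`) is robust because a local variable cannot shadow an attribute of the module object. The same pattern illustrated with a standard-library module:

```python
import os.path as p

# A local name that collides with the function's name...
join = 'accidentally rebound to a string'

# ...does not affect attribute access through the module alias.
print(p.join('data', 'output.csv'))

# The bare name, however, is now a string and cannot be called.
try:
    join('data', 'output.csv')
except TypeError as e:
    print(e)
```

For the same reason, `import pyspark.sql.functions as F` (or `func`) is a common convention in PySpark code.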