I have a transaction dataset that I am preparing:
val df = spark.read.parquet("..").select("bill","plu_id").distinct()
+-----+-------+
| bill| plu_id|
+-----+-------+
| 1|3447284|
| 1|3255517|
| 1| 3757|
| 1|3501662|
| 1| 21676|
| 2|3499538|
| 2|3248365|
| 2|3453599|
| 2|3602083|
| 2| 18898|
| 3|3446809|
+-----+-------+
Previously I did this with PySpark:
from pyspark.mllib.fpm import FPGrowth

# one row per (bill, plu_id); group the items into one basket per bill
df = spark.read.parquet('..').select('bill', 'plu_id').distinct().cache()
datardd = df.rdd.map(lambda x: (x[0], x[1])) \
    .groupByKey() \
    .mapValues(list) \
    .values()
model = FPGrowth.train(datardd, minSupport=0.0001, numPartitions=1024)
Please help me: how can I prepare the dataset for an association-rules model in Scala? :)
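Here is a sketch of what I think the Scala equivalent might look like, assuming the DataFrame-based `FPGrowth` in `org.apache.spark.ml.fpm` (the `setMinConfidence` value below is just a placeholder I picked, not something from my PySpark code). Is this the right approach?

```scala
import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.functions.collect_set

// One row per (bill, plu_id); collapse to one basket (array of items) per bill.
val baskets = spark.read.parquet("..")
  .select("bill", "plu_id")
  .distinct()
  .groupBy("bill")
  .agg(collect_set("plu_id").as("items"))

// ml.fpm.FPGrowth expects a DataFrame with an array column of items.
val fpgrowth = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.0001)
  .setMinConfidence(0.1) // placeholder value

val model = fpgrowth.fit(baskets)
model.freqItemsets.show()
model.associationRules.show()
```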