Unexpected results when filtering freqItemsets in Spark

Date: 2015-12-15 10:57:27

Tags: scala apache-spark data-mining

I am following this tutorial on association rules:

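import org.apache.spark.mllib.fpm.AssociationRules

// freqItemsets: RDD[FreqItemset[String]], defined earlier in the tutorial
val ar = new AssociationRules()
  .setMinConfidence(0.8)
val results = ar.run(freqItemsets)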

I added .filter(item => item.items.length == 1) to freqItemsets, but nothing is displayed, even though there are several rules of the form item a => item b.
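A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: `AssociationRules.run` builds each candidate rule by splitting a frequent itemset into an antecedent and a consequent, and both must be non-empty. A single-item itemset therefore yields no rule, so filtering `freqItemsets` down to length-1 itemsets before calling `run` leaves nothing to mine. The sketch below uses toy itemsets that are assumed to mirror the tutorial's example data; `FilterPlacement` and `compare` are hypothetical names. It contrasts filtering the input itemsets with filtering the output rules.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

object FilterPlacement {
  // Hypothetical helper: returns (rule count when the *input* itemsets are filtered,
  // one-to-one rules as "x => y" strings when the *output* rules are filtered).
  def compare(sc: SparkContext): (Long, Seq[String]) = {
    // Toy itemsets, assumed to mirror the tutorial's example data.
    val freqItemsets = sc.parallelize(Seq(
      new FreqItemset(Array("a"), 15L),
      new FreqItemset(Array("b"), 35L),
      new FreqItemset(Array("a", "b"), 12L)))

    val ar = new AssociationRules().setMinConfidence(0.8)

    // Filtering the input: only single-item itemsets remain, and none of them
    // can be split into a non-empty antecedent and a non-empty consequent.
    val fromSingles = ar.run(freqItemsets.filter(_.items.length == 1)).count()

    // Filtering the output: mine rules from all itemsets, then keep x => y rules.
    val oneToOne = ar.run(freqItemsets)
      .filter(r => r.antecedent.length == 1 && r.consequent.length == 1)
      .map(r => s"${r.antecedent.head} => ${r.consequent.head}")
      .collect()
      .toSeq

    (fromSingles, oneToOne)
  }

  def main(args: Array[String]): Unit = {
    // Local master for illustration; in spark-shell, pass the existing `sc` instead.
    val sc = new SparkContext(
      new SparkConf().setAppName("filter-placement").setMaster("local[*]"))
    try {
      val (count, rules) = compare(sc)
      println(s"rules from single-item itemsets: $count")
      rules.foreach(println)
    } finally sc.stop()
  }
}
```

With `setMinConfidence(0.8)` on this toy data, only `a => b` survives (confidence 12/15 = 0.8); `b => a` has confidence 12/35 and is dropped.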

1 Answer:

Answer 0 (score: 3)

I ran the example from that site without any problems and got several items back:

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.FPGrowth

val data = sc.textFile("hdfs://master/spark-sample-data/sample_fpgrowth.txt", 16)

val transactions: RDD[Array[String]] = data.map(s => s.trim.split(' '))

val fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(16)
val model = fpg.run(transactions)

val individualItems = model.freqItemsets.filter(
  itemset => itemset.items.length == 1)

// Print the number of single-item frequent itemsets
println(individualItems.count())
// 8

individualItems.map(x => x.items).collect()
// Array(Array(z), Array(x), Array(r), Array(s), Array(t), Array(y),
//   Array(p), Array(q))
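Filtering `freqItemsets` this way returns single frequent items, not rules. If the goal is rules of the form item a => item b, one option is to generate rules from the full model with `FPGrowthModel.generateAssociationRules` (part of Spark MLlib since 1.5) and filter the rules instead. This is a minimal sketch under stated assumptions: the six baskets are inlined so no HDFS cluster is needed and are assumed to match `sample_fpgrowth.txt`, and `OneToOneRules` and `mine` are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

object OneToOneRules {
  // Assumption: these baskets mirror sample_fpgrowth.txt, inlined to avoid HDFS.
  val baskets = Seq(
    "r z h k p",
    "z y x w v u t s",
    "s x o n r",
    "x z y m t s q e",
    "z",
    "x z y r q t p")

  // Hypothetical helper: returns (number of frequent single items,
  // one-to-one rules as (antecedent, consequent, confidence)).
  def mine(sc: SparkContext): (Long, Seq[(String, String, Double)]) = {
    val transactions: RDD[Array[String]] =
      sc.parallelize(baskets).map(_.trim.split(' '))
    val model = new FPGrowth().setMinSupport(0.2).setNumPartitions(2).run(transactions)

    // Same filter as in the answer: itemsets containing exactly one item.
    val singles = model.freqItemsets.filter(_.items.length == 1).count()

    // Rules come from multi-item itemsets, so filter the generated rules instead.
    val rules = model.generateAssociationRules(0.8)
      .filter(r => r.antecedent.length == 1 && r.consequent.length == 1)
      .map(r => (r.antecedent.head, r.consequent.head, r.confidence))
      .collect()
      .toSeq

    (singles, rules)
  }

  def main(args: Array[String]): Unit = {
    // Local master for illustration; in spark-shell, pass the existing `sc` instead.
    val sc = new SparkContext(
      new SparkConf().setAppName("one-to-one-rules").setMaster("local[*]"))
    try {
      val (singles, rules) = mine(sc)
      println(s"frequent single items: $singles")
      rules.foreach { case (a, b, conf) => println(s"$a => $b ($conf)") }
    } finally sc.stop()
  }
}
```

Filtering after rule generation keeps the multi-item itemsets available to the rule miner, which is what allows a => b rules to appear at all.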