https://spark.apache.org/docs/2.1.0/mllib-frequent-pattern-mining.html#fp-growth
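sample_fpgrowth.txt can be found here: https://github.com/apache/spark/blob/master/data/mllib/sample_fpgrowth.txt
I ran the FP-growth example from the link above in Scala and it works fine, but what I need is how to convert the results, which are RDDs, into DataFrames. The two RDDs are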
model.freqItemsets and
model.generateAssociationRules(minConfidence)
Please explain in detail using the example given in my question.
Answer 0 (score: 3)
Once you have an rdd, there are many ways to create a dataframe. One of them is the .toDF function, which requires the sqlContext.implicits library to be imported, as follows:
import org.apache.spark.sql.SparkSession
import org.apache.spark.rdd.RDD
val sparkSession = SparkSession.builder().appName("udf testings")
  .master("local")
  .config("", "")
  .getOrCreate()
val sc = sparkSession.sparkContext
val sqlContext = sparkSession.sqlContext
// brings .toDF into scope for RDDs of tuples and case classes
import sqlContext.implicits._
After that, read the fpgrowth text file and convert it to an rdd. I used the code from Frequent Pattern Mining - RDD-based API given in the question:
val data = sc.textFile("path to sample_fpgrowth.txt that you have used")
val transactions: RDD[Array[String]] = data.map(s => s.trim.split(' '))
import org.apache.spark.mllib.fpm.FPGrowth
val fpg = new FPGrowth()
  .setMinSupport(0.2)
  .setNumPartitions(10)
val model = fpg.run(transactions)
The next step is to call the .toDF function. For the first dataframe:
model.freqItemsets.map(itemset => (itemset.items.mkString("[", ",", "]"), itemset.freq)).toDF("items", "freq").show(false)
This results in:
+---------+----+
|items    |freq|
+---------+----+
|[z]      |5   |
|[x]      |4   |
|[x,z]    |3   |
|[y]      |3   |
|[y,x]    |3   |
|[y,x,z]  |3   |
|[y,z]    |3   |
|[r]      |3   |
|[r,x]    |2   |
|[r,z]    |2   |
|[s]      |3   |
|[s,y]    |2   |
|[s,y,x]  |2   |
|[s,y,x,z]|2   |
|[s,y,z]  |2   |
|[s,x]    |3   |
|[s,x,z]  |2   |
|[s,z]    |2   |
|[t]      |3   |
|[t,y]    |3   |
+---------+----+
only showing top 20 rows
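If the items are needed as a real array column rather than a formatted string, the mkString call can probably be dropped, since Array[String] has a built-in encoder. The following is a minimal sketch under stated assumptions: a local session, a small hand-made transaction set instead of sample_fpgrowth.txt, and the names spark, txns, toyModel and itemsetsDF, none of which come from the answer.

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Assumption: a local session; the app name is arbitrary.
val spark = SparkSession.builder()
  .appName("fpgrowth-array-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy transactions, not the contents of sample_fpgrowth.txt.
val txns: RDD[Array[String]] = spark.sparkContext.parallelize(Seq(
  Array("a", "b"),
  Array("a", "c"),
  Array("a", "b", "c")
))

val toyModel = new FPGrowth().setMinSupport(0.5).run(txns)

// Keep items as Array[String]; it becomes an array<string> column.
val itemsetsDF = toyModel.freqItemsets
  .map(itemset => (itemset.items, itemset.freq))
  .toDF("items", "freq")

itemsetsDF.printSchema()
itemsetsDF.show(false)
```

An array column can later be queried with functions such as array_contains or size, which is harder to do once the items have been joined into a single string.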
The second dataframe results from:
val minConfidence = 0.8
model.generateAssociationRules(minConfidence)
  .map(rule => (rule.antecedent.mkString("[", ",", "]"), rule.consequent.mkString("[", ",", "]"), rule.confidence))
  .toDF("antecedent", "consequent", "confidence").show(false)
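As an alternative, and only on Spark 2.2 or later (an assumption here, since the question links the 2.1.0 documentation, where this class is not available), the DataFrame-based org.apache.spark.ml.fpm.FPGrowth returns both results as DataFrames, so no RDD conversion is needed. A sketch with hypothetical toy data; the names spark, baskets and mlModel are illustrative:

```scala
import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.SparkSession

// Assumption: a local session; the app name is arbitrary.
val spark = SparkSession.builder()
  .appName("ml-fpgrowth-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy baskets; each row holds one transaction as array<string>.
val baskets = Seq(
  Array("a", "b"),
  Array("a", "c"),
  Array("a", "b", "c")
).toDF("items")

val mlModel = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.5)
  .setMinConfidence(0.8)
  .fit(baskets)

// Both results are already DataFrames.
mlModel.freqItemsets.show(false)
mlModel.associationRules.show(false)
```

With the sample file from the question, the same idea would apply after splitting each line into an array and calling toDF("items") on the result.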
I hope this is what you need.