How do I save the frequent itemsets of an FP-Growth model to a text file?

Time: 2016-05-30 06:06:58

Tags: java apache-spark apache-spark-mllib

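I am trying to save the frequent itemsets generated by the model to a text file. The code is the FPGrowth example from the Spark MLlib library. Calling saveAsTextFile directly on the model's frequent itemsets writes object references (class name plus hash code) rather than the actual values.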

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.mllib.fpm.FPGrowth;
import org.apache.spark.mllib.fpm.FPGrowthModel;
import org.apache.spark.api.java.function.Function;
import java.util.Arrays;
import java.util.List;

public class Test_ItemFrequency {

    public static void main(String args[]) {

        SparkConf conf = new SparkConf().setAppName("FP-Growth_ItemFrequency").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> data = sc.textFile("/data/mllib/sample_fpgrowth.txt");

        JavaRDD<List<String>> transactions = data.map(new Function<String, List<String>>() {
            public List<String> call(String line) {
                String[] parts = line.split(" ");
                return Arrays.asList(parts);
            }
        });

        FPGrowth fpg = new FPGrowth().setMinSupport(0.2).setNumPartitions(1);
        FPGrowthModel<String> model = fpg.run(transactions);

        model.freqItemsets().saveAsTextFile("/home/data/itemset");

        sc.stop();
    }
}

The output written to the text file looks like this:

org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@754881de
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@73022909
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@25df2591
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@774b6aca
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@100ba1db
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@72a388b2
org.apache.spark.mllib.fpm.FPGrowth$FreqItemset@2e8cc8da

Can anyone explain how to fix this? Thanks in advance.

1 answer:

Answer 0 (score: 1)

Using a lambda expression

model.freqItemsets()
     .toJavaRDD()
     .map((Function<FPGrowth.FreqItemset<String>, String>) fi -> fi.javaItems() + " -> " + fi.freq())
     .saveAsTextFile("/home/data/itemset");

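This maps each FPGrowth.FreqItemset<String> to a String, giving a JavaRDD<String> whose elements are readable text when saved. saveAsTextFile writes each element's toString(), and the output in the question has the default ClassName@hash form, so the explicit mapping is what makes the items and their frequency visible.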

Solution without a lambda expression

model.freqItemsets()
     .toJavaRDD()
     .map(new Function<FPGrowth.FreqItemset<String>, String>() {
         @Override
         public String call(FPGrowth.FreqItemset<String> fi) {
             return fi.javaItems() + " -> " + fi.freq();
         }
     })
     .saveAsTextFile("/home/data/itemset");
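To see the effect of the mapping step without running Spark, here is a minimal sketch in plain Java. The class ItemsetStandIn is a hypothetical stand-in, not part of Spark: it only mimics the javaItems() and freq() accessors used in the answer and, as an assumption made to reproduce the symptom in the question, does not override toString(). The format method applies the same string concatenation as the map function above.

```java
import java.util.Arrays;
import java.util.List;

public class FreqItemsetFormatDemo {

    // Hypothetical stand-in for FPGrowth.FreqItemset<String>; not a Spark class.
    // It deliberately does not override toString().
    static final class ItemsetStandIn {
        private final List<String> items;
        private final long freq;

        ItemsetStandIn(List<String> items, long freq) {
            this.items = items;
            this.freq = freq;
        }

        List<String> javaItems() { return items; }

        long freq() { return freq; }
    }

    // Same formatting as the map function in the answer: items, " -> ", frequency.
    static String format(ItemsetStandIn fi) {
        return fi.javaItems() + " -> " + fi.freq();
    }

    public static void main(String[] args) {
        ItemsetStandIn fi = new ItemsetStandIn(Arrays.asList("r", "z"), 3L);

        // Default Object.toString(): class name, '@', hexadecimal hash code.
        System.out.println(fi);

        // Explicit formatting yields readable text.
        System.out.println(format(fi)); // prints "[r, z] -> 3"
    }
}
```

The first println shows the same ClassName@hash shape as the file contents in the question; the second shows the kind of line each record becomes once it has been mapped to a String before saving.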