The method run(JavaRDD<Sequence>) is not applicable for the arguments (JavaRDD<List<String>>)

Asked: 2016-11-28 22:06:25

Tags: java apache-spark spark-streaming

When trying to run the PrefixSpan algorithm in Spark MLlib, I get the error

  

The method run(JavaRDD<Sequence>) in the type PrefixSpan is not applicable for the arguments (JavaRDD<List<String>>)

The code I saw on the website is

JavaRDD<List<List<Integer>>> sequences = sc.parallelize(Arrays.asList(
Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
Arrays.asList(Arrays.asList(1), Arrays.asList(3, 2), Arrays.asList(1, 2)),
Arrays.asList(Arrays.asList(1, 2), Arrays.asList(5)),
Arrays.asList(Arrays.asList(6))), 2);
PrefixSpan prefixSpan = new PrefixSpan().setMinSupport(0.5).setMaxPatternLength(5);
PrefixSpanModel<Integer> model = prefixSpan.run(sequences);
for (PrefixSpan.FreqSequence<Integer> freqSeq: model.freqSequences().toJavaRDD().collect()) {
     System.out.println(freqSeq.javaSequence() + ", " + freqSeq.freq());
}

My code is

List<List<String>> sequences = createLists(featuresForAlgo);

JavaRDD<List<String>> rdd =  sc.parallelize(sequences);

PrefixSpan prefixSpan = new PrefixSpan()
          .setMinSupport(0.5)
          .setMaxPatternLength(5);
        PrefixSpanModel<String> model = prefixSpan.run(rdd);
        for (PrefixSpan.FreqSequence<Integer> freqSeq: model.freqSequences().toJavaRDD().collect()) {
          System.out.println(freqSeq.javaSequence() + ", " + freqSeq.freq());
        }

The call prefixSpan.run(rdd) gives the error. Any idea why I am getting this error? As far as I know, a List is a sequence.

Thanks

1 answer:

Answer 0: (score: 0)

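The error is a bit misleading, but if you look at the source code of the PrefixSpan class, you will see that the parameter of the run method is documented as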

  

@param data ordered sequences of itemsets stored as Java Iterable of Iterables

So the prefixSpan.run method needs an RDD in which each element is itself a sequence of itemsets (a list of lists), not a flat list of strings. In your code, you can use this type instead

JavaRDD<List<List<String>>>
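
If createLists really does return a flat List<List<String>>, as in the question, one possible way to get to that type is to wrap every item in its own one-element itemset before parallelizing. The following is only a sketch under that assumption: the class name ItemsetWrapper and the method toItemsetSequences are made up for illustration and are not part of Spark, and the Spark calls are left as comments so that the conversion itself runs with plain Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Spark: turns flat sequences of items
// into sequences of single-item itemsets, the shape PrefixSpan.run expects.
public class ItemsetWrapper {

    public static List<List<List<String>>> toItemsetSequences(List<List<String>> flat) {
        List<List<List<String>>> result = new ArrayList<>();
        for (List<String> sequence : flat) {
            List<List<String>> itemsets = new ArrayList<>();
            for (String item : sequence) {
                // Each item becomes an itemset of size one.
                itemsets.add(Collections.singletonList(item));
            }
            result.add(itemsets);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for createLists(featuresForAlgo) from the question.
        List<List<String>> flat = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));

        List<List<List<String>>> sequences = toItemsetSequences(flat);
        System.out.println(sequences);

        // With Spark on the classpath, the result could then be used as:
        //   JavaRDD<List<List<String>>> rdd = sc.parallelize(sequences);
        //   PrefixSpanModel<String> model = prefixSpan.run(rdd);
    }
}
```

Whether single-item itemsets are the right modelling depends on your data; if several features occur together at one step, they probably belong in the same inner list. Note also that with a PrefixSpanModel<String>, the loop variable should be a PrefixSpan.FreqSequence<String> rather than FreqSequence<Integer>.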