What causes compilation errors when mixing different versions of spark-core and spark-mllib?

Date: 2015-09-17 09:42:57

Tags: scala apache-spark apache-spark-mllib

I am copying and pasting the exact Spark MLlib LDA example from here: http://spark.apache.org/docs/latest/mllib-clustering.html#latent-dirichlet-allocation-lda

I am trying the Scala sample code, but I run into the following errors when attempting to save and load the LDA model:

  1. On the second-to-last line: value save is not a member of org.apache.spark.mllib.clustering.DistributedLDAModel
  2. On the last line: not found: value DistributedLDAModel
  3. The code is below. Note that I am using SBT to create my Scala project skeleton and pull in the libraries, then importing the project into Eclipse (Mars) for editing. I am using spark-core 1.5.0 and spark-mllib 1.3.1 (a sketch of this mismatched setup follows below).

    Scala version 2.11.7
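For illustration, here is a minimal build.sbt reproducing the mismatch described above; the asker's actual file was not posted, so the project name is a made-up placeholder and only the versions come from the question. With spark-mllib 1.3.1 on the classpath, DistributedLDAModel predates the save/load persistence API (added for LDA models in Spark 1.5.0), which accounts for both compiler errors.

name := "lda-mismatch-example"  // hypothetical project name

scalaVersion := "2.11.7"

// Mixing release lines: spark-mllib 1.3.1 compiles against the old MLlib API,
// so ldaModel.save(...) and DistributedLDAModel.load(...) do not resolve.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.5.0",
  "org.apache.spark" %% "spark-mllib" % "1.3.1"
)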

2 Answers:

Answer 0 (score: 1):

First of all, the code compiles fine. Here is what I used for the setup:

./build.sbt

name := "SO_20150917"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  "org.apache.spark"     %% "spark-core"    % "1.5.0",
  "org.apache.spark"     %% "spark-mllib"   % "1.5.0"
)
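A side note on the notation used here: sbt's %% operator appends the project's Scala binary version to the artifact name, so with scalaVersion 2.11.7 the declarations above resolve to the _2.11 artifacts. A minimal sketch of the equivalence:

// With scalaVersion := "2.11.7", these two module IDs name the same artifact:
"org.apache.spark" %% "spark-core" % "1.5.0"
"org.apache.spark" %  "spark-core_2.11" % "1.5.0"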

./src/main/scala/somefun/Example.scala

package somefun

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.{LDA, DistributedLDAModel}
import org.apache.spark.mllib.linalg.Vectors

object Example {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("sample_SBT").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Load and parse the data
    val data = sc.textFile("data/mllib/sample_lda_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
    // Index documents with unique IDs
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()

    // Cluster the documents into three topics using LDA
    val ldaModel = new LDA().setK(3).run(corpus)

    // Output topics. Each is a distribution over words (matching word count vectors)
    println("Learned topics (as distributions over vocab of " + ldaModel.vocabSize + " words):")
    val topics = ldaModel.topicsMatrix
    for (topic <- Range(0, 3)) {
      print("Topic " + topic + ":")
      for (word <- Range(0, ldaModel.vocabSize)) { print(" " + topics(word, topic)); }
      println()
    }

    // Save and load model.
    ldaModel.save(sc, "myLDAModel")
    val sameModel = DistributedLDAModel.load(sc, "myLDAModel")
  }
}

Executing it via sbt run will (of course) complain about the missing "data/mllib/sample_lda_data.txt", e.g.

[error] (run-main-0) org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/martin/IdeaProjects/SO_20150917/data/mllib/sample_lda_data.txt
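For reference, a stand-in for the missing input file: the parsing code above (s.trim.split(' ').map(_.toDouble)) expects one document per line as space-separated word counts over a fixed vocabulary. The rows below are made up for illustration and are not the actual sample_lda_data.txt shipped with Spark; any file of this shape at data/mllib/sample_lda_data.txt lets the example run end to end.

1 2 6 0 2 3 1 1 0 0 3
0 1 0 3 0 2 1 0 4 1 0
2 0 1 1 0 0 3 2 0 1 1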

@Rami: So please double-check your setup, because from my point of view everything is fine.

Answer 1 (score: 1):

Regarding @Rami's question:

Perhaps this will help:

val sparkVersion = "1.5.0"

libraryDependencies ++= Seq(
  "org.apache.spark"     %% "spark-core"    % sparkVersion,
  "org.apache.spark"     %% "spark-mllib"   % sparkVersion
)
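Factoring the version into a single sparkVersion val keeps spark-core and spark-mllib in lock-step: bumping one value upgrades both artifacts together, which rules out exactly the mixed-version classpath (spark-core 1.5.0 against spark-mllib 1.3.1) that produced the compile errors above.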