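I wrote some machine learning code that runs fine in the Scala shell. I am using SBT to compile the code and build a JAR. I took some code from the examples (e.g. LocalLR and SparkPi in Spark) and tried compiling it in a new project folder. Those all compiled successfully, but for some reason my own code does not compile. I followed all the directory conventions, but still no luck.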
The code is below; the compile errors it produced are listed further down.
import org.apache.spark.SparkContext
import org.apache.spark.mllib.evaluation._
import org.apache.spark.mllib.tree._
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.tree.model._
import org.apache.spark.rdd._
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.classification.LogisticRegressionModel
object PredictOOS {
  def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
      MulticlassMetrics = {
    val predictionsAndLabels = data.map(example =>
      (model.predict(example.features), example.label)
    )
    new MulticlassMetrics(predictionsAndLabels)
  }
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Predict OOS")
    val spark = new SparkContext(conf)
    val data = spark.textFile("D:/data/g1-svm.csv")
    val parsedData = data.map { line =>
      val parts = line.split(',').map(_.toDouble)
      LabeledPoint(parts(0), Vectors.dense(parts.tail))
    }
    val splits = parsedData.randomSplit(Array(0.8, 0.2), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)
    val model = DecisionTree.trainClassifier(training, 2, Map[Int,Int] (), "gini", 20, 300)
    val metrics = getMetrics(model, test)
    println(" confusionMatrix is generated")
    spark.stop()
  }
}
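Please let me know if I have missed anything. I have been stuck on this compilation step for a long time. Any help is much appreciated.
This is an edit to the original post. The code above now compiles successfully, but it fails when I write the output to a file. The original compile errors were: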
D:\ScalaApps\sparklr>cd ../oos
D:\ScalaApps\oos>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
[info] Set current project to Proj_oos (in build file:/D:/ScalaApps/oos/)
> compile
[info] Compiling 1 Scala source to D:\ScalaApps\oos\target\scala-2.11\classes...
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:5: not found: type MulticlassMetrics
[error] MulticlassMetrics = {
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:4: not found: type DecisionTreeModel
[error] def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:4: not found: type RDD
[error] def getMetrics(model: DecisionTreeModel, data: RDD[LabeledPoint]):
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:9: not found: type MulticlassMetrics
[error] new MulticlassMetrics(predictionsAndLabels)
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:19: not found: value LabeledPoint
[error] LabeledPoint(parts(0), Vectors.dense(parts.tail))
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:19: not found: value Vectors
[error] LabeledPoint(parts(0), Vectors.dense(parts.tail))
[error] ^
[error] D:\ScalaApps\oos\src\main\scala\oos.scala:25: not found: value DecisionTree
[error] val model = DecisionTree.trainClassifier(training, 2, Map[Int,Int](), "gini", 20, 300)
[error] ^
[error] 7 errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 5 s, completed Dec 4, 2015 10:39:22 PM
>
The line that now fails:
metrics.confusionMatrix.saveAsTextFile("D:/spark4/confMatrix2")
Do I need to import another package for saveAsTextFile to work?
Answer 0 (score: 0)
You should add the following dependency to your build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.4.0"
And add the following import to your Scala file:
import org.apache.spark.{SparkConf, SparkContext}
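For context, a complete build.sbt containing that dependency might look roughly like the sketch below. The project name comes from the sbt log in the question; the version string, the exact Scala patch version, and the explicit spark-core line are assumptions for illustration, not something the original post confirms.

```scala
// build.sbt: a minimal sketch; the values marked "assumed" are not from the post.
name := "Proj_oos"

version := "0.1" // assumed

// The log shows target\scala-2.11, so a 2.11.x compiler is assumed here.
scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  // spark-core provides SparkConf, SparkContext and RDD
  "org.apache.spark" %% "spark-core"  % "1.4.0",
  // spark-mllib provides DecisionTree, LabeledPoint, Vectors, MulticlassMetrics
  "org.apache.spark" %% "spark-mllib" % "1.4.0"
)
```

The `%%` operator appends the Scala binary version to the artifact name, so scalaVersion has to match a Scala version that Spark release was published for.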
Hope this helps.
Answer 1 (score: 0)
I have solved the problem. Thank you for your time.
metrics.confusionMatrix.saveAsTextFile("D:/spark4/confMatrix2")
does not work, even in the console. Instead, I had to do the following to save the result.
val res = metrics.confusionMatrix.toArray
val res1 = spark.parallelize(res)
res1.coalesce(1).saveAsTextFile("D:/spark4/confmatrix2")
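As far as I understand it, the direct call fails because confusionMatrix returns a local MLlib Matrix rather than an RDD, and saveAsTextFile is an RDD method, so no extra import would help. One caveat with the workaround above: Matrix.toArray returns the values in column-major order, so the saved file has one number per line and the row structure is lost. Below is a minimal sketch that rebuilds the rows first. ConfusionMatrixRows and rowsFromColumnMajor are hypothetical names, not part of Spark.

```scala
// Hypothetical helper, not part of Spark: turns the column-major array
// returned by Matrix.toArray into one comma-separated string per matrix row.
object ConfusionMatrixRows {
  def rowsFromColumnMajor(values: Array[Double], numRows: Int, numCols: Int): Seq[String] = {
    require(values.length == numRows * numCols, "array size must equal numRows * numCols")
    (0 until numRows).map { i =>
      // Entry (i, j) of a column-major matrix sits at index j * numRows + i
      (0 until numCols).map(j => values(j * numRows + i)).mkString(",")
    }
  }
}

// Assumed usage with the names from the code above (requires a running SparkContext):
//   val cm = metrics.confusionMatrix
//   val rows = ConfusionMatrixRows.rowsFromColumnMajor(cm.toArray, cm.numRows, cm.numCols)
//   spark.parallelize(rows).coalesce(1).saveAsTextFile("D:/spark4/confmatrix2")
```

The helper is kept free of Spark types so it can be tried in a plain Scala shell before wiring it into the job.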