Please take a look at the errors below — they appear when I :load beginner_spark_ml.scala in the Spark shell:
scala> :load beginner_spark_ml.scala
Loading beginner_spark_ml.scala...
import scala.xml._
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.ml.Pipeline
fileName: String = Posts.small.xml
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[8] at textFile at <console>:55
postsXml: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[12] at filter at <console>:60
postsRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[13] at map at <console>:59
schemaString: String = Id Tags Text
schema: org.apache.spark.sql.types.StructType = StructType(StructField(Id,StringType,true), StructField(Tags,Str
<console>:65: error: not found: value spark
val postsDf =spark.sqlContext.createDataFrame(postsRDD, schema)
^
targetTag: String = java
myudf: String => Double = <function1>
sqlfunc: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,DoubleType,List(StringType))
<console>:57: error: not found: value postsDf
val postsLabeled = postsDf.withColumn("Label", sqlfunc(col("Tags")) )
^
<console>:51: error: not found: value postsLabeled
val positive = postsLabeled.filter('Label > 0.0)
^
<console>:51: error: not found: value postsLabeled
val negative = postsLabeled.filter('Label < 1.0)
^
<console>:51: error: not found: value positive
val positiveTrain = positive.sample(false, 0.9)
^
<console>:51: error: not found: value negative
val negativeTrain = negative.sample(false, 0.9)
^
<console>:51: error: not found: value positiveTrain
val training = positiveTrain.unionAll(negativeTrain)
^
<console>:51: error: not found: value negativeTrain
val negativeTrainTmp = negativeTrain.withColumnRenamed("Label", "Flag").select('Id, 'Flag)
^
<console>:51: error: not found: value negative
val negativeTest = negative.join( negativeTrainTmp, negative("Id") === negativeTrainTmp("Id"), "LeftOuter
^
<console>:51: error: not found: value positiveTrain
val positiveTrainTmp = positiveTrain.withColumnRenamed("Label", "Flag").select('Id, 'Flag)
^
<console>:51: error: not found: value positive
val positiveTest = positive.join( positiveTrainTmp, positive("Id") === positiveTrainTmp("Id"), "LeftOuter
^
<console>:51: error: not found: value negativeTest
val testing = negativeTest.unionAll(positiveTest)
^
numFeatures: Int = 64000
numEpochs: Int = 30
regParam: Double = 0.02
tokenizer: org.apache.spark.ml.feature.Tokenizer = tok_9006f8c2defa
hashingTF: org.apache.spark.ml.feature.HashingTF = hashingTF_9b094ffdf5f6
lr: org.apache.spark.ml.classification.LogisticRegression = logreg_9a578b75908b
pipeline: org.apache.spark.ml.Pipeline = pipeline_8f437ded5dfe
<console>:65: error: not found: value training
val model = pipeline.fit(training)
^
testTitle: String = Easiest way to merge a release into one JAR file
testBody: String =
Is there a tool or script which easily merges a bunch of
href="http://en.wikipedia.org/wiki/JAR_%28file_format%29"
>JAR</a> files into one JAR file? A bonus would be to easily set the main-file manifest
and make it executable. I would like to run it with something like:
</p>

<blockquote>
 <p>java -jar
rst.jar</p>
</blockquote>

<p>
As far as I can tell, it has no dependencies which indicates that it shouldn't be an easy
single-file tool, but the downloaded ZIP file contains a lot of libraries.
testText: String =
Easiest way to merge a release into one JAR fileIs there a tool or script which easily merges a bunch of
href="http://en.wikipedia.org/wiki/JAR_%28file_format%29"
>JAR</a> files into one JAR file? A bonus would be to easily set the main-file manifest
and make it executable. I would like to run it with something like:
</p>

<blockquote>
 <p>java -jar
rst.jar</p>
</blockquote>

<p>
As far as I can tell, it has no dependencies which indicates that it shouldn't be an easy
single-file tool, but the downloaded ZIP file contains a lot of libraries.
<console>:57: error: not found: value sqlContext
val testDF = sqlContext.createDataFrame(Seq( (99.0, testText))).toDF("Label", "Text")
^
<console>:51: error: not found: value model
val result = model.transform(testDF)
^
<console>:51: error: not found: value result
val prediction = result.collect()(0)(6).asInstanceOf[Double]
^
<console>:52: error: not found: value prediction
print("Prediction: "+ prediction)
^
<console>:51: error: not found: value model
val testingResult = model.transform(testing)
^
<console>:51: error: not found: value testingResult
val testingResultScores = testingResult.select("Prediction", "Label").rdd.
^
<console>:51: error: not found: value testingResultScores
val bc = new BinaryClassificationMetrics(testingResultScores)
^
<console>:51: error: not found: value bc
val roc = bc.areaUnderROC
^
<console>:52: error: not found: value roc
print("Area under the ROC:" + roc)
^
scala>
scala> :load beginner_spark_ml.scala
Loading beginner_spark_ml.scala...
import scala.xml._
import org.apache.spark.sql.catalyst.plans._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.ml.Pipeline
fileName: String = Posts.small.xml
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at textFile at <console>:74
postsXml: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[19] at filter at <console>:79
postsRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[20] at map at <console>:78
schemaString: String = Id Tags Text
schema: org.apache.spark.sql.types.StructType = StructType(StructField(Id,StringType,true), StructField(Tags,Str
<console>:84: error: not found: value sqlContext
val postsDf =sqlContext.createDataFrame(postsRDD, schema)
^
targetTag: String = java
myudf: String => Double = <function1>
sqlfunc: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,DoubleType,List(StringType))
<console>:76: error: not found: value postsDf
val postsLabeled = postsDf.withColumn("Label", sqlfunc(col("Tags")) )
^
<console>:70: error: not found: value postsLabeled
val positive = postsLabeled.filter('Label > 0.0)
^
<console>:70: error: not found: value postsLabeled
val negative = postsLabeled.filter('Label < 1.0)
^
<console>:70: error: not found: value positive
val positiveTrain = positive.sample(false, 0.9)
^
<console>:70: error: not found: value negative
val negativeTrain = negative.sample(false, 0.9)
^
<console>:70: error: not found: value positiveTrain
val training = positiveTrain.unionAll(negativeTrain)
^
<console>:70: error: not found: value negativeTrain
val negativeTrainTmp = negativeTrain.withColumnRenamed("Label", "Flag").select('Id, 'Flag)
^
<console>:70: error: not found: value negative
val negativeTest = negative.join( negativeTrainTmp, negative("Id") === negativeTrainTmp("Id"), "LeftOuter
^
<console>:70: error: not found: value positiveTrain
val positiveTrainTmp = positiveTrain.withColumnRenamed("Label", "Flag").select('Id, 'Flag)
^
<console>:70: error: not found: value positive
val positiveTest = positive.join( positiveTrainTmp, positive("Id") === positiveTrainTmp("Id"), "LeftOuter
^
<console>:70: error: not found: value negativeTest
val testing = negativeTest.unionAll(positiveTest)
^
numFeatures: Int = 64000
numEpochs: Int = 30
regParam: Double = 0.02
tokenizer: org.apache.spark.ml.feature.Tokenizer = tok_d760dda17221
hashingTF: org.apache.spark.ml.feature.HashingTF = hashingTF_b8fff6458ec2
lr: org.apache.spark.ml.classification.LogisticRegression = logreg_28b7c8065eb6
pipeline: org.apache.spark.ml.Pipeline = pipeline_83ccdd93d410
<console>:84: error: not found: value training
val model = pipeline.fit(training)
^
testTitle: String = Easiest way to merge a release into one JAR file
testBody: String =
Is there a tool or script which easily merges a bunch of
href="http://en.wikipedia.org/wiki/JAR_%28file_format%29"
>JAR</a> files into one JAR file? A bonus would be to easily set the main-file manifest
and make it executable. I would like to run it with something like:
</p>

<blockquote>
 <p>java -jar
rst.jar</p>
</blockquote>

<p>
As far as I can tell, it has no dependencies which indicates that it shouldn't be an easy
single-file tool, but the downloaded ZIP file contains a lot of libraries.
testText: String =
Easiest way to merge a release into one JAR fileIs there a tool or script which easily merges a bunch of
href="http://en.wikipedia.org/wiki/JAR_%28file_format%29"
>JAR</a> files into one JAR file? A bonus would be to easily set the main-file manifest
and make it executable. I would like to run it with something like:
</p>

<blockquote>
 <p>java -jar
rst.jar</p>
</blockquote>

<p>
As far as I can tell, it has no dependencies which indicates that it shouldn't be an easy
single-file tool, but the downloaded ZIP file contains a lot of libraries.
<console>:76: error: not found: value sqlContext
val testDF = sqlContext.createDataFrame(Seq( (99.0, testText))).toDF("Label", "Text")
^
<console>:70: error: not found: value model
val result = model.transform(testDF)
^
<console>:70: error: not found: value result
val prediction = result.collect()(0)(6).asInstanceOf[Double]
^
<console>:71: error: not found: value prediction
print("Prediction: "+ prediction)
^
<console>:70: error: not found: value model
val testingResult = model.transform(testing)
^
<console>:70: error: not found: value testingResult
val testingResultScores = testingResult.select("Prediction", "Label").rdd.
^
<console>:70: error: not found: value testingResultScores
val bc = new BinaryClassificationMetrics(testingResultScores)
^
<console>:70: error: not found: value bc
val roc = bc.areaUnderROC
^
<console>:71: error: not found: value roc
print("Area under the ROC:" + roc)
^
scala>
Answer (score: 1)
If you are using Spark v1.5, you need to create the SparkContext variable yourself, like this:
import org.apache.spark.{SparkConf, SparkContext}
// appName and master are placeholders for your application name and cluster master URL
val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
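With a SparkContext in hand (spark-shell on 1.5 already provides one as sc), the createDataFrame call that fails above needs a SQLContext rather than a spark variable. A minimal sketch, assuming Spark 1.5 and the postsRDD and schema values already defined in your script:

import org.apache.spark.sql.SQLContext

// Wrap the SparkContext; in spark-shell 1.5 this object is already available as `sqlContext`
val sqlContext = new SQLContext(sc)
val postsDf = sqlContext.createDataFrame(postsRDD, schema)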
Please see http://spark.apache.org/docs/1.5.0/programming-guide.html. If you are using spark-shell with Spark v1.5, use the variable "sc", not "spark" (PS: the spark variable represents a SparkSession, which only exists in v2.0+). Hope this helps.
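For completeness, on Spark 2.0+ the shell exposes a SparkSession named spark, so the same step can go through that instead. A hedged sketch, again assuming your postsRDD and schema (the app name here is arbitrary):

import org.apache.spark.sql.SparkSession

// In spark-shell 2.x the session already exists as `spark`; in a standalone app you build it
val spark = SparkSession.builder().appName("beginner_spark_ml").getOrCreate()
val postsDf = spark.createDataFrame(postsRDD, schema)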