"value $ is not a member of StringContext" - Missing Scala plugin?

Time: 2017-05-26 20:32:13

"value $ is not a member of StringContext"



import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
// To see fewer warnings
import org.apache.log4j._

// Start a simple Spark Session
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().getOrCreate()

// Prepare training and test data.
val data = spark.read.option("header","true").option("inferSchema","true").format("csv").load("USA_Housing.csv")

// Check out the Data

// See an example of what the data looks like
// by printing out a Row
val colnames = data.columns
val firstrow = data.head(1)(0)
println("Example Data Row")
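for(ind <- Range(1,colnames.length)){
  // Print each column name followed by its value in the first row
  println(colnames(ind))
  println(firstrow(ind))
  println("\n")
}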

//// Setting Up DataFrame for Machine Learning ////

// A few things we need to do before Spark can accept the data!
// It needs to be in the form of two columns
// ("label","features")

// This will allow us to join multiple feature columns
// into a single column of an array of feature values
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

// Rename Price to label column for naming convention.
// Grab only numerical columns from the data
val df = data.select(data("Price").as("label"),$"Avg Area Income",$"Avg Area House Age",$"Avg Area Number of Rooms",$"Area Population")

// An assembler converts the input values to a vector
// A vector is what the ML algorithm reads to train a model

// Set the input columns from which we are supposed to read the values
// Set the name of the column where the vector will be stored
val assembler = new VectorAssembler().setInputCols(Array("Avg Area Income","Avg Area House Age","Avg Area Number of Rooms","Area Population")).setOutputCol("features")

// Use the assembler to transform our DataFrame to the two columns
val output = assembler.transform(df).select($"label",$"features")

// Create a Linear Regression Model object
val lr = new LinearRegression()

// Fit the model to the data

// Note: Later we will see why we should split
// the data first, but for now we will fit to all the data.
val lrModel = lr.fit(output)

// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// Summarize the model over the training set and print out some metrics!
// Explore this in the spark-shell for more methods to call
val trainingSummary = lrModel.summary

println(s"numIterations: ${trainingSummary.totalIterations}")
println(s"objectiveHistory: ${trainingSummary.objectiveHistory.toList}")


println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
println(s"MSE: ${trainingSummary.meanSquaredError}")
println(s"r2: ${trainingSummary.r2}")


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <description>My wonderfull scala app</description>
      <name>My License</name>



    <!-- Test -->

        <!-- see http://davidb.github.com/scala-maven-plugin -->
          <!-- If you have classpath issue like NoDefClassError,... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->


val spark = SparkSession.builder().getOrCreate()    
import spark.implicits._ // << add this
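
The import matters because $"..." is ordinary string-interpolator syntax: the compiler rewrites it to a call of a method named $ on StringContext, and that method only exists once an implicit class providing it is in scope. The sketch below imitates the mechanism in plain Scala with no Spark dependency. ColumnName, SketchImplicits and DollarSketch are hypothetical stand-ins invented for illustration, not Spark's real classes, so treat this as a rough model rather than Spark's actual implementation.

```scala
// Hypothetical stand-in for a Spark column reference (NOT Spark's real class).
final case class ColumnName(name: String)

// Hypothetical stand-in for what `spark.implicits._` brings into scope.
object SketchImplicits {
  // Adds a method named `$` to StringContext, which is what makes
  // the $"..." interpolator syntax compile.
  implicit class StringToColumn(val sc: StringContext) {
    def $(args: Any*): ColumnName = ColumnName(sc.s(args: _*))
  }
}

object DollarSketch {
  // Without this import, the next line fails to compile with
  // "value $ is not a member of StringContext".
  import SketchImplicits._

  val income: ColumnName = $"Avg Area Income"

  def main(args: Array[String]): Unit = println(income)
}
```

In the question's script the same thing happens at the select call: the $"Avg Area Income" expressions are compiled before any import has made a $ method visible, hence the error.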

import org.apache.spark.sql.functions.col
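
With col imported, the select from the question can be written without the $ interpolator at all, so no implicits are needed. The following is a minimal sketch, assuming Spark 2.x on the classpath and a local master. The object name ColSelectSketch, the helper names and the two in-memory rows are made up for illustration and stand in for USA_Housing.csv.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColSelectSketch {
  // Made-up rows standing in for USA_Housing.csv (illustrative values only).
  def sampleData(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(
      (100000.0, 50000.0, 5.0),
      (200000.0, 60000.0, 6.0)
    )).toDF("Price", "Avg Area Income", "Avg Area House Age")

  // Same shape as the select in the question, using col("...") instead of $"...".
  def selectLabelAndFeatures(data: DataFrame): DataFrame =
    data.select(
      col("Price").as("label"),
      col("Avg Area Income"),
      col("Avg Area House Age")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: run locally
      .appName("col-select-sketch")
      .getOrCreate()
    selectLabelAndFeatures(sampleData(spark)).printSchema()
    spark.stop()
  }
}
```

The later assembler.transform(df).select($"label",$"features") call would need the same treatment, for example select(col("label"), col("features")).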



@Apurva's answer worked for me at first, in that the error disappeared from IntelliJ, but it then caused "Could not find implicit value for spark" during sbt compile.

I found a strange workaround: importing spark.implicits._ from the SparkSession referenced by a DataFrame rather than from the one returned by getOrCreate:

import df.sparkSession.implicits._

where df is a DataFrame.

This may be because my code sat inside a case class that received an implicit val spark: SparkSession parameter, but I am not sure why this fix worked for me.

I am using Spark 1.6. The answers above are good, but unfortunately they do not work in 1.6.

I solved the problem by using df.col("column-name"):

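import org.apache.spark.sql.functions.date_format

// df_mid is an existing DataFrame with a "timestamp" column
val df = df_mid
         .withColumn("dt", date_format(df_mid.col("timestamp"), "yyyy-MM-dd"))
         .filter("dt != 'null'")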