Class import error in Scala / Spark

Time: 2017-03-28 02:58:34

Tags: scala apache-spark

I'm new to Spark and I'm using Scala. I wrote a simple object that loads fine in spark-shell using :load test.scala.

import org.apache.spark.ml.feature.StringIndexer

object Collaborative{
    def trainModel() ={
        val data = sc.textFile("/user/PT/data/newfav.csv")
        val df = data.map(_.split(",") match {
            case Array(user,food,fav) => (user,food,fav.toDouble)
        }).toDF("userID","foodID","favorite")
        val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
    }
}

Now I want to put it in a class that takes parameters. I use the same code, just with class instead of object.

import org.apache.spark.ml.feature.StringIndexer

class Collaborative{
    def trainModel() ={
        val data = sc.textFile("/user/PT/data/newfav.csv")
        val df = data.map(_.split(",") match {
            case Array(user,food,fav) => (user,food,fav.toDouble)
        }).toDF("userID","foodID","favorite")
        val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
    }
}

This returns import errors.

<console>:19: error: value toDF is not a member of org.apache.spark.rdd.RDD[(String, String, Double)]
           val df = data.map(_.split(",") match { case Array(user,food,fav) => (user,food,fav.toDouble) }).toDF("userID","foodID","favorite")

<console>:24: error: not found: type StringIndexer
           val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")

What am I missing here?

1 Answer:

Answer 0 (score: 0)

Try this; it seems to work fine. The object version works under :load because spark-shell pre-creates sc and spark and auto-imports spark.implicits._, which is what makes toDF available. A standalone class gets none of that for free: create (or receive) a SparkSession, import spark.implicits._ yourself, and keep the StringIndexer import in scope in the same file.

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.StringIndexer

def trainModel() = {
    val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
    import spark.implicits._ // provides toDF and the encoders needed by map
    val data = spark.read.textFile("/user/PT/data/newfav.csv")
    val df = data.map(_.split(",") match {
        case Array(user, food, fav) => (user, food, fav.toDouble)
    }).toDF("userID", "foodID", "favorite")
    val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
}
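
Since the original goal was a class that takes parameters, here is a minimal sketch of how the complete file could look. The constructor parameter, the path argument, and the final fit/transform step are illustrative additions for this sketch, not part of the answer above; adjust the names to your project.

import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.{DataFrame, SparkSession}

// Pass the SparkSession in once instead of building one per method call.
class Collaborative(spark: SparkSession) {
    import spark.implicits._ // toDF and the tuple encoders used by map

    def trainModel(path: String): DataFrame = {
        val data = spark.read.textFile(path)
        val df = data.map(_.split(",") match {
            case Array(user, food, fav) => (user, food, fav.toDouble)
        }).toDF("userID", "foodID", "favorite")
        // Fit the indexer and append the userIndex column.
        val userIndexer = new StringIndexer().setInputCol("userID").setOutputCol("userIndex")
        userIndexer.fit(df).transform(df)
    }
}

Usage from a driver program:

val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
val indexed = new Collaborative(spark).trainModel("/user/PT/data/newfav.csv")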