toDF problem: value toDF is not a member of org.apache.spark.rdd.RDD

Asked: 2017-03-11 11:33:46

Tags: dataframe apache-spark-sql

I have attached the code snippet that fails with the error "value toDF is not a member of org.apache.spark.rdd.RDD". I am using Scala 2.11.8 and Spark 2.0.0. Can you help me resolve this issue with the toDF() API?

import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object HHService {
    case class Services(
    uhid:String,
    locationid:String,
    doctorid:String,
    billdate:String,
    servicename:String,
    servicequantity:String,
    starttime:String,
    endtime:String,
    servicetype:String,
    servicecategory:String,
    deptname:String
    )

    def toService = (p: Seq[String]) => Services(p(0), p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10))

    def main(args: Array[String]){
        val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
        val spark = SparkSession
            .builder
            .appName(getClass.getSimpleName)
            .config("spark.sql.warehouse.dir", warehouseLocation)
            .enableHiveSupport()
            .getOrCreate()
        val sc = spark.sparkContext

        val sqlContext = spark.sqlContext

        import spark.implicits._
        import sqlContext.implicits._

        val hospitalDataText = sc.textFile("D:/Books/bboks/spark/Intellipaat/Download/SparkHH/SparkHH/services.csv")
        val header = hospitalDataText.first()
        val hospitalData= hospitalDataText.filter(a => a!= header)
        //val HData = hospitalData.map(_.split(",")).map(p=>Services(p(0), p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10)))
        val HData = hospitalData.map(_.split(",")).map(toService(_))

        val hosService = HData.toDF() // error: value toDF is not a member of org.apache.spark.rdd.RDD
    }

}

2 Answers:

Answer 0 (Score: 4)

1] You need to obtain the sqlContext, as shown below.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

This solved my problem. The following snippet can also be used to obtain a sqlContext: val sqlContext = spark.sqlContext (this approach works in spark-shell).

2] The case class must not be defined inside a method. Most blog posts mention this as well.
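
Putting both points together, a minimal sketch of the working layout might look like this (the Person case class and the people.csv path are hypothetical, not from the question): the case class is declared at the top level, outside any method, so the compiler can derive the implicit Encoder that toDF() needs.

import org.apache.spark.sql.SparkSession

// Top level: not inside main(), not inside any other method.
case class Person(name: String, age: String)

object ToDFExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ToDFExample").getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._ // brings the RDD-to-DataFrame implicits into scope

    val people = sc.textFile("people.csv") // hypothetical two-column CSV
      .map(_.split(","))
      .map(p => Person(p(0), p(1)))

    val df = people.toDF() // compiles: Person is visible for Encoder derivation
    df.show()
  }
}

Note that on Spark 2.x, import spark.implicits._ from the session provides the same implicits as sqlContext.implicits._, so a separate SQLContext should not be required.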

Answer 1 (Score: 0)

I ran into the same problem while converting my notebook code in Databricks into a plain function. I had to move the class declaration out of the function, and then everything ran fine:

%scala

case class ClassName(param1: String,
                     param2: String,
                     ...
                     lastParam: Double)

def myFunction(params) = {
  // a lot of code
  ...
  var mySeq = Seq(ClassName("init", "init", "init", 0.0, 0.0, "init", 0.0))
  for (iteration <- iterator) mySeq = mySeq ++ additionalSequence
  display(mySeq.toDF())
}

Hope this helps, because at the start of my search the statement "the case class needs to be outside the method" did not really seem to apply to my case, since similar procedure-style code had always worked fine for me.
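
For reference, a compact runnable sketch of that notebook pattern (all names are hypothetical; display() and the predefined spark session are Databricks notebook builtins):

%scala

// Declared at the cell's top level, outside the function.
case class Measurement(id: String, value: Double)

def buildAndDisplay(): Unit = {
  import spark.implicits._ // spark is predefined in a Databricks notebook
  var rows = Seq(Measurement("init", 0.0))
  for (i <- 1 to 3) rows = rows :+ Measurement("row" + i, i.toDouble)
  display(rows.toDF()) // display() is a Databricks builtin
}

buildAndDisplay()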