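I'm attaching a code snippet that fails with the error "value toDF is not a member of org.apache.spark.rdd.RDD". I'm using Scala 2.11.8 and Spark 2.0.0. Can you help me fix the problem with the toDF() API?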
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._

object HHService {
  case class Services(
    uhid: String,
    locationid: String,
    doctorid: String,
    billdate: String,
    servicename: String,
    servicequantity: String,
    starttime: String,
    endtime: String,
    servicetype: String,
    servicecategory: String,
    deptname: String
  )

  def toService = (p: Seq[String]) => Services(p(0), p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8), p(9), p(10))

  def main(args: Array[String]) {
    val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
    val spark = SparkSession
      .builder
      .appName(getClass.getSimpleName)
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext
    import spark.implicits._
    import sqlContext.implicits._
    val hospitalDataText = sc.textFile("D:/Books/bboks/spark/Intellipaat/Download/SparkHH/SparkHH/services.csv")
    val header = hospitalDataText.first()
    val hospitalData = hospitalDataText.filter(a => a != header)
    //val HData = hospitalData.map(_.split(",")).map(p=>Services(p(0), p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10)))
    val HData = hospitalData.map(_.split(",")).map(toService(_))
    val hosService = HData.toDF()
  }
}
Answer 0 (score: 4)
1] The sqlContext needs to be obtained as shown below:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
This solved my problem. Previously I had used the following line to get the SQLContext: val sqlContext = spark.sqlContext (that approach works in spark-shell).
2] The case class must be defined outside the method. Most blog posts mention this as well.
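A minimal sketch of the question's program with both points applied, assuming a CSV whose data lines each have at least 11 comma-separated fields. One plausible reading of why the change in point 1 helped: the original code imports both spark.implicits._ and sqlContext.implicits._, which bring conversions with the same names (for example rddToDatasetHolder) into the same scope; the names then become ambiguous, neither conversion applies, and the compiler reports exactly this "toDF is not a member" error. The same error can also appear when the case class is declared inside a method, because the compiler cannot then supply the TypeTag that Spark needs to derive an Encoder. The sketch keeps a single import spark.implicits._ and declares the case class at the top level. The helper servicesDF and reading the CSV path from args(0) are hypothetical choices, and enableHiveSupport() plus the warehouse setting are left out so the sketch does not need the spark-hive dependency.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Declared outside any method (top level here; a member of an object also
// works), so the compiler can provide the TypeTag Spark's Encoder needs.
case class Services(
    uhid: String,
    locationid: String,
    doctorid: String,
    billdate: String,
    servicename: String,
    servicequantity: String,
    starttime: String,
    endtime: String,
    servicetype: String,
    servicecategory: String,
    deptname: String)

object HHService {
  // Assumes each data line has at least 11 comma-separated fields.
  def toService(p: Seq[String]): Services =
    Services(p(0), p(1), p(2), p(3), p(4), p(5), p(6), p(7), p(8), p(9), p(10))

  // Hypothetical helper: drops the header line and converts the rest to a DataFrame.
  def servicesDF(spark: SparkSession, path: String): DataFrame = {
    // Import implicits from ONE source only. Importing spark.implicits._ and
    // sqlContext.implicits._ together makes the same-named conversions
    // ambiguous, so toDF() would no longer be found.
    import spark.implicits._

    val lines = spark.sparkContext.textFile(path)
    val header = lines.first()
    lines
      .filter(_ != header)
      .map(_.split(","))
      .map(fields => toService(fields)) // Array[String] is wrapped as a Seq[String]
      .toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName(getClass.getSimpleName)
      .getOrCreate()
    val hosService = servicesDF(spark, args(0)) // CSV path passed as first argument (assumption)
    hosService.show()
    spark.stop()
  }
}
```

If Hive support is actually needed, enableHiveSupport() can be added back to the builder; it does not affect the toDF() problem.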
Answer 1 (score: 0)
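I ran into the same problem in a Databricks notebook when turning my code into a simple function. I had to move the class declaration out of the function, and after that everything worked: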
%scala
case class className(param1 : String,
                     param2 : String,
                     // ...
                     lastParam : Double)

def myFunction(/* params */) = {
  // a lot of code
  // ...
  var myVarBasedOnClasseDefinition = Seq(className("init","init","init",0.0,0.0,"init",0.0))
  for (iteration <- iterator) myVarBasedOnClasseDefinition = myVarBasedOnClasseDefinition ++ additionnalSequence
  display(myVarBasedOnClasseDefinition.toDF())
}
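Hope this helps: at the start of my search, the advice that "the case class needs to be outside the method" did not seem to apply to my case, because procedural code using the same class worked fine.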
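A self-contained sketch of the same pattern outside Databricks, assuming a plain Spark 2.x application. The case class Reading, the object NotebookPattern and the function buildReadings are hypothetical stand-ins for className and myFunction, and show() replaces the notebook-only display(). As in the answer, the case class is declared outside the function, and the Seq built in the loop is converted with toDF(), which for a local Seq goes through the same spark.implicits._ import.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical stand-in for `className`: declared outside the function that uses it.
case class Reading(label: String, source: String, value: Double)

object NotebookPattern {
  // Builds a Seq of case class instances in a loop, then converts it with toDF().
  def buildReadings(spark: SparkSession, n: Int): DataFrame = {
    import spark.implicits._ // single implicits import

    // Seed element, mirroring the "init" placeholder in the answer.
    var readings = Seq(Reading("init", "init", 0.0))
    for (i <- 1 to n) {
      readings = readings ++ Seq(Reading(s"row$i", "loop", i.toDouble))
    }
    readings.toDF()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is hard-coded only so the sketch runs standalone.
    val spark = SparkSession.builder.master("local[*]").appName("notebook-pattern").getOrCreate()
    buildReadings(spark, 3).show() // in a Databricks notebook, display(...) could be used instead
    spark.stop()
  }
}
```

Appending with ++ inside a loop copies the Seq on every iteration; for many iterations, building all rows in one expression (for example (1 to n).map(...)) would be cheaper.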