Spark: Creating a DataFrame throws an exception

Date: 2016-10-18 09:48:52

Tags: scala apache-spark spark-dataframe k-means

I am trying to create a DataFrame using Spark's sqlContext. I am using Spark 1.6.3 and Scala 2.10.5. Below is the code I use to create the DataFrame.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import com.knoldus.pipeline.KMeansPipeLine

object SimpleApp {

  def main(args: Array[String]) {

    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    import sqlContext.implicits._

    val kMeans = new KMeansPipeLine()
    val df = sqlContext.createDataFrame(Seq(
      ("a@email.com", 12000, "M"),
      ("b@email.com", 43000, "M"),
      ("c@email.com", 5000, "F"),
      ("d@email.com", 60000, "M")
    )).toDF("email", "income", "gender")

    val categoricalFeatures = List("gender", "email")
    val numberOfClusters = 2
    val iterations = 10
    val predictionResult = kMeans.predict(sqlContext, df, categoricalFeatures, numberOfClusters, iterations)
  }
}

It gives me the exception below. What am I doing wrong? Can anyone help me fix this?

 Exception in thread "main" java.lang.NoSuchMethodError:
    org.apache.spark.sql.SQLContext.createDataFrame(Lscala/collection/Seq;Lscala/reflect/api/TypeTags$TypeTag;)Lorg/apache/spark/sql/Dataset;
    at SimpleApp$.main(SimpleApp.scala:24)
    at SimpleApp.main(SimpleApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The dependencies I am using are:

scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "2.0.0" % "provided",
  "org.apache.spark" % "spark-sql_2.10" % "2.0.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.10" % "2.0.0" % "provided",
  "knoldus" % "k-means-pipeline" % "0.0.1"
)

1 Answer:

Answer 0 (score: 1):

As far as I can see, your createDataFrame call is missing its second argument. The method pattern is described here: https://spark.apache.org/docs/1.6.1/api/scala/index.html#org.apache.spark.sql.SQLContext@createDataFrame(org.apache.spark.api.java.JavaRDD,%20java.lang.Class)

In your case it would be:

def createDataFrame[A <: Product](data: Seq[A])(implicit arg0: TypeTag[A]): DataFrame

:: Experimental :: Creates a DataFrame from a local Seq of Product.
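
For illustration, a minimal sketch of that single-argument form using a case class (a Product); the Person class is a hypothetical stand-in for the question's tuples, assuming the Spark 1.6 API:

// Hypothetical row type; define it at the top level (not inside main)
// so that Spark can derive a TypeTag and infer the schema from it.
case class Person(email: String, income: Int, gender: String)

// Single-argument form: column names come from the case class fields.
val df = sqlContext.createDataFrame(Seq(
  Person("a@email.com", 12000, "M"),
  Person("c@email.com", 5000, "F")
))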

OR convert the Seq to a List/RDD and use the method pattern with two arguments, as sketched below.
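
A minimal sketch of that two-argument pattern (an RDD[Row] plus an explicit StructType schema), reusing the sc and sqlContext from the question and assuming the Spark 1.6 API:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// The rows as an RDD[Row]
val rows = sc.parallelize(Seq(
  Row("a@email.com", 12000, "M"),
  Row("b@email.com", 43000, "M"),
  Row("c@email.com", 5000, "F"),
  Row("d@email.com", 60000, "M")
))

// The schema, passed explicitly as the second argument
val schema = StructType(Seq(
  StructField("email", StringType, nullable = false),
  StructField("income", IntegerType, nullable = false),
  StructField("gender", StringType, nullable = false)
))

val df = sqlContext.createDataFrame(rows, schema)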