I would like to understand the cause of this problem and how to fix it. The problem only occurs when I submit the job with spark-submit. Thanks for any help.
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar
Running the same code line by line in spark-shell does not produce any error.
...
scala> val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
|-- aucid: string (nullable = true)
|-- bid: float (nullable = false)
|-- bidtime: float (nullable = false)
|-- bidder: string (nullable = true)
|-- bidrate: integer (nullable = false)
|-- openbid: float (nullable = false)
|-- price: float (nullable = false)
|-- itemtype: string (nullable = true)
|-- dtl: integer (nullable = false)
Calling the toDF method to convert the RDD into a DataFrame causes this error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
at AuctionDataFrame.main(AuctionDataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
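As far as I understand, toDF() uses Scala reflection (the JavaUniverse.runtimeMirror call in the trace) to derive the DataFrame schema from the Auctions case class, so this NoSuchMethodError usually points at a Scala binary version mismatch between the jar and the Scala runtime that Spark itself was built with. As a quick sanity check, a single diagnostic line can be dropped into main to print the Scala version the driver actually runs on (a minimal sketch for illustration only, not part of my original program):

// Diagnostic only: print the Scala version of the driver process.
// If this reports 2.10.x while the jar was built for 2.11.x (or vice versa),
// the reflection call behind toDF() is expected to fail as above.
println("Driver Scala version: " + scala.util.Properties.versionNumberString)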
AuctionDataFrame.scala:
import org.apache.spark.{SparkConf, SparkContext}

// One row of the auction CSV file.
case class Auctions(
  aucid: String,
  bid: Float,
  bidtime: Float,
  bidder: String,
  bidrate: Int,
  openbid: Float,
  price: Float,
  itemtype: String,
  dtl: Int)

object AuctionDataFrame {
  // Column positions in the CSV file.
  val AUCID = 0
  val BID = 1
  val BIDTIME = 2
  val BIDDER = 3
  val BIDRATE = 4
  val OPENBID = 5
  val PRICE = 6
  val ITEMTYPE = 7
  val DTL = 8

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AuctionDataFrame")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    // Read the CSV file and split each line into fields.
    val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))

    // Map each line onto the Auctions case class.
    val auctionsRDD = inputRDD.map(a =>
      Auctions(
        a(AUCID),
        a(BID).toFloat,
        a(BIDTIME).toFloat,
        a(BIDDER),
        a(BIDRATE).toInt,
        a(OPENBID).toFloat,
        a(PRICE).toFloat,
        a(ITEMTYPE),
        a(DTL).toInt))

    val auctionsDF = auctionsRDD.toDF() // <--- line 52 causing the error.
  }
}
build.sbt
name := "Auction Project"
version := "1.0"
scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"
/*
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2",
"org.apache.spark" %% "spark-sql" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
Spark on Ubuntu 14.04:
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
sbt on Windows:
D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12
I have looked at similar questions, which suggest an incompatibility in the Scala version Spark was compiled with.
So I changed the Scala version in build.sbt to 2.10 and built a 2.10 jar, but the error remained. Using % "provided" or not makes no difference either.
scalaVersion := "2.10.6"
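Note that the spark-shell banner above already reports "Using Scala version 2.11.7", so building a _2.10 jar only flips the mismatch around; the jar and the installed Spark have to agree on the Scala binary version. A quick way to cross-check from spark-shell (just an illustrative check; the output shown is what I would expect on this installation):

scala> util.Properties.versionString
res0: String = version 2.11.7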
Answer 0 (score: 0):
I had compiled Spark 1.6.2 from source with Scala 2.11. However, I had also downloaded spark-1.6.2-bin-without-hadoop.tgz and put it in the lib/ directory of the sbt project.
I believe that because spark-1.6.2-bin-without-hadoop.tgz was built with Scala 2.10, it caused the compatibility problem.
I removed spark-1.6.2-bin-without-hadoop.tgz from the lib directory and ran "sbt package" with the following library dependencies:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
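With the bundled Spark tarball removed from lib/, sbt pulls in the _2.11 Spark artifacts that match scalaVersion := "2.11.8", so the packaged jar and the Scala 2.11 Spark installation line up. Assuming sbt's default output layout (target/scala-2.11/), the job can then be rebuilt and resubmitted with essentially the same command as in the question:

sbt package
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 target/scala-2.11/auction-project_2.11-1.0.jar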