So I recently got a MacBook and want to learn Spark and Scala. I browsed some guides online on how to install Scala, Hadoop, and Spark, and since I wanted to try a new IDE, I installed IntelliJ.
I keep running into this problem:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/03/19 18:07:38 INFO SparkContext: Running Spark version 2.3.0
18/03/19 18:07:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/19 18:07:38 INFO SparkContext: Submitted application: Spark Count
18/03/19 18:07:38 INFO SecurityManager: Changing view acls to: jeanmac
18/03/19 18:07:38 INFO SecurityManager: Changing modify acls to: jeanmac
18/03/19 18:07:38 INFO SecurityManager: Changing view acls groups to:
18/03/19 18:07:38 INFO SecurityManager: Changing modify acls groups to:
18/03/19 18:07:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jeanmac); groups with view permissions: Set(); users with modify permissions: Set(jeanmac); groups with modify permissions: Set()
18/03/19 18:07:39 INFO Utils: Successfully started service 'sparkDriver' on port 61094.
18/03/19 18:07:39 INFO SparkEnv: Registering MapOutputTracker
18/03/19 18:07:39 INFO SparkEnv: Registering BlockManagerMaster
18/03/19 18:07:39 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/03/19 18:07:39 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/03/19 18:07:39 INFO DiskBlockManager: Created local directory at /private/var/folders/r5/rfwd1cqd4kv8cmh5gh_qxpvm0000gn/T/blockmgr-c8a5c1ac-8e09-4352-928e-1169a96cd752
18/03/19 18:07:39 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
18/03/19 18:07:39 INFO SparkEnv: Registering OutputCommitCoordinator
18/03/19 18:07:39 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/03/19 18:07:39 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://jeanss-mbp:4040
18/03/19 18:07:39 INFO Executor: Starting executor ID driver on host localhost
18/03/19 18:07:39 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 61095.
18/03/19 18:07:39 INFO NettyBlockTransferService: Server created on jeanss-mbp:61095
18/03/19 18:07:39 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/03/19 18:07:39 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, jeanss-mbp, 61095, None)
18/03/19 18:07:39 INFO BlockManagerMasterEndpoint: Registering block manager jeanss-mbp:61095 with 2004.6 MB RAM, BlockManagerId(driver, jeanss-mbp, 61095, None)
18/03/19 18:07:39 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, jeanss-mbp, 61095, None)
18/03/19 18:07:39 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, jeanss-mbp, 61095, None)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at scala.ScalaWordCount$.main(ScalaWordCount.scala:10)
at scala.ScalaWordCount.main(ScalaWordCount.scala)
18/03/19 18:07:40 INFO SparkContext: Invoking stop() from shutdown hook
18/03/19 18:07:40 INFO SparkUI: Stopped Spark web UI at http://jeanss-mbp:4040
18/03/19 18:07:40 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/03/19 18:07:40 INFO MemoryStore: MemoryStore cleared
18/03/19 18:07:40 INFO BlockManager: BlockManager stopped
18/03/19 18:07:40 INFO BlockManagerMaster: BlockManagerMaster stopped
18/03/19 18:07:40 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/03/19 18:07:40 INFO SparkContext: Successfully stopped SparkContext
18/03/19 18:07:40 INFO ShutdownHookManager: Shutdown hook called
18/03/19 18:07:40 INFO ShutdownHookManager: Deleting directory /private/var/folders/r5/rfwd1cqd4kv8cmh5gh_qxpvm0000gn/T/spark-331c36a6-b985-4056-900e-88250052ebb3
The line that stands out to me in all of this is this warning:
WARN NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I did some googling and tried a few of the suggested fixes, but I still can't get my program to run.
Besides that warning, I also looked at the exception:
18/03/19 18:07:39 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, jeanss-mbp, 61095, None)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at scala.ScalaWordCount$.main(ScalaWordCount.scala:10)
at scala.ScalaWordCount.main(ScalaWordCount.scala)
Line 10 is: val threshold = args(1).toInt
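From what I can tell, an ArrayIndexOutOfBoundsException: 1 at that line means args has fewer than two elements, i.e. the program is being launched without arguments. As a sketch of what I mean (my own addition; the usage string is just my guess from reading the code), a guard at the top of main would at least fail with a clearer message:

// Sketch: fail fast with a usage message instead of an
// ArrayIndexOutOfBoundsException when arguments are missing.
if (args.length < 2) {
  System.err.println("Usage: ScalaWordCount <inputFile> <threshold>")
  sys.exit(1)
}

That would at least confirm whether missing arguments are the real problem.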
So I'm looking for help on how to fix this properly. I'll provide my system and IDE configuration below.
Versions:
IntelliJ configuration:
WordCount.scala:
package scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object ScalaWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count").setMaster("local[2]"))
    val threshold = args(1).toInt
    // split each document into words
    val tokenized = sc.textFile(args(0)).flatMap(_.split(" "))
    // count the occurrence of each word
    val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)
    // filter out words with less than threshold occurrences
    val filtered = wordCounts.filter(_._2 >= threshold)
    // count characters
    val charCounts = filtered.flatMap(_._1.toCharArray).map((_, 1)).reduceByKey(_ + _)
    System.out.println(charCounts.collect().mkString(", "))
  }
}
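For reference, reading the code, the expected invocation appears to be ScalaWordCount <inputFile> <threshold>, with the input path in args(0) and the threshold in args(1), so presumably the Program arguments field of the IntelliJ Run Configuration needs two values. For quick local runs, falling back to defaults might be another sketch worth trying (the path and the threshold value below are placeholders I made up, not anything from my setup):

// Sketch: placeholder defaults so the job can run locally even when
// no program arguments are supplied.
val inputPath = if (args.length > 0) args(0) else "src/main/resources/input.txt"
val threshold = if (args.length > 1) args(1).toInt else 2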
build.sbt:
name := "untitled"
version := "0.1"
scalaVersion := "2.11.8"
val sparkVersion = "2.3.0"
resolvers ++= Seq(
  "apache-snapshots" at "http://repository.apache.org/snapshots/"
)

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
I'm not very well versed in Scala and Spark, so it may just be a problem with my code rather than my environment. If there are any troubleshooting steps you think I should take, let me know. If you need any other configuration details, I'd be happy to update this post with them.