From local PC to remote cluster

Asked: 2016-10-07 11:04:39

Tags: scala apache-spark bigdata

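I need to develop a Spark program remotely against a Spark cluster and run it without packaging it into a jar, just by clicking the "Run" button in the IDE. However, I am running into some confusing errors.

Here is the code: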

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "D:\\Lab\\ScalaIDE\\data\\README.md" // file resides in local windows PC
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://172.31.110.234:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

172.31.110.234 is my Spark standalone cluster (Linux). I run this code from my local PC (Windows, with ScalaIDE installed, IP: 172.31.2.77).

The error message:

16/10/07 17:47:00 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

After some research, the suggestion was to download winutils.exe into C:\Bin, so I tried adding this line of code above the logFile variable:

System.setProperty("hadoop.home.dir", "C:\\");

Now I get a different error message, as follows:

16/10/07 17:56:28 INFO SparkContext: Running Spark version 2.0.1
16/10/07 17:56:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
...
16/10/07 17:56:34 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.31.110.234): java.lang.ClassNotFoundException: org.bigdata.linknet.SimpleApp$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
...
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 172.31.110.234): java.lang.ClassNotFoundException: org.bigdata.linknet.SimpleApp$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
...

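Question: is my scenario possible, that is, running Spark code from my PC (just by clicking the "Run" button) against the Spark cluster? I have read the similar post "Run Spark/Cloudera application in remote machine with Eclipse", but it does not seem to solve my problem.

Thanks, Yusata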

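1 Answer:

Answer 0: (score: 0)

The "Failed to locate the winutils binary" error is not the problem here; you can usually ignore it.

The ClassNotFoundException above is thrown because your Spark cluster does not have your classes: the executors cannot load SimpleApp's anonymous function classes unless the compiled code is shipped to them.

To achieve your goal, you need to: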

  1. Build a jar (if you use Gradle, with a fatJar or shadowJar task).
  2. In your code, when you create the SparkConf, specify the master address and the relative path to that jar.
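The two steps above can be sketched as follows. This is a minimal illustration rather than the answerer's exact code: the jar path `build/libs/simple-app-all.jar` and the input path are hypothetical placeholders, and `setJars` is the `SparkConf` method used here to hand the jar to the executors.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Hypothetical path: wherever your build writes the application jar,
    // relative to the working directory the IDE runs the program from.
    val appJar = "build/libs/simple-app-all.jar"

    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://172.31.110.234:7077") // master address from the question
      .setJars(Seq(appJar))                     // distributed to the executors so they can load SimpleApp's classes

    val sc = new SparkContext(conf)
    try {
      // Hypothetical input path: a plain local path must exist at the same
      // location on the worker nodes, otherwise use shared storage such as HDFS.
      val logData = sc.textFile("/path/visible/to/workers/README.md", 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println(s"Lines with a: $numAs, Lines with b: $numBs")
    } finally {
      sc.stop() // release the cluster resources held by this driver
    }
  }
}
```

With this setup you can still click "Run" in the IDE, but the jar has to be rebuilt whenever the code changes, since the executors only see what is inside the jar.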