Question

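Note: the author is looking for a way to set the Spark master when running the Spark examples with no changes to the source code, using only options that can be passed on the command line, if at all possible.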

Consider the run() method of the BinaryClassification example:

  def run(params: Params) {
    val conf = new SparkConf().setAppName(s"BinaryClassification with $params")
    val sc = new SparkContext(conf)

Notice that this code gives the SparkConf no means of configuring the Spark master.

When running this program from IntelliJ with the following arguments:

--algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt

the following error occurs:

Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
    at org.apache.spark.examples.mllib.BinaryClassification$.run(BinaryClassification.scala:105)

I also tried adding the Spark master URL to the arguments (although the code does not appear to support it):

  spark://10.213.39.125:17088   --algorithm LR --regType L2 --regParam 1.0 
  data/mllib/sample_binary_classification_data.txt

and

--algorithm LR --regType L2 --regParam 1.0 spark://10.213.39.125:17088
data/mllib/sample_binary_classification_data.txt

Both fail with the error:

Error: Unknown argument 'data/mllib/sample_binary_classification_data.txt'

For reference, here is the option parsing. It does nothing with the Spark master:

val parser = new OptionParser[Params]("BinaryClassification") {
  head("BinaryClassification: an example app for binary classification.")
  opt[Int]("numIterations")
    .text("number of iterations")
    .action((x, c) => c.copy(numIterations = x))
  opt[Double]("stepSize")
    .text(s"initial step size, default: ${defaultParams.stepSize}")
    .action((x, c) => c.copy(stepSize = x))
  opt[String]("algorithm")
    .text(s"algorithm (${Algorithm.values.mkString(",")}), " +
    s"default: ${defaultParams.algorithm}")
    .action((x, c) => c.copy(algorithm = Algorithm.withName(x)))
  opt[String]("regType")
    .text(s"regularization type (${RegType.values.mkString(",")}), " +
    s"default: ${defaultParams.regType}")
    .action((x, c) => c.copy(regType = RegType.withName(x)))
  opt[Double]("regParam")
    .text(s"regularization parameter, default: ${defaultParams.regParam}")
  arg[String]("<input>")
    .required()
    .text("input paths to labeled examples in LIBSVM format")
    .action((x, c) => c.copy(input = x))

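So... yes, I could go ahead and modify the source code. But I suspect I am missing an available tuning knob that makes this work without modifying the source code.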

Answer 1

You can set the Spark master from the command line by adding a JVM argument:

-Dspark.master=spark://myhost:7077
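
This works because a SparkConf created with its default constructor also loads any spark.* Java system properties, and -D is how the JVM sets a system property. The sketch below is plain Java with no Spark dependency; the class name SparkPropsSketch and its helper method are hypothetical and only mimic that lookup for illustration, they are not Spark's implementation.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration only: mimics collecting spark.* JVM system
// properties. This is not Spark's actual code.
final class SparkPropsSketch {

    // Returns every JVM system property whose key starts with "spark.".
    static Map<String, String> sparkProperties() {
        Map<String, String> result = new TreeMap<>();
        Properties props = System.getProperties();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("spark.")) {
                result.put(key, props.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Run with: java -Dspark.master=spark://myhost:7077 SparkPropsSketch
        System.out.println(sparkProperties());
    }
}
```

In IntelliJ, the -D flag belongs in the "VM options" field of the run configuration, not in "Program arguments", which is probably why passing the URL as a program argument was rejected by the option parser.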

Answer 2

If you want to do this from code, you can call .setMaster(...) when creating the SparkConf:

val conf = new SparkConf().setAppName("Simple Application")
                          .setMaster("spark://myhost:7077")

Long-overdue EDIT (per the comments)

For sessions in Spark 2.x+:

val spark = SparkSession.builder()
                        .appName("app_name")
                        .getOrCreate()

Command line (2.x), assuming a local standalone cluster:

spark-shell --master spark://localhost:7077

Answer 3

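I downloaded Spark 1.3.0 and wanted to test the Java examples using Eclipse Luna 4.4. I found that to run the Java examples you need to add spark-assembly-1.3.0-hadoop2.4.0.jar to your Java project as a referenced library.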

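The fastest way to get started with Spark in Java is to run the JavaWordCount example. To fix the problem above, add the following line for the Spark configuration: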

SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("local[2]").set("spark.executor.memory","1g");

That's it. Try running it from Eclipse and it should succeed. If you see the following error:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)

just ignore it and scroll down the console: you will see your input text file printed line by line, followed by the word counts.

This is a quick way to get started with Spark on Windows without worrying about installing Hadoop. All you need is JDK 6 and Eclipse.

Answer 4

As stated in the documentation: setMaster(String master)

The master URL to connect to, such as local to run locally with one thread, local[4] to run locally with 4 cores, or spark://master:7077 to run on a Spark standalone cluster.
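
To make the three quoted forms concrete, here is a rough sketch in plain Java (no Spark dependency) that only classifies a master string. The class MasterUrlSketch and its describe method are hypothetical names made up for this illustration; Spark parses master URLs internally and accepts more forms than the three handled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Spark's parser.
final class MasterUrlSketch {

    // Matches local[N], where N is a positive count written in digits.
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\d+)\\]");
    // Matches spark://host:port for a standalone cluster.
    private static final Pattern STANDALONE = Pattern.compile("spark://([^:/]+):(\\d+)");

    // Describes what a master string of one of the three quoted forms means.
    static String describe(String master) {
        if (master.equals("local")) {
            return "local mode, threads=1";
        }
        Matcher local = LOCAL_N.matcher(master);
        if (local.matches()) {
            return "local mode, threads=" + local.group(1);
        }
        Matcher cluster = STANDALONE.matcher(master);
        if (cluster.matches()) {
            return "standalone cluster at " + cluster.group(1) + ":" + cluster.group(2);
        }
        return "not covered by this sketch";
    }

    public static void main(String[] args) {
        for (String m : new String[] {"local", "local[4]", "spark://master:7077"}) {
            System.out.println(m + " -> " + describe(m));
        }
    }
}
```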

Answer 5

So here is the solution.

Set the master to local, which uses 1 thread by default:

new SparkConf().setAppName("Ravi Macha").setMaster("local")

Or with an argument (i.e. the number of threads in brackets):

new SparkConf().setAppName("Ravi Macha").setMaster("local[2]")
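
One caveat with hardcoding the master like this: according to the Spark configuration documentation, values set directly on a SparkConf take precedence over flags passed to spark-submit, which in turn take precedence over spark-defaults.conf, so a hardcoded setMaster wins over a master given on the command line. A minimal sketch of that lookup order in plain Java (no Spark dependency; the class MasterPrecedenceSketch and its resolve method are hypothetical names used only for illustration):

```java
import java.util.Optional;

// Hypothetical illustration of the documented precedence order;
// this is not how Spark is implemented internally.
final class MasterPrecedenceSketch {

    // Returns the first value that is present, checked in precedence order:
    // set in code, then command-line flag, then defaults file.
    static Optional<String> resolve(Optional<String> setInCode,
                                    Optional<String> submitFlag,
                                    Optional<String> defaultsFile) {
        if (setInCode.isPresent()) {
            return setInCode;
        }
        if (submitFlag.isPresent()) {
            return submitFlag;
        }
        return defaultsFile;
    }

    public static void main(String[] args) {
        // A hardcoded "local[2]" shadows the master passed on the command line.
        System.out.println(resolve(Optional.of("local[2]"),
                                   Optional.of("spark://myhost:7077"),
                                   Optional.empty()));
    }
}
```

For the question as asked (no source changes), this suggests leaving setMaster out of the code and supplying the master from outside.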

How to set the master address for Spark examples from the command line
