spark.executor.extraJavaOptions ignored in spark-submit

Date: 2018-04-24 12:43:43

Tags: apache-spark hadoop

I'm a novice trying to profile a local Spark job. Below is the command I'm trying to execute, but I get a warning saying my executor options are being ignored because they are non-spark configuration properties.

Error:

Warning: Ignoring non-spark config property: "spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler-2.1.0-jar-with-dependencies.jar=server=localhost,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=MyNamespace.MySparkApplication,tagMapping=namespace.application"

Command:

./bin/spark-submit --master local[2] --class org.apache.spark.examples.GroupByTest --conf "spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler-2.1.0-jar-with-dependencies.jar=server=localhost,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=MyNamespace.MySparkApplication,tagMapping=namespace.application" --name HdfsWordCount --jars /Users/shprin/statD/statsd-jvm-profiler-2.1.0-jar-with-dependencies.jar libexec/examples/jars/spark-examples_2.11-2.3.0.jar
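Before digging into Spark itself, it can help to confirm how the shell splits a command line like this one. A generic sanity-check sketch (the helper name `show_args` is made up for illustration) prints each argument it receives, so you can verify the `--conf` value arrives as a single token:

```shell
# Print each argument in brackets to see exactly how the shell tokenized it.
show_args() { for a in "$@"; do printf '[%s]\n' "$a"; done; }

# The quoted --conf value should arrive as one token, key and value together:
show_args --conf "spark.executor.extraJavaOptions=-javaagent:agent.jar=server=localhost,port=8086"
# [--conf]
# [spark.executor.extraJavaOptions=-javaagent:agent.jar=server=localhost,port=8086]
```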

Spark version: 2.0.3

Please let me know how to fix this issue.

Thanks in advance.

2 Answers:

Answer 0 (score: 1)

I think the problem is the double quotes you are using to specify spark.executor.extraJavaOptions. They should be single quotes.

./bin/spark-submit --master local[2] --conf 'spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler-2.1.0-jar-with-dependencies.jar=server=localhost,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=MyNamespace.MySparkApplication,tagMapping=namespace.application' --class org.apache.spark.examples.GroupByTest --name HdfsWordCount --jars /Users/shprin/statD/statsd-jvm-profiler-2.1.0-jar-with-dependencies.jar libexec/examples/jars/spark-examples_2.11-2.3.0.jar
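A quoting-free alternative (not mentioned in the answer, but a standard Spark mechanism) is to set the property in conf/spark-defaults.conf, where the value is read verbatim and the shell never sees it:

```properties
# conf/spark-defaults.conf -- key and value separated by whitespace,
# no shell quoting or escaping required
spark.executor.extraJavaOptions  -javaagent:statsd-jvm-profiler-2.1.0-jar-with-dependencies.jar=server=localhost,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=MyNamespace.MySparkApplication,tagMapping=namespace.application
```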

Answer 1 (score: 0)

In addition to the answer above, if your parameter contains both spaces and single quotes (for example a query parameter), you should enclose it in escaped double quotes \"

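The original example for this answer appears to have been lost, so here is a hedged illustration of the escaped-double-quote technique (the property value and query string are hypothetical, not from the answer):

```shell
# Hypothetical -Dquery value containing spaces and single quotes: wrapping it
# in escaped double quotes makes the shell pass it through as one argument,
# with the inner quotes preserved for the JVM to see.
conf="spark.executor.extraJavaOptions=-Dquery=\"select * from t where name='foo'\""
printf '%s\n' "$conf"
# spark.executor.extraJavaOptions=-Dquery="select * from t where name='foo'"

# The invocation would then look like (app jar and class are placeholders):
#   ./bin/spark-submit --conf "$conf" --class com.example.MyApp my-app.jar
```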