I wrote a program to select text data from Cassandra. Here is my code. It just selects all the data and displays it in the console.
import local_settings  # project-local module providing cluster addresses (assumed)
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext


def get_spark_context(app_name, max_cores=120):
    # checkpointDirectory = ""
    conf = SparkConf().setMaster(local_settings.SPARK_MASTER).setAppName(app_name) \
        .set("spark.cores.max", max_cores) \
        .set("spark.jars.packages", "datastax:spark-cassandra-connector:2.0.0-s_2.11") \
        .set("spark.cassandra.connection.host", local_settings.CASSANDRA_MASTER)
    # set up the Spark context
    sc = SparkContext.getOrCreate(conf=conf)
    sc.setCheckpointDir(local_settings.CHECKPOINT_DIRECTORY)
    return sc


def get_sql_context(sc):
    sqlc = SQLContext.getOrCreate(sc)
    return sqlc


def run():
    sc = get_spark_context("Select data")
    sql_context = get_sql_context(sc)
    sql_context.read.format("org.apache.spark.sql.cassandra") \
        .options(table="text", keyspace="data") \
        .load().show()
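For completeness, a minimal entry point for running this script (a sketch; it assumes local_settings defines SPARK_MASTER, CASSANDRA_MASTER, and CHECKPOINT_DIRECTORY as used above):

    if __name__ == "__main__":
        run()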
But the console just shows the following and gets stuck, looping on this warning:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

It never finishes.
19/02/21 09:09:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/21 09:09:23 WARN Utils: Your hostname, osboxes resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
19/02/21 09:09:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/02/21 09:09:44 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
19/02/21 09:09:59 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
So I checked my Spark worker's logs. The error log is below:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/02/21 08:58:18 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 15264@mm_h01
19/02/21 08:58:18 INFO SignalUtils: Registered signal handler for TERM
19/02/21 08:58:18 INFO SignalUtils: Registered signal handler for HUP
19/02/21 08:58:18 INFO SignalUtils: Registered signal handler for INT
19/02/21 08:58:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/21 08:58:19 INFO SecurityManager: Changing view acls to: hadoop,osboxes
19/02/21 08:58:19 INFO SecurityManager: Changing modify acls to: hadoop,osboxes
19/02/21 08:58:19 INFO SecurityManager: Changing view acls groups to:
19/02/21 08:58:19 INFO SecurityManager: Changing modify acls groups to:
19/02/21 08:58:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, osboxes); groups with view permissions: Set(); users with modify permissions: Set(hadoop, osboxes); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:202)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 11 more
19/02/21 09:00:19 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
What does this mean? Are the master and the workers not talking to each other? Thanks a lot.
Answer (score: 2):
It means the job has been submitted to YARN, but YARN cannot start it because it is currently unable to provide the requested resources.

Go to the Ambari/Cloudera UI and check whether any other jobs are running. Check the YARN container sizing. Check whether the resources configured for this job exceed the total resources available to YARN/Mesos.
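For example, the code above asks for spark.cores.max=120; if the cluster has fewer free cores than that, the scheduler can never place an executor and the job hangs on exactly this warning. A minimal sketch of a more conservative configuration, assuming the same local_settings module as in the question (the specific values are illustrative, not tuned recommendations):

    from pyspark import SparkConf

    import local_settings  # assumed project-local module, as in the question

    # Request resources the cluster can actually satisfy; all values are placeholders.
    conf = (SparkConf()
            .setMaster(local_settings.SPARK_MASTER)
            .setAppName("Select data")
            .set("spark.cores.max", "4")            # must not exceed the cluster's free cores
            .set("spark.executor.memory", "1g")     # must fit within each worker's available memory
            .set("spark.network.timeout", "300s"))  # optionally widen RPC timeouts like the one in the trace

If the job still hangs after shrinking the request, the worker trace above (the executor timing out while connecting back to the driver) together with the earlier "resolves to a loopback address" warning suggests also verifying that the driver's address is reachable from the workers, e.g. by setting SPARK_LOCAL_IP as the log itself hints.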