I am trying to run a simple example application with Spark, submitting the job with spark-submit:
spark-submit --class "SimpleJob" --master spark://&lt;master-ip&gt;:7077 target/scala-2.10/simple-project_2.10-1.0.jar
15/03/08 23:21:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/08 23:21:53 WARN LoadSnappy: Snappy native library not loaded
15/03/08 23:22:09 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Lines with a: 21, Lines with b: 21
The job gives the correct result, but the following errors appear below it:
15/03/08 23:22:28 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(<worker-host.domain.com>,53628)
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
at org.apache.spark.network.SendingConnection.read(Connection.scala:390)
at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:205)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/03/08 23:22:28 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(<worker-host.domain.com>,53628) not found
15/03/08 23:22:28 WARN ConnectionManager: All connections not cleaned up
Here is spark-defaults.conf:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 5g
spark.master spark://<master-ip>:7077
spark.eventLog.enabled true
spark.executor.extraClassPath $SPARK_HOME/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.2.0-SNAPSHOT.jar
spark.cassandra.connection.conf.factory com.datastax.spark.connector.cql.DefaultConnectionFactory
spark.cassandra.auth.conf.factory com.datastax.spark.connector.cql.DefaultAuthConfFactory
spark.cassandra.query.retry.count 10
Here is spark-env.sh:
SPARK_LOCAL_IP=&lt;master-ip on the master, worker-ip on each worker&gt;
SPARK_MASTER_HOST='<master-hostname>'
SPARK_MASTER_IP=<master-ip>
SPARK_MASTER_PORT=7077
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=4
Answer 0 (score: 0)
Found the answer:
Even though I added the Cassandra connector to the classpath through the command line, the same jar was not being shipped to all the nodes of the cluster.
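As an aside, a common way to ship a jar to every node at submit time is spark-submit's `--jars` flag. This is only a sketch of that alternative, reusing the class name and jar paths from the question, not what the answer below actually ran:

```shell
# --jars uploads the listed jars and adds them to the classpath of
# the driver and every executor, so each worker node gets a copy.
spark-submit \
  --class "SimpleJob" \
  --master spark://<master-ip>:7077 \
  --jars ~/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar \
  target/scala-2.10/simple-project_2.10-1.0.jar
```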
Now I use the following sequence of commands to run it correctly:
spark-shell --driver-class-path ~/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._
sc.addJar("~/Installers/spark-cassandra-connector-1.1.1/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.1.1.jar")
After these commands, I was able to run all my reads &amp; writes against my Cassandra cluster correctly using Spark RDDs.
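For illustration, such a read/write round trip with the connector might look like the sketch below. It assumes the spark-shell session above (so `sc` already exists and the connector jar is on the classpath), and the keyspace `test` and table `kv` are hypothetical names that would need to exist in Cassandra beforehand:

```scala
import com.datastax.spark.connector._

// Write a small RDD to the (hypothetical) table test.kv (key text, value int)
val data = sc.parallelize(Seq(("key1", 1), ("key2", 2)))
data.saveToCassandra("test", "kv", SomeColumns("key", "value"))

// Read the rows back as an RDD of CassandraRow
val rows = sc.cassandraTable("test", "kv")
println(rows.count())
```

Because `saveToCassandra` runs on the executors, this is exactly the kind of call that fails with classpath errors when the connector assembly is only on the driver, which is what `sc.addJar` (or `--jars`) fixes.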