Apache Spark job reading from a Cassandra table stalls at startup (spark-1.3.1)

Asked: 2015-05-27 21:37:24

Tags: cassandra apache-spark

We have been hitting an intermittent problem with Spark 1.3.1 and the DataStax Cassandra connector that causes jobs to stall indefinitely at startup.

Edit: I have also tried the same approach with Spark 1.2.1 and the packaged 1.2.1 spark-cassandra-connector_2.10, and it produces the same symptoms.

We are using the following dependency:

var sparkCas = "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.3.0-SNAPSHOT"
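As an aside, depending on a `-SNAPSHOT` artifact means the resolved jar can change underneath you between builds, which makes "it worked earlier today" situations hard to reproduce. A hedged alternative (the version number below is illustrative; match the connector line to your Spark version) is to pin a released version in `build.sbt`:

```scala
// build.sbt sketch: pin a released connector rather than a SNAPSHOT.
// "1.2.1" here is an assumption for illustration -- pick the released
// connector version documented as compatible with your Spark version.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.1"
```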

Our job code:

import org.apache.spark.{SparkConf, SparkContext}
import org.joda.time.DateTime
import com.datastax.spark.connector._

object ConnTransform {

  private val AppName = "ConnTransformCassandra"

  def main(args: Array[String]) {

    // Start of the time window we want to read.
    val start = new DateTime(2015, 5, 27, 1, 0, 0)

    val master = if (args.length >= 1) args(0) else "local[*]"

    // Create the spark context.
    val sc = {
      val conf = new SparkConf()
        .setAppName(AppName)
        .setMaster(master)
        .set("spark.cassandra.connection.host", "10.10.101.202,10.10.102.139,10.10.103.74")

      new SparkContext(conf)
    }

    // Read the matching rows, deserialize them, and write the result as text.
    sc.cassandraTable("alpha_dev", "conn")
      .select("data")
      .where("timep = ?", start)
      .where("sensorid IN ?", Utils.sensors)
      .map(Utils.deserializeRow)
      .saveAsTextFile("output/raw_data")
  }
}
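One way to narrow this down further (a debugging sketch, not a known fix) is to replace the single `IN` query with one query per sensor id and union the resulting RDDs. That keeps the output identical but attributes a stall to a specific partition, and avoids pushing a potentially large `IN` list to a single coordinator. `Utils.sensors` is assumed here to be an ordinary Scala collection of sensor ids:

```scala
// Hypothetical variant for isolating the stall: issue one query per
// sensor id instead of a single "sensorid IN ?" query, then union them.
val perSensor = Utils.sensors.map { sensor =>
  sc.cassandraTable("alpha_dev", "conn")
    .select("data")
    .where("timep = ? AND sensorid = ?", start, sensor)
}

sc.union(perSensor.toSeq)
  .map(Utils.deserializeRow)
  .saveAsTextFile("output/raw_data")
```

If one of the per-sensor reads hangs while the others complete, that points at a data or token-range problem rather than the connector setup.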

As you can see, the code is pretty simple (it used to be more complex, but we have been trying to narrow down the root cause of this issue).

Now, this job was working earlier today: data was successfully written to the specified directory. However, when it runs now, we see the job start, reach the point just before it begins processing blocks, and sit there indefinitely.

The job output below shows the log messages seen so far; at the time of writing, the job has been stalled for almost an hour. If we set the logging level to DEBUG, the only thing seen after that point in the job is heartbeat pings between the akka workers.

ubuntu@ip-10-10-102-53:~/projects/icespark$ /home/ubuntu/spark/spark-1.3.1/bin/spark-submit --class com.splee.spark.ConnTransform splee-analytics-assembly-0.1.0.jar
15/05/27 21:15:21 INFO SparkContext: Running Spark version 1.3.1
15/05/27 21:15:21 INFO SecurityManager: Changing view acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: Changing modify acls to: ubuntu
15/05/27 21:15:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); users with modify permissions: Set(ubuntu)
15/05/27 21:15:22 INFO Slf4jLogger: Slf4jLogger started
15/05/27 21:15:22 INFO Remoting: Starting remoting
15/05/27 21:15:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@ip-10-10-102-53.us-west-2.compute.internal:51977]
15/05/27 21:15:22 INFO Utils: Successfully started service 'sparkDriver' on port 51977.
15/05/27 21:15:22 INFO SparkEnv: Registering MapOutputTracker
15/05/27 21:15:22 INFO SparkEnv: Registering BlockManagerMaster
15/05/27 21:15:22 INFO DiskBlockManager: Created local directory at /tmp/spark-2466ff66-bb50-4d52-9d34-1801d69889b9/blockmgr-60e75214-1ba6-410c-a564-361263636e5c
15/05/27 21:15:22 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/05/27 21:15:22 INFO HttpFileServer: HTTP File server directory is /tmp/spark-72f1e849-c298-49ee-936c-e94c462f3df2/httpd-f81c2326-e5f1-4f33-9557-074f2789c4ee
15/05/27 21:15:22 INFO HttpServer: Starting HTTP Server
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SocketConnector@0.0.0.0:55357
15/05/27 21:15:22 INFO Utils: Successfully started service 'HTTP file server' on port 55357.
15/05/27 21:15:22 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/27 21:15:22 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/27 21:15:22 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/05/27 21:15:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/05/27 21:15:22 INFO SparkUI: Started SparkUI at http://ip-10-10-102-53.us-west-2.compute.internal:4040
15/05/27 21:15:22 INFO SparkContext: Added JAR file:/home/ubuntu/projects/icespark/splee-analytics-assembly-0.1.0.jar at http://10.10.102.53:55357/jars/splee-analytics-assembly-0.1.0.jar with timestamp 1432761322942
15/05/27 21:15:23 INFO Executor: Starting executor ID <driver> on host localhost
15/05/27 21:15:23 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@ip-10-10-102-53.us-west-2.compute.internal:51977/user/HeartbeatReceiver
15/05/27 21:15:23 INFO NettyBlockTransferService: Server created on 58479
15/05/27 21:15:23 INFO BlockManagerMaster: Trying to register BlockManager
15/05/27 21:15:23 INFO BlockManagerMasterActor: Registering block manager localhost:58479 with 265.1 MB RAM, BlockManagerId(<driver>, localhost, 58479)
15/05/27 21:15:23 INFO BlockManagerMaster: Registered BlockManager
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.28:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.28 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.60:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.60 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.154:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.154 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.145:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.145 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.78:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.78 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.200:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.200 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.73:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.102.73 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.103.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.205:9042 added
15/05/27 21:15:24 INFO LocalNodeFirstLoadBalancingPolicy: Added host 10.10.101.205 (us-west-2)
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.103.74:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.101.202:9042 added
15/05/27 21:15:24 INFO Cluster: New Cassandra host /10.10.102.139:9042 added
15/05/27 21:15:24 INFO CassandraConnector: Connected to Cassandra cluster: Splee Dev
15/05/27 21:15:25 INFO CassandraConnector: Disconnected from Cassandra cluster: Splee Dev

If anyone has any ideas about what could cause this job (which previously produced results) to stall in this way, and can shed some light on the situation, it would be much appreciated.

0 Answers:

There are no answers yet.