We are trying to benchmark TPC-H (scale factor 1) on a 13-node Raspberry Pi 3B+ cluster (1 master, 12 workers). Each node has 1 GB of RAM and a quad-core processor, running Ubuntu Server 18.04. The cluster uses the Spark standalone scheduler, and the *.tbl files generated by the TPC-H dbgen tool are stored in HDFS. We are running into repeated failures when trying to run the queries: jobs die unexpectedly, usually with one or more nodes showing up as DEAD/LOST in the web UI. It looks as if one or more nodes hang during query execution and become unreachable or time out. Our configuration parameters and driver program are included below. Any advice would be greatly appreciated.
# spark-defaults.conf
spark.cores.max 36
spark.cleaner.periodicGC.interval 5min
spark.driver.extraJavaOptions -XX:+UseCompressedOops
spark.driver.memory 600m
spark.executor.cores 3
spark.executor.extraJavaOptions -XX:+UseCompressedOops
spark.executor.heartbeatInterval 60s
spark.executor.memory 600m
spark.master spark://rpnmas:7077
spark.network.timeout 300s
spark.submit.deployMode client
spark.reducer.maxSizeInFlight 8m
spark.sql.shuffle.partitions 400
spark.sql.sort.enableRadixSort false
spark.sql.inMemoryColumnarStorage.batchSize 10000
spark.rpc.message.maxSize 32
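
In case it is relevant, here is a minimal sketch of how the effective settings can be verified from inside the driver at runtime (the key list is illustrative; spark.conf.get with a default value is standard Spark API):

import org.apache.spark.sql.SparkSession

object ConfCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ConfCheck").getOrCreate()
    // print the effective value of a few settings; get() with a default
    // avoids an exception when a key is unset
    Seq("spark.executor.memory", "spark.executor.cores", "spark.network.timeout")
      .foreach(k => println(s"$k = ${spark.conf.get(k, "<unset>")}"))
    spark.stop()
  }
}

The driver program itself follows.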
import org.apache.spark.sql.SparkSession

object TPCH {
  // ... <table schemas> ...

  def main(args: Array[String]): Unit = {
    val tabledir = args(0)   // HDFS directory containing the *.tbl files
    val querydir = args(1)   // local directory containing 1.sql .. 22.sql
    val numIters = args(2).toInt
    val spark = SparkSession.builder.appName("TPCH").getOrCreate()
    // load tables from HDFS and cache them as temp views
    for (table <- tables) {
      val path = tabledir + "/" + table.name + ".tbl"
      val df = spark.read.schema(table.schema).option("sep", "|").csv(path)
      df.createOrReplaceTempView(table.name)
      spark.catalog.cacheTable(table.name)
    }
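    // note: cacheTable is lazy -- nothing is materialized until the first
    // action touches each table, so the first iteration of each query also
    // pays the scan + caching cost. An optional eager warm-up would be:
    //   tables.foreach(t => spark.table(t.name).count())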
    // load the 22 query files from the local query directory
    val queries = (1 to 22).map { q =>
      val path = querydir + s"/$q.sql"
      val source = scala.io.Source.fromFile(path)
      try source.mkString finally source.close()
    }
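    // note: Source.fromFile reads from the driver's local filesystem; this
    // works here because the job is submitted in client mode on the master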
    // run each query numIters times and print "query,iteration,seconds" as CSV
    for ((query, i) <- queries.zipWithIndex) {
      for (j <- 1 to numIters) {
        val start = System.currentTimeMillis()
        spark.sql(query).collect()
        val stop = System.currentTimeMillis()
        val time = (stop - start) / 1000.0
        println((i + 1) + "," + j + "," + time)  // i is zero-based, so i + 1 is the query number
      }
    }
  }
}
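
In case it helps with reproduction, a single query can also be run by hand in spark-shell along these lines (a minimal sketch; the path is illustrative, and it assumes the temp views have been registered as in the driver above):

// minimal sketch: run one TPC-H query interactively in spark-shell
// (the path is illustrative; `spark` is the session provided by the shell)
val src = scala.io.Source.fromFile("/home/ubuntu/tpch/queries/1.sql")
val q1 = try src.mkString finally src.close()
val t0 = System.nanoTime()
val rows = spark.sql(q1).collect()
println(f"q1: ${rows.length} rows in ${(System.nanoTime() - t0) / 1e9}%.1f s")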