We are trying to benchmark TPC-H (scale factor 1) on a 13-node Raspberry Pi 3B+ cluster (1 master, 12 workers). Each node has 1 GB of RAM and a quad-core processor, running Ubuntu Server 18.04. The cluster uses the Spark standalone scheduler, and the *.tbl files generated by the TPC-H dbgen tool are stored in HDFS. We are running into repeated failures when trying to run the queries: jobs die unexpectedly, usually with one or more nodes showing up as DEAD/LOST in the web UI. It looks as if one or more nodes hang during query execution and become unreachable or time out. Our configuration parameters and driver program are included below. Any advice would be greatly appreciated.
# spark-defaults.conf
spark.cores.max 36
spark.cleaner.periodicGC.interval 5min
spark.driver.extraJavaOptions -XX:+UseCompressedOops
spark.driver.memory 600m
spark.executor.cores 3
spark.executor.extraJavaOptions -XX:+UseCompressedOops
spark.executor.heartbeatInterval 60s
spark.executor.memory 600m
spark.master spark://rpnmas:7077
spark.network.timeout 300s
spark.submit.deployMode client
spark.reducer.maxSizeInFlight 8m
spark.sql.shuffle.partitions 400
spark.sql.sort.enableRadixSort false
spark.sql.inMemoryColumnarStorage.batchSize 10000
spark.rpc.message.maxSize 32
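
In case it is relevant, here is a minimal sketch of how the effective settings can be verified from inside the driver at runtime (the key list is illustrative; spark.conf.get with a default value is standard Spark API):

import org.apache.spark.sql.SparkSession

object ConfCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("ConfCheck").getOrCreate()
    // print the effective value of a few settings; get() with a default
    // avoids an exception when a key is unset
    Seq("spark.executor.memory", "spark.executor.cores", "spark.network.timeout")
      .foreach(k => println(s"$k = ${spark.conf.get(k, "<unset>")}"))
    spark.stop()
  }
}

The driver program itself follows.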
import org.apache.spark.sql.SparkSession

object TPCH {
  // ... <table schemas> ...

  def main(args: Array[String]): Unit = {
    val tabledir = args(0)   // HDFS directory containing the *.tbl files
    val querydir = args(1)   // local directory containing 1.sql .. 22.sql
    val numIters = args(2).toInt
    val spark = SparkSession.builder.appName("TPCH").getOrCreate()
    // load tables from HDFS and cache them as temp views
    for (table <- tables) {
      val path = tabledir + "/" + table.name + ".tbl"
      val df = spark.read.schema(table.schema).option("sep", "|").csv(path)
      df.createOrReplaceTempView(table.name)
      spark.catalog.cacheTable(table.name)
    }
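    // note: cacheTable is lazy -- nothing is materialized until the first
    // action touches each table, so the first iteration of each query also
    // pays the scan + caching cost. An optional eager warm-up would be:
    //   tables.foreach(t => spark.table(t.name).count())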
    // load the 22 query files from the local query directory
    val queries = (1 to 22).map { q =>
      val path = querydir + s"/$q.sql"
      val source = scala.io.Source.fromFile(path)
      try source.mkString finally source.close()
    }
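    // note: Source.fromFile reads from the driver's local filesystem; this
    // works here because the job is submitted in client mode on the master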
    // run each query numIters times and print "query,iteration,seconds" as CSV
    for ((query, i) <- queries.zipWithIndex) {
      for (j <- 1 to numIters) {
        val start = System.currentTimeMillis()
        spark.sql(query).collect()
        val stop = System.currentTimeMillis()
        val time = (stop - start) / 1000.0
        println((i + 1) + "," + j + "," + time)  // i is zero-based, so i + 1 is the query number
      }
    }
  }
}
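
In case it helps with reproduction, a single query can also be run by hand in spark-shell along these lines (a minimal sketch; the path is illustrative, and it assumes the temp views have been registered as in the driver above):

// minimal sketch: run one TPC-H query interactively in spark-shell
// (the path is illustrative; `spark` is the session provided by the shell)
val src = scala.io.Source.fromFile("/home/ubuntu/tpch/queries/1.sql")
val q1 = try src.mkString finally src.close()
val t0 = System.nanoTime()
val rows = spark.sql(q1).collect()
println(f"q1: ${rows.length} rows in ${(System.nanoTime() - t0) / 1e9}%.1f s")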