
We are using the Spark2 Thrift server to run Hive queries.

Thrift ships as part of HDP 2.6, and our Spark version is 2.1.0.2.6.0.3-8.

The more queries we run concurrently, the sooner we hit an OOM in the driver. These queries also contain JOINs and UNIONs.

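From the jstat output, there does not seem to be a memory leak, but no matter how much memory we give the driver, it never seems to be enough. The more queries we run concurrently, the sooner the Thrift driver starts doing full GCs, until it crashes, because a full GC cannot reclaim the old generation (it is still in use).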
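To make the "full GC cannot reclaim the old generation" symptom easier to check, here is a minimal Python sketch that reads `jstat -gcutil` samples and flags the pattern. It assumes the Java 8 `-gcutil` column layout (`O` is old-generation occupancy in percent, `FGC` is the cumulative full GC count). The sample numbers, the 95% threshold, and the function names are illustrative assumptions, not our actual data.

```python
# Hypothetical jstat -gcutil samples; the numbers are made up for illustration.
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  40.10  71.20  95.00  90.00    120    3.200     2    1.100    4.300
  0.00   0.00 100.00  99.80  95.10  90.00    121    3.250     5    9.400   12.650
  0.00   0.00 100.00  99.90  95.10  90.00    121    3.250     9   21.700   24.950
"""


def parse_gcutil(text):
    """Turn jstat -gcutil text (header line + rows) into a list of dicts of floats."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def old_gen_stuck(samples, threshold=95.0, min_intervals=2):
    """Return True if full GCs ran in at least `min_intervals` sampling intervals
    while old-gen occupancy was still at or above `threshold` percent afterwards."""
    stuck = 0
    for prev, cur in zip(samples, samples[1:]):
        full_gcs = cur["FGC"] - prev["FGC"]
        if full_gcs > 0 and cur["O"] >= threshold:
            stuck += 1
    return stuck >= min_intervals


if __name__ == "__main__":
    print(old_gen_stuck(parse_gcutil(SAMPLE)))
```

In practice the input would come from something like `jstat -gcutil <driver-pid> 5000` captured to a file. The check only describes the symptom. It does not tell you which objects are holding the old generation.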

The OOM never occurs in the executors, only in the driver.

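Has anyone run Spark through the Thrift server and hit this problem? If so, how can the Thrift driver be configured so that it does not crash with an OOM when running multiple queries concurrently?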

These are the configurations we use:

Thrift Spark driver:

Thrift Spark executors:

From /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf:

config params

spark.broadcast.blockSize 32m
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.driver.maxResultSize 0
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 45s
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.maxExecutors 15
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.executor.memory 10g
spark.files.maxPartitionBytes 268435456
spark.files.openCostInBytes 33554432
spark.hadoop.cacheConf false
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.kryoserializer.buffer.max 2000m
spark.master yarn-client
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 104857600
spark.scheduler.allocation.file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-fairscheduler.xml
spark.scheduler.mode FAIR
spark.shuffle.service.enabled true
spark.sql.autoBroadcastJoinThreshold 1073741824
spark.sql.shuffle.partitions 100
spark.storage.memoryMapThreshold 8m
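
Several of the values above are raw byte counts, which makes them hard to read at a glance. The following Python sketch parses a few lines of the config, prints byte values in binary units, and lists two settings that, as far as I understand the Spark documentation, bear on driver-side memory: `spark.driver.maxResultSize 0` means no limit on results collected to the driver, and `spark.sql.autoBroadcastJoinThreshold` caps the size of tables that get broadcast in joins. The helper names are hypothetical, and the notes are an assumption about where to look, not a confirmed diagnosis.

```python
# A subset of the settings listed above, copied verbatim.
CONF_TEXT = """\
spark.driver.maxResultSize 0
spark.kryoserializer.buffer.max 2000m
spark.sql.autoBroadcastJoinThreshold 1073741824
spark.files.maxPartitionBytes 268435456
spark.memory.offHeap.size 104857600
"""


def parse_conf(text):
    """Parse 'key value' lines (spark-defaults style) into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


def human_bytes(n):
    """Format a byte count using binary units, capped at GiB."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if n < 1024 or unit == "GiB":
            return f"{n:g} {unit}"
        n /= 1024


def driver_memory_notes(conf):
    """Collect settings that may affect driver heap usage (an assumption, see text)."""
    notes = []
    if conf.get("spark.driver.maxResultSize") == "0":
        notes.append("spark.driver.maxResultSize is 0 (unlimited)")
    threshold = conf.get("spark.sql.autoBroadcastJoinThreshold", "")
    if threshold.isdigit() and int(threshold) > 0:
        notes.append(
            "spark.sql.autoBroadcastJoinThreshold is " + human_bytes(int(threshold))
        )
    return notes


if __name__ == "__main__":
    conf = parse_conf(CONF_TEXT)
    for key, value in conf.items():
        if value.isdigit() and int(value) >= 1024:
            print(key, "=", human_bytes(int(value)))
    for note in driver_memory_notes(conf):
        print("check:", note)
```

Read this way, the broadcast threshold is 1 GiB, `spark.files.maxPartitionBytes` is 256 MiB, and the off-heap size is 100 MiB.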