Spark Thrift Server loads the full dataset into memory before transferring it over JDBC

Date: 2018-11-01 08:37:48

Tags: apache-spark spark-thriftserver

The Spark Thrift Server tries to load the full dataset into memory before transferring it over JDBC, and on the JDBC client I get this error:

SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)

The query is: select * from table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to the Hadoop cluster through Spark SQL over a JDBC connection, but if the Thrift Server has to load the full dataset into memory before transferring it, this approach won't work.
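
For context, the access path from Pentaho is an ordinary Hive JDBC connection against the Thrift Server. A minimal sketch of such a client is below; the host, port, table name, and credentials are placeholders, and it assumes the Hive JDBC driver (org.apache.hive:hive-jdbc) is on the classpath:

    import java.sql.DriverManager

    object ThriftClientSketch {
      def main(args: Array[String]): Unit = {
        // Placeholder URL; the Thrift Server listens on port 10000 by default.
        val url = "jdbc:hive2://thrift-server-host:10000/default"
        val conn = DriverManager.getConnection(url, "user", "")
        try {
          val stmt = conn.createStatement()
          // The query from the question; "my_table" is a placeholder name.
          val rs = stmt.executeQuery("SELECT * FROM my_table")
          while (rs.next()) {
            // Rows are consumed one at a time on the client side.
            println(rs.getString(1))
          }
        } finally {
          conn.close()
        }
      }
    }

The client itself iterates over the result set row by row; the problem is that the server collects the whole result to the driver before it starts serving those rows.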

2 answers:

Answer 0 (score: 0)

Solution: set spark.sql.thriftServer.incrementalCollect = true
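
This is a server-side setting, so it needs to be in place when the Thrift Server is started, for example via --conf or spark-defaults.conf (paths below assume a standard Spark distribution):

    # Pass the flag when starting the Thrift Server ...
    $SPARK_HOME/sbin/start-thriftserver.sh \
      --conf spark.sql.thriftServer.incrementalCollect=true

    # ... or set it permanently in conf/spark-defaults.conf
    spark.sql.thriftServer.incrementalCollect  true

With incremental collect the server fetches results partition by partition instead of collecting the entire result set to the driver at once, which trades some transfer speed for a much smaller driver memory footprint.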

Answer 1 (score: 0)

In my case the fix was to increase the Spark driver memory and the maximum result size, i.e. spark.driver.memory = xG and spark.driver.maxResultSize = xG.
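
Both settings apply to the Thrift Server process itself (it acts as the Spark driver), so they are passed at start time; the 8g values below are placeholders to be sized to the actual result set:

    $SPARK_HOME/sbin/start-thriftserver.sh \
      --conf spark.driver.memory=8g \
      --conf spark.driver.maxResultSize=8g

Note that this only raises the ceiling: the full result is still collected into driver memory, so for truly large extracts the incremental-collect approach from the other answer scales better.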