The Spark Thrift Server appears to load the complete dataset into memory before sending it over JDBC; on the JDBC client I get the error:
SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)
The query is a plain SELECT * FROM the table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to the Hadoop cluster through Spark SQL over a JDBC connection, but if the Thrift Server has to load the complete dataset into memory before transferring it, this approach will not work.
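For reference, the same JDBC endpoint can be exercised outside Pentaho with beeline, which ships with Spark; the host, port, credentials and table name below are placeholders, not values from the question:

    $SPARK_HOME/bin/beeline \
      -u "jdbc:hive2://thrift-server-host:10000/default" \
      -n username -p password \
      -e "SELECT * FROM my_table"

Pentaho connects the same way, using the Hive JDBC driver against the HiveServer2-compatible port that the Thrift Server exposes.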
Answer 0 (score: 0)
Solution: set spark.sql.thriftServer.incrementalCollect=true.
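A minimal sketch of enabling this when (re)starting the Thrift Server; with incrementalCollect enabled the server fetches results partition by partition rather than materializing the whole result set on the driver, typically at the cost of slower overall transfer (paths below assume a standard Spark installation):

    $SPARK_HOME/sbin/start-thriftserver.sh \
      --conf spark.sql.thriftServer.incrementalCollect=true

The property can also be placed in conf/spark-defaults.conf so it applies every time the Thrift Server is started.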
Answer 1 (score: 0)
In my case the fix was to increase the Spark driver memory and the maximum result size, i.e. spark.driver.memory=xG and spark.driver.maxResultSize=xG.
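A sketch of passing these settings when starting the Thrift Server; the sizes are illustrative placeholders, not recommendations, and should be tuned to the actual result volume and available memory:

    $SPARK_HOME/sbin/start-thriftserver.sh \
      --driver-memory 8g \
      --conf spark.driver.maxResultSize=8g

Note that setting spark.driver.maxResultSize=0 removes the limit entirely, but then a sufficiently large result can simply run the driver out of memory, so raising the limit only postpones the problem that incrementalCollect addresses.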