I am trying to create a PySpark DataFrame using the following code:
#!/usr/bin/env python
# coding: utf-8
import pyspark
from pyspark.sql.session import SparkSession
import pyspark.sql.functions as f
from pyspark.sql.functions import coalesce
spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
#spark.sql("use bocconi")
tableName = "dynamic_pricing.final"
inputDF = spark.sql("""SELECT * FROM dynamic_pricing.final WHERE year = '2019' AND mercati_id = '6'""")
I get the following error:
Py4JJavaError: An error occurred while calling o48.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 9730 tasks (1024.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
I have gone through the following links: link1 and link2, but the problem is still not solved. Any ideas on how to fix this? I have also tried the following:
# Create a new config (SparkConf/SparkContext imports added)
from pyspark import SparkConf, SparkContext
conf = (SparkConf()
.set("spark.driver.maxResultSize", "0"))
# Create a new context
sc = SparkContext(conf=conf)
Answer 0 (score: 0)
Total size of serialized results of 9730 tasks is bigger than spark.driver.maxResultSize
means that the tasks are sending more data back to the driver at once than it is configured to accept. Given that your maxResultSize is 1024.0 MB (only 1 GB), I suggest increasing maxResultSize. Try setting it to 0 to make it unlimited, then check whether you run into out-of-memory errors instead.
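As a sketch of how this could look in your code: since driver settings cannot be changed on an already-running session, the config has to be passed before `getOrCreate()` builds the session. The app name `"Test"` is taken from your snippet; everything else here is one possible way to wire it up, not the only one.

```python
from pyspark.sql import SparkSession

# Set spark.driver.maxResultSize at session-build time;
# "0" means unlimited (watch for driver OOM with large collects).
spark = (SparkSession.builder
         .appName("Test")
         .config("spark.driver.maxResultSize", "0")
         .enableHiveSupport()
         .getOrCreate())

inputDF = spark.sql(
    """SELECT * FROM dynamic_pricing.final
       WHERE year = '2019' AND mercati_id = '6'"""
)
```

Alternatively, the same property can be passed on the command line, e.g. `spark-submit --conf spark.driver.maxResultSize=4g ...`, which avoids hard-coding it in the script.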