
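My question may look similar to other questions on Stack Overflow, but it is different.

I have a very large PySpark DataFrame (about 40 million rows and 30 columns) that I want to export to a single CSV file. I have tried several approaches, and all of them failed.

So far I have tried: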

df.repartition(1).write.save(path='the path and name of the file.csv', format='csv', mode='overwrite', header='true')

and

df.toPandas().to_csv('path and the name of the file.csv', index=False)

Each attempt ran for about an hour and then failed with the following error:

Py4JJavaError  Traceback (most recent call last)
<ipython-input-117-040553681ce4> in <module>
.
.
.
Py4JJavaError: An error occurred while calling o666.save.
: org.apache.spark.SparkException: Job aborted.

Please let me know if there is another way to do this for a large DataFrame, ideally one that is also fast.

I am using Python 3.7.1, PySpark 2.4, and Jupyter 4.4.0.
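For context on the two attempts above: repartition(1) funnels all 40 million rows through a single task, and toPandas() collects the whole DataFrame into driver memory, so both give up Spark's parallelism. One direction that may be worth trying (a sketch, not a confirmed fix for this particular error) is to let Spark write one part file per partition with df.write.csv(...) and then concatenate the part files outside Spark. The helper below is hypothetical: the name merge_part_files, the directory name "out_dir", and the assumption that the part files sit on a local filesystem and have a single-line header are mine, not something from the question.

```python
import glob
import os
import shutil

# Assumed Spark step, run beforehand (not executed here):
#   df.write.csv("out_dir", header=True, mode="overwrite")
# That writes out_dir/part-*.csv in parallel, one file per partition.


def merge_part_files(part_dir, out_path, has_header=True):
    """Concatenate part-*.csv files from part_dir into out_path.

    Hypothetical helper: assumes local files and a one-line header
    repeated at the top of every non-empty part file.
    Returns the number of part files found.
    """
    parts = sorted(glob.glob(os.path.join(part_dir, "part-*.csv")))
    header_written = False
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                if has_header:
                    header = src.readline()
                    if not header:
                        continue  # empty part file, nothing to copy
                    if not header_written:
                        out.write(header)  # keep the header only once
                        header_written = True
                # Stream the remaining bytes without loading them into memory.
                shutil.copyfileobj(src, out)
    return len(parts)
```

Calling merge_part_files("out_dir", "result.csv") would then produce one CSV with a single header row. The copy is streamed, so memory use stays small regardless of row count; whether the Spark write itself succeeds still depends on the cause behind the "Job aborted" message, which the truncated traceback does not show.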

How do I export a very large PySpark DataFrame to a CSV file?

0 answers: