pyspark toPandas error?

Time: 2017-06-06 11:11:18

Tags: python python-2.7 pyspark spark-dataframe

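I have a messy and very large dataset that includes Chinese characters, numbers, strings, dates, etc. After doing some cleaning with pyspark, I tried to convert it to pandas, and it raised this error: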
IOPub data rate exceeded. The notebook server will temporarily stop sending output to the client in order to avoid crashing it. To change this limit, set the config variable --NotebookApp.iopub_data_rate_limit.
17/06/06 18:48:54 WARN TaskSetManager: Lost task 8.0 in stage 13.0 (TID 393, localhost): TaskKilled (killed intentionally)

Above the error, it printed some of my original data. The output is very long, so I am only posting part of it.

I checked the cleaned data. All column types are int or double. Why does it still output my old data?

2 answers:

Answer 0: (score: 0)

Try starting the Jupyter notebook with a higher 'iopub_data_rate_limit':

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000

Source: https://github.com/jupyter/notebook/issues/2287
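
If you would rather not pass the flag on every launch, the same limit can probably be set once in the notebook server's config file. This is only a sketch, assuming the classic notebook server (the NotebookApp class named in the error message) and the default config location ~/.jupyter/jupyter_notebook_config.py, which `jupyter notebook --generate-config` creates if it does not exist. The value is simply the one from the command above, not a tuned recommendation.

```python
# ~/.jupyter/jupyter_notebook_config.py  (assumed default location)
# Jupyter injects get_config() when it loads this file; it is not a normal import.
c = get_config()

# Raise the IOPub data rate limit (bytes/sec) so that large cell output
# does not trigger "IOPub data rate exceeded".
# 10000000000 is the value from the command-line example above.
c.NotebookApp.iopub_data_rate_limit = 10000000000
```

Restart the notebook server after editing the file so the new limit takes effect.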

Answer 1: (score: 0)

The best way is to put this in your jupyterhub_config.py file:

c.Spawner.args = ['--NotebookApp.iopub_data_rate_limit=1000000000']