Is a DataFrame created with the toPandas() method distributed across the Spark cluster?

Time: 2015-08-05 16:57:27

Tags: pandas apache-spark pyspark pyspark-sql

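I am reading a CSV file like this:

    data = sc.textFile("filename")
    # Arguments were not given in the question; `sqlContext` is assumed
    # to be what "Sparksql" referred to.
    Df = sqlContext.createDataFrame(...)
    Pdf = Df.toPandas()

Now, is Pdf distributed across the Spark cluster, or does it reside in the host environment?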

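1 Answer:

Answer 0: (score: 0)

No. Pdf is an ordinary pandas DataFrame that lives entirely in the driver's memory; it is not distributed across the cluster.

As the PySpark source code of DataFrame says: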

    .. note:: This method should only be used if the resulting Pandas's DataFrame is expected
        to be small, as all the data is loaded into the driver's memory.