Question

I am reading a CSV file with:

data = sc.textFile("filename")

Df = Sparksql.createDataFrame()

Pdf = Df.toPandas()

Is Pdf now distributed across the Spark cluster, or does it reside on the host (driver) machine?

Answer 1

No. The pandas DataFrame is not distributed; it resides in the driver's memory.

As stated in the PySpark source code of DataFrame:

    .. note:: This method should only be used if the resulting Pandas's DataFrame is expected
        to be small, as all the data is loaded into the driver's memory.
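To make the note concrete, here is a minimal sketch of bounding the data before collecting it. The helper name `to_driver_pandas`, the `max_rows` default, the sample rows, and the local-mode session in `demo()` are all illustrative assumptions, not part of the original question.

```python
def to_driver_pandas(spark_df, max_rows=1000):
    """Collect at most max_rows rows of a Spark DataFrame into pandas.

    limit() is still evaluated by Spark; toPandas() then ships the
    remaining rows to the driver and builds one local pandas object.
    (Hypothetical helper, for illustration only.)
    """
    return spark_df.limit(max_rows).toPandas()


def demo():
    # Imported lazily so the helper above can be defined without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")        # assumption: local mode, for illustration
        .appName("topandas-demo")  # hypothetical application name
        .getOrCreate()
    )
    try:
        # Small inline data set standing in for the CSV in the question.
        sdf = spark.createDataFrame(
            [(1, "a"), (2, "b"), (3, "c")], ["id", "label"]
        )
        pdf = to_driver_pandas(sdf, max_rows=2)
        # pdf is an ordinary pandas DataFrame held by the driver process;
        # it has no partitions and is not tracked by the executors.
        print(type(pdf))
        print(len(pdf))
    finally:
        spark.stop()
```

Calling `demo()` requires PySpark and pandas to be installed on the driver. The point of the `limit()` call is that the row count is reduced while the data is still a Spark DataFrame, so only a bounded amount is ever loaded into the driver's memory.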
