Question

I haven't seen this warning in PySpark before:

The conversion of DecimalType columns is inefficient and may take a long time. Column names: [PVPERUSER] If those columns are not necessary, you may consider dropping them or converting to primitive types before the conversion.

What is the best way to handle it? Is it an argument passed to toPandas(), or do I need to give the DataFrame's columns particular types first?

My code is a simple PySpark-to-pandas conversion:

df = data.toPandas()

Answer 1

Try casting the DecimalType column to a primitive type before calling toPandas():

df = data.select(data.PVPERUSER.cast('float'), data.another_column).toPandas()
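If there are several DecimalType columns, or you would rather not list every column by hand, a minimal sketch along these lines casts every top-level decimal column and passes the others through unchanged. The helper names `decimal_columns` and `to_pandas_without_decimals` are hypothetical. The sketch relies on `DataFrame.dtypes`, which reports decimal columns as type strings such as `decimal(38,18)`. Note that Spark's `'float'` is the 32-bit FloatType, so the sketch defaults to `'double'` (64-bit DoubleType), which keeps more of each decimal's precision. Values with more than about 15 to 17 significant digits will still be rounded.

```python
def decimal_columns(dtypes):
    """Return names of top-level columns whose Spark type string is decimal(p,s).

    `dtypes` is a list of (name, type_string) pairs, as returned by DataFrame.dtypes.
    """
    return [name for name, dtype in dtypes if dtype.startswith("decimal")]


def to_pandas_without_decimals(sdf, target_type="double"):
    """Cast top-level DecimalType columns to `target_type`, then call toPandas().

    Hypothetical helper: decimals nested inside structs, arrays or maps are not
    detected, and column names are assumed not to contain dots.
    """
    to_cast = set(decimal_columns(sdf.dtypes))
    projected = [
        # alias() keeps the original column name after the cast
        sdf[name].cast(target_type).alias(name) if name in to_cast else sdf[name]
        for name in sdf.columns
    ]
    return sdf.select(*projected).toPandas()
```

With the question's DataFrame, this would be called as `df = to_pandas_without_decimals(data)`. If exact decimal values matter more than conversion speed, keeping the DecimalType columns and accepting the warning is still an option.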
