How to optimize PySpark toPandas() with type hints

Time: 2020-10-15 21:47:34

Tags: pyspark

I have not seen this warning in PySpark before:

The conversion of DecimalType columns is inefficient and may take a long time. Column names: [PVPERUSER] If those columns are not necessary, you may consider dropping them or converting to primitive types before the conversion.

What is the best way to handle it? Is there a parameter I can pass to toPandas(), or do I need to type the DataFrame in a specific way?

My code is a simple PySpark-to-pandas conversion:

df = data.toPandas()

1 answer:

Answer 0: (score: 1)

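Try this. It casts the DecimalType column to a primitive type before the conversion, as the warning suggests: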

df = data.select(data.PVPERUSER.cast('float'), data.another_column).toPandas()
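
If more than one column is DecimalType, the cast expressions can be generated from the schema instead of being written by hand. The following is only a minimal sketch: the helper name decimal_to_double_exprs and the example schema are hypothetical, and it assumes the input has the shape of DataFrame.dtypes, a list of (column name, type string) pairs in which decimal columns appear as strings such as "decimal(10,2)". It casts to double rather than float because Spark's float is 32-bit; either way the exact decimal value may be rounded.

```python
def decimal_to_double_exprs(dtypes):
    """Build selectExpr() strings that cast every decimal column to double.

    `dtypes` is a list of (column_name, type_string) pairs, the shape
    returned by DataFrame.dtypes, e.g. [("PVPERUSER", "decimal(10,2)")].
    """
    exprs = []
    for name, type_string in dtypes:
        # Backtick-quote the column name, doubling any embedded backticks.
        quoted = "`" + name.replace("`", "``") + "`"
        if type_string.startswith("decimal"):
            # Cast decimal columns to a primitive type and keep the same name.
            exprs.append(f"CAST({quoted} AS DOUBLE) AS {quoted}")
        else:
            # Leave every other column unchanged.
            exprs.append(quoted)
    return exprs


# Hypothetical schema standing in for data.dtypes
example_dtypes = [("PVPERUSER", "decimal(10,2)"), ("another_column", "string")]
example_exprs = decimal_to_double_exprs(example_dtypes)
```

With a real DataFrame, the result would be passed on like this (assuming data is the DataFrame from the question):

df = data.selectExpr(*decimal_to_double_exprs(data.dtypes)).toPandas()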