I have not seen this warning in PySpark before:
The conversion of DecimalType columns is inefficient and may take a long time. Column names: [PVPERUSER] If those columns are not necessary, you may consider dropping them or converting to primitive types before the conversion.
What is the best way to handle it? Is there a parameter I can pass to toPandas(), or do I need to cast the DataFrame columns in a specific way?

My code is a simple PySpark-to-pandas conversion:
df = data.toPandas()
Answer (score: 1):

Try this:
df = data.select(data.PVPERUSER.cast('float'), data.another_column).toPandas()