PySpark: apply QuantileDiscretizer to all columns

Asked: 2019-03-06 11:03:10

Tags: pyspark apache-spark-mllib

Suppose I have 1000 columns. How can I apply QuantileDiscretizer to all of them?

Discretizing a single column looks like this:

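from pyspark.ml.feature import QuantileDiscretizer

# Fit a two-bucket discretizer on I1 and append the bucket index as result1
result_discretizer1 = QuantileDiscretizer(
    numBuckets=2, inputCol="I1", outputCol="result1").fit(df).transform(df)
result_discretizer1.show()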

+---+----+---+---+---+-------+
| id|  I1| I2| I3| I4|result1|
+---+----+---+---+---+-------+
|1.0|1.23|2.5|3.9|5.0|    1.0|
|2.0|1.23|2.5|3.9|6.0|    1.0|
|3.0|1.23|5.8|9.0|6.0|    1.0|
|4.0|1.23|2.5|3.9|6.0|    1.0|
+---+----+---+---+---+-------+

What should I do if I want to apply QuantileDiscretizer to every column and get all of the discretized columns as output?
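One possible approach, offered as a minimal sketch rather than a confirmed answer, is to build one QuantileDiscretizer per column and chain them in a Pipeline. The helper names `output_names` and `discretize_all`, the `result_` prefix, and the decision to skip the `id` column are assumptions made for illustration; the sketch also assumes every remaining column is numeric.

```python
def output_names(columns, skip=("id",), prefix="result_"):
    """Map each input column to the name of its discretized output column."""
    return {c: prefix + c for c in columns if c not in skip}


def discretize_all(df, num_buckets=2, skip=("id",)):
    """Fit one QuantileDiscretizer per column and apply them all in one Pipeline."""
    # Imported here so output_names() stays usable without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import QuantileDiscretizer

    names = output_names(df.columns, skip=skip)
    stages = [
        QuantileDiscretizer(numBuckets=num_buckets, inputCol=src, outputCol=dst)
        for src, dst in names.items()
    ]
    # The fitted PipelineModel appends every result_* column to the original frame.
    return Pipeline(stages=stages).fit(df).transform(df)
```

Calling `discretize_all(df).show()` on the example frame should add result_I1 through result_I4. With 1000 columns this fits 1000 stages one after another, so it may be slow. On PySpark 3.0 or later, QuantileDiscretizer also accepts `inputCols` and `outputCols`, which may be a better fit for handling many columns in a single stage.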

0 answers:

No answers yet