IndexToString transformation with labels from StringIndexer

Date: 2018-01-04 10:32:49

Tags: python apache-spark machine-learning pyspark

How can I do a transformation with IndexToString using the labels obtained from labelIndexer?

labelIndexer = StringIndexer(inputCol="shutdown_reason", outputCol="label")

idx_to_string = IndexToString(inputCol="prediction", outputCol="predictedValue")

1 answer:

Answer 0 (score: 2)

  

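How can I do a transformation with IndexToString using the labels from labelIndexer?

You can't. labelIndexer is a StringIndexer, an estimator that has not seen any data yet. To get the labels you need the fitted model, a StringIndexerModel, which fit returns: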

# Import only the two classes used here instead of a wildcard import.
from pyspark.ml.feature import IndexToString, StringIndexer

df = spark.createDataFrame([
    ("foo", ), ("bar", )
]).toDF("shutdown_reason")

labelIndexerModel = labelIndexer.fit(df)

With the fitted model you can set the labels:

idx_to_string.setLabels(labelIndexerModel.labels)
idx_to_string.getLabels()
# ['foo', 'bar']  (the order of equally frequent labels may differ between Spark versions)

and transform:

df_with_prediction = labelIndexerModel.transform(df).withColumnRenamed(
    "label", "prediction"
)

idx_to_string.transform(df_with_prediction).show()
# +---------------+----------+--------------+
# |shutdown_reason|prediction|predictedValue|
# +---------------+----------+--------------+
# |            foo|       0.0|           foo|
# |            bar|       1.0|           bar|
# +---------------+----------+--------------+
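
The difference between the unfitted indexer and the fitted model can be shown without Spark. The following is a minimal plain-Python sketch of the idea, not Spark's implementation: the helper names fit_labels, string_to_index and index_to_string are hypothetical, and breaking frequency ties alphabetically is an assumption made only to keep the result deterministic. The point is that the label list exists only after the data has been scanned, and the reverse mapping needs that same list.

```python
from collections import Counter


def fit_labels(values):
    """Scan the data and return the distinct labels, most frequent first.

    Ties are broken alphabetically here (an assumption of this sketch).
    """
    counts = Counter(values)
    return sorted(counts, key=lambda v: (-counts[v], v))


def string_to_index(values, labels):
    """Map each string to the position of its label, as a float index."""
    lookup = {label: float(i) for i, label in enumerate(labels)}
    return [lookup[v] for v in values]


def index_to_string(indices, labels):
    """Map indices back to strings; impossible without the label list."""
    return [labels[int(i)] for i in indices]


reasons = ["foo", "bar", "foo"]

labels = fit_labels(reasons)                 # the "fit" step
indices = string_to_index(reasons, labels)   # the forward transform
restored = index_to_string(indices, labels)  # the reverse transform

print(labels)    # ['foo', 'bar']
print(indices)   # [0.0, 1.0, 0.0]
print(restored)  # ['foo', 'bar', 'foo']
```

In the answer above, labelIndexer.fit(df) plays the role of fit_labels, and idx_to_string.setLabels(...) hands the resulting list to the reverse mapping.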