PySpark ML: predicted value from RandomForestClassificationModel

Time: 2018-05-02 11:14:44

Tags: python apache-spark pyspark

I tried random_forest_classifier_example.py and it worked. As a next step, I tried to run a prediction on the 3rd line of sample_libsvm_data.txt, which is labeled "1". I added:

indexes = [
    124, 125, 126, 127, 151, 152, 153, 154, 155, 179,
    180, 181, 182, 183, 208, 209, 210, 211, 235, 236,
    237, 238, 239, 263, 264, 265, 266, 267, 268, 292,
    293, 294, 295, 296, 321, 322, 323, 324, 349, 350,
    351, 352, 377, 378, 379, 380, 405, 406, 407, 408,
    433, 434, 435, 436, 461, 462, 463, 464, 489, 490,
    491, 492, 493, 517, 518, 519, 520, 521, 545, 546,
    547, 548, 549, 574, 575, 576, 577, 578, 602, 603,
    604, 605, 606, 630, 631, 632, 633, 634, 658, 659,
    660, 661, 662
]
values = [
    145.0, 255.0, 211.0, 31.0, 32.0, 237.0, 253.0, 252.0, 71.0, 11.0,
    175.0, 253.0, 252.0, 71.0, 144.0, 253.0, 252.0, 71.0, 16.0, 191.0,
    253.0, 252.0, 71.0, 26.0, 221.0, 253.0, 252.0, 124.0, 31.0, 125.0,
    253.0, 252.0, 252.0, 108.0, 253.0, 252.0, 252.0, 108.0, 255.0, 253.0,
    253.0, 108.0, 253.0, 252.0, 252.0, 108.0, 253.0, 252.0, 252.0, 108.0,
    253.0, 252.0, 252.0, 108.0, 255.0, 253.0, 253.0, 170.0, 253.0, 252.0,
    252.0, 252.0, 42.0, 149.0, 252.0, 252.0, 252.0, 144.0, 109.0, 252.0,
    252.0, 252.0, 144.0, 218.0, 253.0, 253.0, 255.0, 35.0, 175.0, 252.0,
    252.0, 253.0, 35.0, 73.0, 252.0, 252.0, 253.0, 35.0, 31.0, 211.0,
    252.0, 253.0, 35.0
]
testDf = spark.createDataFrame([(Vectors.sparse(692, indexes, values),)], ["indexedFeatures"])
result = rfModel.transform(testDf).head()
print(result.prediction)

before the line

spark.stop()

in random_forest_classifier_example.py, and then ran the code. I expected to get "result.prediction = 1.0", but I got "result.prediction = 0.0". Am I misunderstanding something? Does anyone have an idea?
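For anyone reproducing this, the long index and value lists can be derived from the data file rather than typed by hand. The sketch below is only an illustration: `parse_libsvm_line` is a hypothetical helper (not part of PySpark), and the sample line is made up rather than copied from sample_libsvm_data.txt. It relies on the fact that LIBSVM files use 1-based feature indices, while `Vectors.sparse` expects 0-based ones.

```python
def parse_libsvm_line(line):
    """Split one LIBSVM-formatted line into (label, indices, values).

    Hypothetical helper for illustration only; it is not a PySpark API.
    """
    parts = line.split()
    label = float(parts[0])
    indices, values = [], []
    for item in parts[1:]:
        idx, val = item.split(":")
        # LIBSVM files are 1-based; Vectors.sparse expects 0-based indices.
        indices.append(int(idx) - 1)
        values.append(float(val))
    return label, indices, values


# Made-up short line, used only to show the index shift.
label, indices, values = parse_libsvm_line("1 125:145 126:255 127:211")
print(label, indices, values)
```

The resulting `indices` and `values` could then be passed to `Vectors.sparse(692, indices, values)` in place of the literal lists.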

I am using pyspark 2.3, and I also added this import:

from pyspark.ml.linalg import Vectors

I used the "class RandomForestClassifier" section of classification.py as a reference. Thanks.
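One thing that may be worth checking, assuming the script is the example shipped with Spark: there the label column goes through a StringIndexer before training, and rfModel is only the forest stage taken out of the fitted pipeline. If so, `result.prediction` is a label index rather than the original label. With StringIndexer's default ordering the most frequent label gets index 0, so 0.0 could stand for the original label "1". The pure-Python sketch below only mimics that ordering; `frequency_label_order` is a hypothetical helper and the label list is made up, not the real label counts of the data file.

```python
from collections import Counter


def frequency_label_order(labels):
    """Mimic StringIndexer's default ordering: most frequent label first.

    Hypothetical helper for illustration; ties are broken alphabetically
    here, which is an assumption of this sketch.
    """
    counts = Counter(str(float(label)) for label in labels)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked]


# Made-up training labels in which "1" happens to be the majority class.
training_labels = [1, 1, 1, 0, 0]
index_to_label = frequency_label_order(training_labels)

prediction = 0.0  # the kind of value a forest trained on indexed labels returns
original_label = index_to_label[int(prediction)]
print(index_to_label, original_label)
```

In the example script, the IndexToString stage at the end of the pipeline performs this conversion and writes it to a `predictedLabel` column. Note also that the test row above is passed in as `indexedFeatures` directly, so it skips the VectorIndexer stage; whether that changes the result here is not something this sketch verifies.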

0 answers:

No answers yet