How to read the probability vector from Spark DataFrame LogisticRegression output

Date: 2016-06-07 00:41:46

Tags: scala apache-spark apache-spark-mllib spark-dataframe

I am trying to read the first probability from the logistic regression output so that I can perform decile binning on it.

Below is some test code that simulates the output using just a vector.

    val r = sqlContext.createDataFrame(Seq(("jane", Vectors.dense(.98)),
        ("tom", Vectors.dense(.34)),
        ("nancy", Vectors.dense(.93)),
        ("tim", Vectors.dense(.02)),
        ("larry", Vectors.dense(.033)),
        ("lana", Vectors.dense(.85)),
        ("jack", Vectors.dense(.84)),
        ("john", Vectors.dense(.09)),
        ("jill", Vectors.dense(.12)),
        ("mike", Vectors.dense(.21)),
        ("jason", Vectors.dense(.31)),
        ("roger", Vectors.dense(.76)),
        ("ed", Vectors.dense(.77)),
        ("alan", Vectors.dense(.64)),
        ("ryan", Vectors.dense(.52)),
        ("ted", Vectors.dense(.66)),
        ("paul", Vectors.dense(.67)),
        ("brian", Vectors.dense(.68)),
        ("jeff", Vectors.dense(.05)))).toDF(CSMasterCustomerID, MLProbability)
    var result = r.select(CSMasterCustomerID, MLProbability)
    val schema = StructType(Seq(StructField(CSMasterCustomerID, StringType, false), StructField(MLProbability, DoubleType, true)))
    result = sqlContext.createDataFrame(result.map((r: Row) => {
        r match {
            case Row(mcid: String, probability: Vector) =>
                RowFactory.create(mcid, probability(0))
        }
    }), schema)

This fails to compile with:

    <console>:56: error: type mismatch;
    found   : Double
    required: Object
    Note: an implicit exists from scala.Double => java.lang.Double, but
    methods inherited from Object are rendered ambiguous.  This is to avoid
    a blanket implicit which would convert any scala.Double to any AnyRef.
    You may wish to use a type ascription: `x: java.lang.Double`.
                           RowFactory.create(mcid, probability(0))

Any suggestions to fix this, or another approach?
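The error itself points at the cause: `RowFactory.create` is a Java varargs method (`Object... values`), and Scala will not silently box a `scala.Double` into a `java.lang.Double` when an `Object` is required. Two hedged options (sketches, not verified against this exact Spark version): add the type ascription the compiler suggests, `RowFactory.create(mcid, probability(0): java.lang.Double)`, or use Scala's own `Row(mcid, probability(0))`, whose `apply` takes `Any` and boxes automatically. The stand-alone sketch below uses a hypothetical `create(values: AnyRef*)` method as a stand-in for `RowFactory.create`'s signature to show the boxing behavior without a Spark dependency:

```scala
// A minimal sketch of the boxing issue behind the compile error.
// BoxingDemo.create mimics RowFactory.create's Java varargs signature
// (Object... values); it is a hypothetical stand-in, not Spark's API.
object BoxingDemo {
  def create(values: AnyRef*): Seq[AnyRef] = values

  def main(args: Array[String]): Unit = {
    val p: Double = 0.98
    // create("jane", p)  // does not compile: scala.Double is not an AnyRef
    val row = create("jane", p: java.lang.Double) // ascription forces boxing
    println(row)
  }
}
```

The ascription keeps the Java-interop `RowFactory` call site; switching to `Row(...)` avoids the issue entirely because its parameter type is `Any` rather than `AnyRef`.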

0 Answers:

There are no answers yet.