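I ran into something that contradicts my understanding. As I understand it, `this` can never be null for a live object, yet I seem to have hit exactly that in the case shown below.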
Context: I am using the XGBoost4J-Spark package. You can view the source code here. More specifically, I am referring to the XGBoostEstimator class. When I instantiate the class through Spark-Shell (or through a test), the println reports `this` as null. Below is the class definition I am using; the only change from the original is the extra println statement:
package ml.dmlc.xgboost4j.scala.spark

import ml.dmlc.xgboost4j.scala.{EvalTrait, ObjectiveTrait}
import org.apache.spark.ml.{Predictor, Estimator}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.mllib.linalg.{VectorUDT, Vector}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{NumericType, DoubleType, StructType}
import org.apache.spark.sql.{DataFrame, TypedColumn, Dataset, Row}

/**
 * the estimator wrapping XGBoost to produce a training model
 *
 * @param inputCol the name of input column
 * @param labelCol the name of label column
 * @param xgboostParams the parameters configuring XGBoost
 * @param round the number of iterations to train
 * @param nWorkers the total number of workers of xgboost
 * @param obj the customized objective function, default to be null and using the default in model
 * @param eval the customized eval function, default to be null and using the default in model
 * @param useExternalMemory whether to use external memory when training
 * @param missing the value taken as missing
 */
class XGBoostEstimator(
    inputCol: String, labelCol: String,
    xgboostParams: Map[String, Any], round: Int, nWorkers: Int,
    obj: Option[ObjectiveTrait] = None,
    eval: Option[EvalTrait] = None, useExternalMemory: Boolean = false, missing: Float = Float.NaN)
  extends Estimator[XGBoostModel] {

  println(s"This is ${this}")

  override val uid: String = Identifiable.randomUID("XGBoostEstimator")

  /**
   * produce a XGBoostModel by fitting the given dataset
   */
  def fit(trainingSet: Dataset[_]): XGBoostModel = {
    val instances = trainingSet.select(
      col(inputCol), col(labelCol).cast(DoubleType)).rdd.map {
      case Row(feature: Vector, label: Double) =>
        LabeledPoint(label, feature)
    }
    transformSchema(trainingSet.schema, logging = true)
    val trainedModel = XGBoost.trainWithRDD(instances, xgboostParams, round, nWorkers, obj.get,
      eval.get, useExternalMemory, missing).setParent(this)
    copyValues(trainedModel)
  }

  override def copy(extra: ParamMap): Estimator[XGBoostModel] = {
    defaultCopy(extra)
  }

  override def transformSchema(schema: StructType): StructType = {
    // check input type, for now we only support vectorUDT as the input feature type
    val inputType = schema(inputCol).dataType
    require(inputType.equals(new VectorUDT), s"the type of input column $inputCol has to VectorUDT")
    // check label Type,
    val labelType = schema(labelCol).dataType
    require(labelType.isInstanceOf[NumericType], s"the type of label column $labelCol has to" +
      s" be NumericType")
    schema
  }
}
Any explanation of why and when this behavior is possible would be helpful.
Answer 0 (score: 6)
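Your toString() implementation comes from Identifiable, which simply returns uid. Scala runs the statements in a class body in declaration order during construction, so the println executes before uid is assigned on the following line. At that point uid is still null, toString returns null, and the interpolated string shows null. The object `this` itself is not null; only its string representation is.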
Source of Identifiable:
trait Identifiable {

  /**
   * An immutable unique ID for the object and its derivatives.
   */
  val uid: String

  override def toString: String = uid
}
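The initialization-order effect can be reproduced without Spark. The following is a minimal sketch, not the library's code: `Named` is a hypothetical stand-in for Identifiable, and the class names, `seenDuringInit`, and the uid strings are all made up for illustration. It shows that rendering `this` before uid is assigned yields "null", and that declaring uid first is one way to avoid it.

```scala
// Hypothetical stand-in for Spark's Identifiable (assumption: same shape,
// an abstract val plus a toString that returns it).
trait Named {
  val uid: String
  override def toString: String = uid
}

// Mirrors the question: `this` is rendered BEFORE uid is assigned,
// because class-body statements run top to bottom during construction.
class EagerEstimator extends Named {
  val seenDuringInit: String = s"This is ${this}" // uid is still null here
  override val uid: String = "estimator_1"
}

// One possible fix: assign uid first, then use `this`.
class OrderedEstimator extends Named {
  override val uid: String = "estimator_2"
  val seenDuringInit: String = s"This is ${this}"
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    val eager = new EagerEstimator
    println(eager.seenDuringInit)   // prints "This is null"
    println(eager)                  // prints "estimator_1" (uid is set after construction)
    println((new OrderedEstimator).seenDuringInit) // prints "This is estimator_2"
  }
}
```

In the question's class, moving the println below the uid definition should have the same effect as `OrderedEstimator` here.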