I am a new Spark contributor. I want to add class-weight support to the random forest classifier, as described here: https://issues.apache.org/jira/browse/SPARK-9478
I have finished the implementation, following the contribution guide here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PreparingtoContributeCodeChanges
The guide says to "run all tests with ./dev/run-tests to verify that the code still compiles, passes tests, and passes style checks". When I run the tests, my code fails the binary compatibility check.
The log says:
[error] * method this(scala.Enumeration#Value,org.apache.spark.mllib.tree.impurity.Impurity,Int,Int,Int,scala.Enumeration#Value,scala.collection.immutable.Map,Int,Double,Int,Double,Boolean,Int)Unit
in class org.apache.spark.mllib.tree.configuration.Strategy does not have a correspondent in current version
[error] filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.configuration.Strategy.this")
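If dropping the old constructor is intentional, the usual way to make ./dev/run-tests pass this check is to register exactly the filter the log suggests in Spark's MiMa exclusion list. A minimal sketch, assuming the project/MimaExcludes.scala file Spark uses for this (its exact layout differs between Spark versions, so adapt it to the existing entries):

// project/MimaExcludes.scala -- sketch only, not the file's real structure
import com.typesafe.tools.mima.core._

object MimaExcludes {
  // Tell MiMa that removing the old 13-argument Strategy constructor is intentional,
  // using the filter string suggested by the ./dev/run-tests output above.
  lazy val excludes = Seq(
    ProblemFilters.exclude[DirectMissingMethodProblem](
      "org.apache.spark.mllib.tree.configuration.Strategy.this")
  )
}

For a public MLlib class like Strategy, though, reviewers generally prefer keeping the old signature callable instead of excluding the warning, which is what the auxiliary-constructor workaround in the update further down achieves.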
I modified the file org.apache.spark.mllib.tree.configuration.Strategy because I needed to change this class's interface. What I did was add a new input parameter, as shown below:
class Strategy @Since("1.3.0") (
@Since("1.0.0") @BeanProperty var algo: Algo,
@Since("1.0.0") @BeanProperty var impurity: Impurity,
@Since("1.0.0") @BeanProperty var maxDepth: Int,
@Since("1.2.0") @BeanProperty var numClasses: Int = 2,
@Since("1.0.0") @BeanProperty var maxBins: Int = 32,
@Since("1.0.0") @BeanProperty var quantileCalculationStrategy: QuantileStrategy = Sort,
@Since("1.0.0") @BeanProperty var categoricalFeaturesInfo: Map[Int, Int] = Map[Int, Int](),
@Since("1.2.0") @BeanProperty var minInstancesPerNode: Int = 1,
@Since("1.2.0") @BeanProperty var minInfoGain: Double = 0.0,
@Since("1.0.0") @BeanProperty var maxMemoryInMB: Int = 256,
@Since("1.2.0") @BeanProperty var subsamplingRate: Double = 1,
@Since("1.2.0") @BeanProperty var useNodeIdCache: Boolean = false,
- @Since("1.2.0") @BeanProperty var checkpointInterval: Int = 10) extends Serializable {
+ @Since("1.2.0") @BeanProperty var checkpointInterval: Int = 10,
+ @Since("2.0.0") @BeanProperty var classWeights: Array[Double] = Array(1, 1))
How can I fix this, or what direction should I take to debug it?
---------------------------------- UPDATE ----------------------------------
I am not one of the authors of the pull requests attached to that JIRA issue. I have a new implementation that needs less memory to achieve the same goal. My code can be found here: https://github.com/n-triple-a/spark; the branch 'weightedRandomForest' has the problem described above.
I can now work around the problem by adding a constructor to the Strategy class that takes the first 13 parameters (i.e., the parameter list without classWeights), as shown below:
def this(algo: Algo,
impurity: Impurity,
maxDepth: Int,
numClasses: Int,
maxBins: Int,
quantileCalculationStrategy: QuantileStrategy,
categoricalFeaturesInfo: Map[Int, Int],
minInstancesPerNode: Int,
minInfoGain: Double,
maxMemoryInMB: Int,
subsamplingRate: Double,
useNodeIdCache: Boolean,
checkpointInterval: Int) {
this(algo, impurity, maxDepth, numClasses, maxBins,
quantileCalculationStrategy, categoricalFeaturesInfo, minInstancesPerNode,
minInfoGain, maxMemoryInMB, subsamplingRate, useNodeIdCache,
checkpointInterval, Array(1.0, 1.0))
}
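With that auxiliary constructor, the original 13-argument <init> is back in the bytecode, so already-compiled code and Java callers (which cannot use Scala default arguments) keep linking, while new code can opt into class weights. A rough usage sketch against my modified branch (argument values are placeholders, and classWeights only exists on that branch):

import org.apache.spark.mllib.tree.configuration.{Algo, QuantileStrategy, Strategy}
import org.apache.spark.mllib.tree.impurity.Gini

object StrategyCompatDemo {
  def main(args: Array[String]): Unit = {
    // Old 13-argument form: resolves to the auxiliary constructor, matching what
    // previously compiled callers expect.
    val legacy = new Strategy(Algo.Classification, Gini, 5, 2, 32,
      QuantileStrategy.Sort, Map[Int, Int](), 1, 0.0, 256, 1.0, false, 10)

    // New form: relies on the defaults and sets only the added classWeights parameter.
    val weighted = new Strategy(Algo.Classification, Gini, maxDepth = 5,
      numClasses = 2, classWeights = Array(0.3, 0.7))

    println(s"${legacy.maxBins} ${weighted.classWeights.toSeq}")
  }
}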
I also raised the scalastyle limit on the maximum number of parameters allowed for a method, which is 10 by default. But this seems odd to me, because classWeights already has a default value bound to it. Why do I have to add a redundant constructor?
Answer 0 (score: 0)
The PR linked from your JIRA, https://github.com/apache/spark/pull/9008/files, does not seem to contain the Strategy class. Please update the PR to include that file, and then let us know if you still have problems.