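I am building an ML pipeline that extracts features from a DataFrame, and I would like it to behave as follows:
The thing is, transformations are lazy, so I end up with the following:
My transform method looks a bit like this: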
override def transform(dataset: DataFrame): DataFrame = {
  require(featuresToExtract.nonEmpty, "You must provide at least one feature to extract to use this FeatureExtractorTransformer")
  // Start from the first extracted feature, then outer-join the remaining ones onto it.
  var joinedDataFrame = extract(dataset, featuresToExtract.head)
  for (featureToExtract <- featuresToExtract.tail) {
    // LOGGING HERE THAT I WANT CALLED JUST BEFORE THE CORRESPONDING ACTION
    // (as written, it would run while the plan is being built, not when the join executes)
    joinedDataFrame = joinedDataFrame.join(extract(dataset, featureToExtract), joinOn, "outer")
  }
  joinedDataFrame
}
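To show what I mean by lazy, here is a minimal sketch that does not use Spark at all. It uses a plain Scala collection view as a stand-in for a DataFrame, and `LazyLoggingSketch`, `run`, and the event strings are hypothetical names made up for this illustration. The log call placed next to the lazy step runs as soon as the step is declared, while the computation itself only runs once the result is forced.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustration only: a Scala view stands in for a lazily evaluated DataFrame.
object LazyLoggingSketch {
  // Returns the recorded events in order, plus the computed result.
  def run(): (List[String], List[Int]) = {
    val events = ArrayBuffer.empty[String]

    // Recorded immediately, like the logging inside my transform method.
    events += "log: building step"
    val pipeline = Seq(1, 2, 3).view.map { x =>
      // Recorded only when the view is forced.
      events += s"compute: $x"
      x * 2
    }

    events += "log: about to run the action"
    // Forcing the view plays the role of a Spark action.
    val result = pipeline.toList

    (events.toList, result)
  }

  def main(args: Array[String]): Unit = {
    val (events, result) = run()
    events.foreach(println)
    println(result)
  }
}
```

Both log entries are recorded before any of the compute entries, which is the ordering problem I am seeing with my transformer.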
So, any ideas on how to proceed?
Thanks