Question

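When trying to create a sklearn2pmml pipeline, I apply a custom mapping and then use PMMLLabelBinarizer to create dummy variables. The thing is, I want to avoid the dummy variable trap. Is there a way to do this with PMMLPipeline without resorting to any custom FunctionTransformer functions (I ultimately want to convert the pipeline to a PMML file)?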

I could not find a way to drop the last column using out-of-the-box, PMML-compatible functionality. (DataFrameMapper is the sklearn_pandas class.)

Answer 1

You can use sklearn.compose.ColumnTransformer to limit the number of columns; the idea is to specify ColumnTransformer.remainder = "drop".

For example, if your pipeline starts with a DataFrameMapper that produces a five-column matrix, but you only want to keep the first four columns:

pipeline = PMMLPipeline([
  ("mapper", DataFrameMapper([...])),
  ("slicer", ColumnTransformer([
    ("keep", "passthrough", [0, 1, 2, 3])
  ], remainder = "drop")),
  ("estimator", ...)
])

ColumnTransformer is supported as of the latest SkLearn2PMML version, 0.42.0, so you may need to upgrade to it first.
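To see what such a slicing step does on its own, here is a minimal sketch that uses only scikit-learn and NumPy. The five-column one-hot matrix is a made-up stand-in for whatever the upstream mapper would produce, and the variable names are illustrative, not part of the original answer. It only demonstrates the column-dropping behaviour, not the PMML conversion itself.

```python
import numpy as np
from sklearn.compose import ColumnTransformer

# Hypothetical stand-in for the mapper output: a one-hot (dummy) matrix
# with 5 columns, one per category, for 6 sample rows.
dummies = np.eye(5)[[0, 3, 4, 1, 2, 4]]

# Pass columns 0..3 through unchanged; remainder="drop" discards
# every column not listed (here, column 4).
slicer = ColumnTransformer(
    [("keep", "passthrough", [0, 1, 2, 3])],
    remainder="drop",
)

reduced = slicer.fit_transform(dummies)
print(reduced.shape)  # (6, 4)
```

Rows that belonged to the dropped category end up as all zeros, so that category becomes the implicit reference level, which is the usual way of avoiding the dummy variable trap.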

How to avoid the dummy variable trap in sklearn2pmml conversion
