我正在使用sklearn中的Pipeline
对文字进行分类。
在此示例Pipeline
中,我有一个TfidfVectorizer
和一些自定义功能,其中包含FeatureUnion
和一个分类器作为Pipeline
步骤,然后我会根据培训数据和做预测:
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']
# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C = 0.10000000000000001)
pipeline = Pipeline([
('features', FeatureUnion([
('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features= 4000)),
('custom_features', CustomFeatures())])),
('clf', LinearSVC1),
])
pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)
# etc.
在这里,我需要挑选TfidfVectorizer
步骤并保持custom_features
未打开,因为我仍然使用它们进行实验。我们的想法是通过挑选tfidf步骤来加快管道。
我知道我可以用Pipeline
来腌制整个joblib.dump
,但我如何挑选个别步骤呢?
答案 0 :(得分:2)
要挑选TfidfVectorizer,您可以使用:
import {Component, Directive, HostListener, EventEmitter, ElementRef, OnInit} from 'angular2/core';
import {map, merge} from 'rxjs/Rx';
@Directive({
selector: '[draggable]'
})
export class Draggable implements OnInit {
mouseup = new EventEmitter();
mousedown = new EventEmitter();
mousemove = new EventEmitter();
mouseout = new EventEmitter();
@HostListener('mouseup', ['$event'])
onMouseup(event) {
this.mouseup.emit(event);
}
@HostListener('mousedown', ['$event'])
onMousedown(event) {
this.mousedown.emit(event);
return false; // Call preventDefault() on the event
}
@HostListener('mousemove', ['$event'])
onMousemove(event) {
this.mousemove.emit(event);
}
@HostListener('mouseout', ['$event'])
onMouseout(event) {
this.mouseout.emit(event);
return false; // Call preventDefault() on the event
}
constructor(public element: ElementRef) {
this.element.nativeElement.style.position = 'relative';
this.element.nativeElement.style.cursor = 'pointer';
map;
merge;
this.mousedrag = this.mousedown.map(event => {
return {
top: event.clientY - this.element.nativeElement.getBoundingClientRect().top
left: event.clientX - this.element.nativeElement.getBoundingClientRect().left,
};
})
.flatMap(
imageOffset => this.mousemove.merge(this.mouseout).map(pos => ({
top: pos.clientY - imageOffset.top,
left: pos.clientX - imageOffset.left
}))
.takeUntil(this.mouseup)
);
}
ngOnInit() {
this.mousedrag.subscribe({
next: pos => {
this.element.nativeElement.style.top = pos.top + 'px';
this.element.nativeElement.style.left = pos.left + 'px';
}
});
}
}
@Component({
selector: 'my-app',
template: `
<div draggable>
<h1>Hello, World!</h1>
</div>
`,
directives: [Draggable,],
})
export class AppComponent {
}
或:
joblib.dump(pipeline.steps[0][1].transformer_list[0][1], dump_path)
要加载转储的对象,您可以使用:
joblib.dump(pipeline.get_params()['features__tfidf'], dump_path)
很遗憾,您无法使用pipeline.steps[0][1].transformer_list[0][1] = joblib.load(dump_path)
set_params
的反转来按名称插入估算工具。如果PR#1769: enable setting pipeline components as parameters中的更改已合并,您将可以使用