How to pickle individual steps in sklearn's Pipeline?

Asked: 2016-03-28 10:00:35

Tags: python machine-learning scikit-learn classification pipeline

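I am using Pipeline from sklearn to classify text.

In this example Pipeline, I have a TfidfVectorizer and some custom features wrapped in a FeatureUnion, plus a classifier, as the Pipeline steps. I then fit the training data and do the prediction: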

from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']

# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C=0.1)

# CustomFeatures is my own transformer (definition not shown here)
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ('custom_features', CustomFeatures()),
    ])),
    ('clf', LinearSVC1),
])

pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)

# etc.
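
The snippet above does not define CustomFeatures. To run it, a hypothetical stand-in such as the one below could be used; the class body and the two features it computes (character count and word count) are assumptions for illustration only, not the asker's actual features.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class CustomFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in: two hand-made features per document."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step works inside a FeatureUnion.
        return self

    def transform(self, X):
        # One row per document: [number of characters, number of words].
        return np.array([[len(doc), len(doc.split())] for doc in X], dtype=float)


features = CustomFeatures().fit_transform(['I am a sentence', 'an example'])
```

Because the transformer returns a 2-D array with one row per document, FeatureUnion can stack it next to the tfidf matrix.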

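Here, I need to pickle the TfidfVectorizer step and leave the custom_features unpickled, since I am still experimenting with them. The idea is to make the pipeline faster by pickling the tfidf step.

I know I can pickle the whole Pipeline with joblib.dump, but how do I pickle individual steps?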

1 Answer:

Answer 0 (score: 2)

To pickle the TfidfVectorizer, you could use:

import joblib
joblib.dump(pipeline.steps[0][1].transformer_list[0][1], dump_path)

or:

joblib.dump(pipeline.get_params()['features__tfidf'], dump_path)

To load the dumped object, you can use:

# each transformer_list entry is a (name, transformer) tuple, so replace the whole entry
pipeline.steps[0][1].transformer_list[0] = ('tfidf', joblib.load(dump_path))

Unfortunately, you can't use set_params, the inverse of get_params, to insert the estimator by name. You will be able to if the changes in PR#1769: enable setting pipeline components as parameters are merged.
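
A minimal end-to-end sketch of the dump-and-restore round trip for the tfidf step might look like this. It assumes a temporary file as dump_path, leaves custom_features out so that it runs on its own, and uses a hypothetical build_pipeline helper. The final set_params call assumes a scikit-learn release newer than this answer, in which replacing a nested step by name is supported; check your installed version before relying on it.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']


def build_pipeline():
    # Hypothetical helper; custom_features is omitted to keep the sketch self-contained.
    return Pipeline([
        ('features', FeatureUnion([
            ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ])),
        ('clf', LinearSVC(tol=1e-4, C=0.1)),
    ])


# Fit once, then dump only the fitted tfidf step.
pipeline = build_pipeline()
pipeline.fit(X, Y)
dump_path = os.path.join(tempfile.mkdtemp(), 'tfidf.joblib')  # assumed location
joblib.dump(pipeline.get_params()['features__tfidf'], dump_path)

# Later: build a fresh pipeline and put the fitted vectorizer back.
new_pipeline = build_pipeline()
union = new_pipeline.steps[0][1]
# Entries are (name, transformer) tuples, so the whole entry is replaced.
union.transformer_list[0] = ('tfidf', joblib.load(dump_path))

# Assumption: newer scikit-learn releases also accept replacement by name.
new_pipeline.set_params(features__tfidf=joblib.load(dump_path))

restored = new_pipeline.get_params()['features__tfidf']
original = pipeline.get_params()['features__tfidf']
same_vocab = restored.vocabulary_ == original.vocabulary_
dev_features = restored.transform(X_dev)  # works without refitting
```

Note that calling fit on the pipeline refits every step, including the restored vectorizer, so the loaded object only saves time when it is used for transform or predict.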