sklearn pipeline is not working

Time: 2017-07-26 11:15:04

Tags: python scikit-learn pipeline sentiment-analysis

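I am new to sklearn pipelines and am learning them from the sklearn documentation. I am using one for sentiment analysis of movie review data. The data contains two columns: the first is class and the second is text.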

import pandas as pd

input_file_df = pd.read_csv("movie-pang.csv")
x_train = input_file_df["text"]  # the complete data set is used as training data
y_train = input_file_df["class"]

I use only one feature, a sentiment score for each sentence. I wrote a custom transformer for it:

import numpy as np
from nltk.corpus import sentiwordnet as swn
from sklearn.base import BaseEstimator, TransformerMixin


class GetWorldLevelSentiment(BaseEstimator, TransformerMixin):

    def __init__(self):
        pass

    def get_word_level_sentiment(self, word_list):
        # Multiply together the signed scores of all words found in SentiWordNet.
        sentiment_score = 1
        for word in word_list:
            # list() so that len() and indexing also work on Python 3
            word_sentiment = list(swn.senti_synsets(word))

            if len(word_sentiment) > 0:
                word_sentiment = word_sentiment[0]  # use the first synset only
            else:
                continue  # word not in SentiWordNet

            if word_sentiment.pos_score() > word_sentiment.neg_score():
                word_sentiment_score = word_sentiment.pos_score()
            elif word_sentiment.pos_score() < word_sentiment.neg_score():
                word_sentiment_score = word_sentiment.neg_score() * (-1)
            else:
                word_sentiment_score = word_sentiment.pos_score()

            print(word, " ", word_sentiment_score)
            if word_sentiment_score != 0:
                sentiment_score = sentiment_score * word_sentiment_score

        return sentiment_score

    def transform(self, review_list, y=None):
        sentiment_score_list = list()
        for review in review_list:
            sentiment_score_list.append(self.get_word_level_sentiment(review.split()))

        # Note: this returns a 1-D array of shape (n_samples,)
        return np.asarray(sentiment_score_list)

    def fit(self, x, y=None):
        return self

The pipeline I use is:

from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("word_level_sentiment", GetWorldLevelSentiment()),
    ("clf", MultinomialNB())])

Then I call fit on the pipeline:

pipeline.fit(x_train, y_train)

But this gives me the following error:

This MultinomialNB instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Can someone point out what I am doing wrong here? It would be a great help.

1 answer:

Answer 0: (score: 0)

This worked for me:

import pandas
from nltk.corpus import sentiwordnet as swn
from sklearn.base import BaseEstimator, TransformerMixin


class GetWorldLevelSentiment(BaseEstimator, TransformerMixin):

    def __init__(self):
        pass

    def get_word_level_sentiment(self, word_list):
        sentiment_score = 1
        for word in word_list:
            # list() so that len() and indexing also work on Python 3
            word_sentiment = list(swn.senti_synsets(word))

            if len(word_sentiment) > 0:
                word_sentiment = word_sentiment[0]
            else:
                continue

            if word_sentiment.pos_score() > word_sentiment.neg_score():
                word_sentiment_score = word_sentiment.pos_score()
            elif word_sentiment.pos_score() < word_sentiment.neg_score():
                word_sentiment_score = word_sentiment.neg_score() * (-1)
            else:
                word_sentiment_score = word_sentiment.pos_score()

            print(word, " ", word_sentiment_score)
            if word_sentiment_score != 0:
                sentiment_score = sentiment_score * word_sentiment_score

        return sentiment_score

    def transform(self, review_list, y=None):
        sentiment_score_list = list()
        for review in review_list:
            sentiment_score_list.append(self.get_word_level_sentiment(review.split()))

        # A one-column DataFrame is 2-D: shape (n_samples, 1)
        return pandas.DataFrame(sentiment_score_list)

    def fit(self, x, y=None):
        return self
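
The only change from the question's code is the return value of transform: a one-column DataFrame instead of a flat NumPy array. I did not track down exactly why the original raised that particular error, but the visible difference is the shape. The pipeline passes the output of transform to the classifier's fit, and scikit-learn estimators expect a 2-D input of shape (n_samples, n_features). A minimal sketch of the same idea with plain NumPy follows; the scores are made-up values standing in for the output of get_word_level_sentiment, and to_feature_matrix is a hypothetical helper name, not part of any library:

```python
import numpy as np


def to_feature_matrix(scores):
    """Turn a flat list of per-review scores into an (n_samples, 1) matrix."""
    column = np.asarray(scores, dtype=float)
    # reshape(-1, 1): one row per review, a single feature column.
    return column.reshape(-1, 1)


# Hypothetical scores, standing in for the output of get_word_level_sentiment.
scores = [0.125, -0.25, 1.0]

flat = np.asarray(scores)            # what the question's transform returns
matrix = to_feature_matrix(scores)   # same layout as the one-column DataFrame

print(flat.shape)    # (3,)
print(matrix.shape)  # (3, 1)

# True when any score is negative.
has_negative = bool((matrix < 0).any())
print(has_negative)
```

Returning `np.asarray(sentiment_score_list).reshape(-1, 1)` from transform should therefore be an alternative to the DataFrame. One more thing to watch: the product of signed scores can be negative, and as far as I know recent scikit-learn releases raise a ValueError when MultinomialNB receives negative feature values, so the scores may need rescaling or a different classifier.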