Sentiment analysis - Flair pretrained model classifier. How to speed it up?

Time: 2019-09-07 07:28:20

Tags: python machine-learning sentiment-analysis

I want to classify sentiment using the pretrained English model from the Flair library. I have about 90k tweets and I want to classify all of them.

The problem is that Flair takes about 7 hours to do this. By comparison, the NLP sentiment classifier or TextBlob does it in about 1 minute...

My code for this is:

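from flair.data import Sentence


def flair_sentiment(data, classifier):
    """
    data : text sequence (pandas.Series)
    classifier : pretrained flair classifier
    """
    values = []
    for item in data:
        tokenized = Sentence(item)
        # One predict() call per tweet, so nothing is batched across tweets
        classifier.predict(tokenized)
        # score is the confidence of the predicted label (labels[0].value)
        values.append(tokenized.labels[0].score)
    return values


df['sentiment'] = flair_sentiment(df.tweets, classifier)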
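One caveat with the code above: in Flair, `labels[0].score` is the model's confidence in the predicted label, while the label itself is in `labels[0].value`. A column of bare scores therefore cannot distinguish a confidently positive tweet from a confidently negative one. The following is a minimal sketch of folding both into one signed number. It assumes the model emits the label strings `"POSITIVE"` and `"NEGATIVE"` (check what your model actually returns), and `to_signed_scores` is a hypothetical helper, not a Flair function.

```python
from typing import Iterable, List, Tuple


def to_signed_scores(
    predictions: Iterable[Tuple[str, float]],
    positive_label: str = "POSITIVE",  # assumed label string
    negative_label: str = "NEGATIVE",  # assumed label string
) -> List[float]:
    """Map (label, confidence) pairs to one signed number per text.

    Positive predictions keep their confidence and negative ones are
    negated, so polarity is preserved in a single value in [-1, 1].
    """
    signed: List[float] = []
    for label, score in predictions:
        if label == positive_label:
            signed.append(score)
        elif label == negative_label:
            signed.append(-score)
        else:
            # Fail loudly instead of silently mislabelling unknown classes
            raise ValueError(f"unexpected label: {label!r}")
    return signed
```

In the loop above, appending `(tokenized.labels[0].value, tokenized.labels[0].score)` instead of only the score, and passing the resulting list through `to_signed_scores`, would give one polarity-aware value per tweet.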

1 Answer:

Answer 0 (score: 0)

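I think you can try the following:

  1. Your code predicts the sentiment of one tweet at a time. You can speed this up with batch prediction.
  2. Currently, for Flair 0.6, there are two sentiment models: "sentiment" (the default, BERT-based model) and "sentiment-fast" (RNN-based, slightly less accurate). Their performance is reported here: https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md#list-of-pre-trained-text-classification-models
  3. And of course, using a GPU can speed things up considerably.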
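To make point 1 concrete for a dataset the size of the one in the question (about 90k tweets), the sketch below splits the texts into chunks and calls a batch-prediction function once per chunk instead of once per tweet. `chunked`, `predict_in_chunks` and the default `chunk_size=1000` are hypothetical names and values chosen for illustration, not part of Flair. Within each chunk, the `mini_batch_size` argument of Flair's `predict` still controls how many sentences pass through the model at once. The extra chunking layer may simply help with progress reporting or with saving partial results.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_in_chunks(
    texts: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[R]],
    chunk_size: int = 1000,  # hypothetical default, tune for your memory
) -> List[R]:
    """Call `predict_batch` once per chunk rather than once per text."""
    results: List[R] = []
    for chunk in chunked(texts, chunk_size):
        # A natural place to log progress or checkpoint partial results
        results.extend(predict_batch(chunk))
    return results
```

With Flair, `predict_batch` could be `lambda chunk: flair_sentiment(chunk, classifier)` using the batched `flair_sentiment(texts, classifier)` shown below, called as `predict_in_chunks(df.tweets.tolist(), ...)`.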

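The code below analyzes the sentiment of tweets using batch prediction and also measures the running time of both sentiment models. The RNN-based model is much faster than the default one.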

from time import time

from flair.data import Sentence
from flair.models import TextClassifier


def flair_sentiment(texts, classifier):
    sentences = [Sentence(text) for text in texts]
    classifier.predict(sentences, mini_batch_size=32)
    return [
        (sent.labels[0].value, sent.labels[0].score)
        for sent in sentences
    ]


for sentiment_model_name in ("sentiment", "sentiment-fast"):
    classifier = TextClassifier.load(sentiment_model_name)

    start_time = time()
    tweets = 512 * [
        "For what a beautiful day. #elated",
        "It's broken"
    ]
    sentiments = flair_sentiment(tweets, classifier)
    # print(sentiments)
    print(f"* Sentiment model {sentiment_model_name}: running time = {time() - start_time:.2f} second(s)")

Output:

2020-09-22 11:50:14,027 loading file /Users/khuc/.flair/models/sentiment-en-mix-distillbert_3.1.pt
* Sentiment model sentiment: running time = 19.99 second(s)
2020-09-22 11:50:36,369 loading file /Users/khuc/.flair/models/sentiment-en-mix-ft-rnn.pt
* Sentiment model sentiment-fast: running time = 0.43 second(s)
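As a rough back-of-the-envelope check against the question's 90k tweets, the timings above can be scaled linearly. This assumes that running time grows linearly with the number of tweets, that the hardware is the same, and that tweets are similar in length to the two short test sentences. Real tweets vary, so treat the result as an order-of-magnitude estimate only. Note that these timings exclude model loading, since `start_time` is set after `TextClassifier.load`.

```python
def extrapolate_seconds(measured_seconds: float, measured_items: int,
                        target_items: int) -> float:
    """Scale a measured running time linearly to `target_items` items."""
    if measured_items <= 0:
        raise ValueError("measured_items must be positive")
    return measured_seconds / measured_items * target_items


# Figures from the output above: each model classified 512 * 2 = 1024 tweets.
MEASURED_TWEETS = 512 * 2
TARGET_TWEETS = 90_000  # approximate dataset size from the question

for model_name, seconds in (("sentiment", 19.99), ("sentiment-fast", 0.43)):
    estimate = extrapolate_seconds(seconds, MEASURED_TWEETS, TARGET_TWEETS)
    print(f"{model_name}: ~{estimate:.0f} s (~{estimate / 60:.1f} min)")
```

With the figures above this prints roughly 1757 s (about 29 minutes) for `sentiment` and roughly 38 s for `sentiment-fast`.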