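I want to classify sentiment using the pretrained English model from the flair library. I have about 90k tweets and I want to classify all of them.
The problem is that flair takes about 7 hours to do this. By comparison, an NLP sentiment classifier or TextBlob can do it in about 1 minute...
My code for this is: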
from flair.data import Sentence

def flair_sentiment(data, classifier):
    """
    data : text sequence (pandas.Series)
    classifier : pretrained flair classifier
    """
    values = []
    for item in data:
        # One predict() call per tweet, so nothing is batched
        tokenized = Sentence(item)
        classifier.predict(tokenized)
        values.append(tokenized.labels[0].score)
    return values

df['sentiment'] = flair_sentiment(df.tweets, classifier)
Answer 0 (score: 0)
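I think you can try batch prediction.
The code below analyzes the sentiment of tweets using batch prediction: instead of calling predict once per tweet, it passes the whole list of sentences in a single call with a mini_batch_size. It also prints the running time for two sentiment models. You can see that the RNN-based model is much faster than the default model.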
from time import time

from flair.data import Sentence
from flair.models import TextClassifier

def flair_sentiment(texts, classifier):
    # Build all Sentence objects first, then predict them in mini-batches
    sentences = [Sentence(text) for text in texts]
    classifier.predict(sentences, mini_batch_size=32)
    # Return (label, confidence score) for each input text
    return [
        (sent.labels[0].value, sent.labels[0].score)
        for sent in sentences
    ]

for sentiment_model_name in ("sentiment", "sentiment-fast"):
    classifier = TextClassifier.load(sentiment_model_name)
    # Start timing after the model is loaded, so only prediction is measured
    start_time = time()
    # 1024 sample tweets (2 texts repeated 512 times)
    tweets = 512 * [
        "For what a beautiful day. #elated",
        "It's broken"
    ]
    sentiments = flair_sentiment(tweets, classifier)
    # print(sentiments)
    print(f"* Sentiment model {sentiment_model_name}: running time = {time() - start_time:.2f} second(s)")
Output:
2020-09-22 11:50:14,027 loading file /Users/khuc/.flair/models/sentiment-en-mix-distillbert_3.1.pt
* Sentiment model sentiment: running time = 19.99 second(s)
2020-09-22 11:50:36,369 loading file /Users/khuc/.flair/models/sentiment-en-mix-ft-rnn.pt
* Sentiment model sentiment-fast: running time = 0.43 second(s)
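For the full 90k tweets, building every Sentence object up front may use a lot of memory, so it can help to feed the texts to the classifier in larger chunks. The following is only a rough sketch of that idea, not something from the flair documentation: `chunked`, `classify_in_chunks`, `make_flair_predictor` and the `chunk_size` default are hypothetical names and values chosen for illustration. The only flair calls used are the ones already shown above (`Sentence` and `classifier.predict` with `mini_batch_size`).

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Prediction = Tuple[str, float]  # (label, confidence score)


def chunked(items: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def classify_in_chunks(
    texts: Iterable[str],
    predict_batch: Callable[[Sequence[str]], List[Prediction]],
    chunk_size: int = 1024,  # assumed value, tune for your memory budget
) -> List[Prediction]:
    """Run `predict_batch` on one chunk at a time and concatenate the results."""
    results: List[Prediction] = []
    for chunk in chunked(list(texts), chunk_size):
        results.extend(predict_batch(chunk))
    return results


def make_flair_predictor(classifier, mini_batch_size: int = 32):
    """Wrap an already loaded flair classifier as a `predict_batch` callable."""
    # Imported lazily so the helpers above can be used without flair installed
    from flair.data import Sentence

    def predict_batch(texts: Sequence[str]) -> List[Prediction]:
        # Only the Sentence objects of the current chunk exist at any time
        sentences = [Sentence(text) for text in texts]
        classifier.predict(sentences, mini_batch_size=mini_batch_size)
        return [(s.labels[0].value, s.labels[0].score) for s in sentences]

    return predict_batch


# Possible usage with the DataFrame from the question (not run here):
# classifier = TextClassifier.load("sentiment-fast")
# predictions = classify_in_chunks(df.tweets, make_flair_predictor(classifier))
# df["sentiment"] = [label for label, _ in predictions]
# df["score"] = [score for _, score in predictions]
```

The results come back in the same order as the input, so they can be assigned straight to new columns. Whether chunking helps at all depends on your machine; the speedup itself comes from batch prediction and from the choice of model.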