Question

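I have installed the latest version of Transformers and can use its simple pipeline syntax to make sentiment predictions on English phrases: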

from transformers import pipeline
sentimentAnalysis = pipeline("sentiment-analysis")
print(sentimentAnalysis("Transformers piplines are easy to use"))

[{'label': 'POSITIVE', 'score': 0.9305251240730286}]

print(sentimentAnalysis("Transformers piplines are extremely easy to use"))

[{'label': 'POSITIVE', 'score': 0.9820092916488647}]

However, when I try a non-English language (Greek in this case), I don't get the expected results.

The following phrase translates to 'This food is disgusting' in English, so I would expect a very low sentiment score. That is not what I get:

print(sentimentAnalysis("Αυτό το φαγητό είναι αηδιαστικό"))
[{'label': 'POSITIVE', 'score': 0.7899578213691711}]
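Note that `score` is the model's confidence in the returned `label`, not a position on a positive/negative scale. A "very low sentiment score" would therefore appear as a `NEGATIVE` label with a high score. The minimal sketch below uses a hypothetical helper, `positive_probability`, and assumes a two-label POSITIVE/NEGATIVE model such as the default English one. It maps a pipeline result onto a single 0-to-1 positivity value:

```python
from typing import Dict, List


def positive_probability(result: List[Dict]) -> float:
    """Convert a binary sentiment-analysis pipeline result into P(POSITIVE).

    Hypothetical helper; assumes a two-label model whose labels are exactly
    "POSITIVE" and "NEGATIVE", so the two probabilities sum to 1.
    """
    top = result[0]  # the pipeline returns one dict per input string
    if top["label"] == "POSITIVE":
        return top["score"]
    if top["label"] == "NEGATIVE":
        # For a two-class softmax, P(POSITIVE) = 1 - P(NEGATIVE)
        return 1.0 - top["score"]
    raise ValueError(f"unexpected label: {top['label']!r}")
```

Applied to the Greek result above, this gives roughly 0.79, meaning the model leans positive. That confirms the prediction really is wrong and is not just being misread.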

Here is an attempt with the best multilingual model:

Better, but still way off target.

Is there anything I can do?

Answer 1

The problem is that pipelines load an English model by default. For sentiment analysis, this is distilbert-base-uncased-finetuned-sst-2-english (see here).

Fortunately, you can specify exactly which model to load, as described in the docs for pipeline:

from transformers import pipeline
pipe = pipeline("sentiment-analysis", model="<your_model_here>", tokenizer="<your_tokenizer_here>")
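As a small illustration, the sketch below wraps that call. `load_sentiment_pipeline` and `ENGLISH_DEFAULT` are hypothetical names. The function passes the same checkpoint name as both `model` and `tokenizer` so the two cannot drift apart. Called with no argument, it explicitly reproduces the English default named above. It assumes `transformers` is installed and the checkpoint can be downloaded.

```python
from typing import Optional

# The checkpoint the "sentiment-analysis" task falls back to, as named in the answer.
ENGLISH_DEFAULT = "distilbert-base-uncased-finetuned-sst-2-english"


def load_sentiment_pipeline(checkpoint: Optional[str] = None):
    """Build a sentiment-analysis pipeline for an explicit checkpoint.

    Hypothetical helper. Using one name for both model and tokenizer keeps
    them in sync; with no argument it reproduces the English default.
    """
    from transformers import pipeline  # imported lazily; requires transformers

    name = checkpoint or ENGLISH_DEFAULT
    return pipeline("sentiment-analysis", model=name, tokenizer=name)
```

For example, `load_sentiment_pipeline("<your_model_here>")` behaves like `pipe` above. The checkpoint still needs a classification head trained for sentiment, otherwise its labels carry no meaning.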

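Keep in mind that these models must be compatible with the architecture for their respective task. The only Greek model I could find is nlpaueb/bert-base-greek-uncased-v1, which looks to me like a base model (not fine-tuned for any downstream task). In that case, you first need to fine-tune your own model for sentiment analysis, and then you can load it from that checkpoint. Otherwise, you will likely get questionable results.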
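The fine-tuning route can be sketched roughly as follows. This is a minimal outline, not a full training recipe. It assumes `transformers` and `torch` are installed, that you have your own labelled Greek sentiment data, and that a binary NEGATIVE/POSITIVE label scheme fits that data. `ID2LABEL`, `build_classifier`, `training_step` and `save_checkpoint` are hypothetical names.

```python
# Hypothetical label scheme; the ids must match the labels in your training data.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

BASE_MODEL = "nlpaueb/bert-base-greek-uncased-v1"


def build_classifier(base_model: str = BASE_MODEL):
    """Attach a fresh 2-way classification head to the base Greek BERT.

    The head is randomly initialised, so predictions are meaningless until
    the model has been fine-tuned on labelled sentiment data.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,  # stored in the config, so pipelines report these names
        label2id=LABEL2ID,
    )
    return tokenizer, model


def training_step(model, tokenizer, texts, labels, optimizer):
    """Run one gradient step on a batch of texts with string labels."""
    import torch

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    batch["labels"] = torch.tensor([LABEL2ID[label] for label in labels])
    model.train()
    outputs = model(**batch)  # passing "labels" makes the model return a loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()


def save_checkpoint(model, tokenizer, path: str):
    """Write model and tokenizer to one directory that pipeline() can load."""
    model.save_pretrained(path)
    tokenizer.save_pretrained(path)
```

A possible flow: call `build_classifier()`, create an optimizer such as `torch.optim.AdamW(model.parameters(), lr=2e-5)`, and loop `training_step` over your batches. Then call `save_checkpoint(model, tokenizer, "greek-sentiment")`, where the directory name is arbitrary. Finally, load it with `pipeline("sentiment-analysis", model="greek-sentiment", tokenizer="greek-sentiment")`.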

Using Hugging Face Transformers with non-English languages
