spaCy text classification scores

Date: 2019-02-20 15:40:42

Tags: nlp spacy text-classification

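I'm very new to NLP text classification and am trying to understand the basics. spaCy seems like the best fit for my work and experience. I've read through all the docs and ran the example code from https://spacy.io/usage/training#example-textcat with the default plac arguments, using my own output folder. Then I wrote a test file: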

import spacy

output_dir="train_output_orig"

test_text = [
    "This movie sucked",
    "It's a great one",
    "I've watched a lot of films of this kind. A lot of them were more attractive for me",
    "This is a great movie",
    "This movie is terrible",
    "I love this movie",
    "This is a bad film",
    "So fucking dung!",
    "Very involving work with developed characters"
    ]
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
for text in test_text:
    print(text, nlp2(text).cats)

and got these results:

Loading from train_output_orig
This movie sucked {'POSITIVE': 0.6549780368804932}
It's a great one {'POSITIVE': 0.7863456606864929}
I've watched a lot of films of this kind. A lot of them were more attractive for me {'POSITIVE': 0.7664909958839417}
This is a great movie {'POSITIVE': 0.7897435426712036}
This movie is terrible {'POSITIVE': 0.4777064323425293}
I love this movie {'POSITIVE': 0.7530838847160339}
This is a bad film {'POSITIVE': 0.46895521879196167}
So fucking dung! {'POSITIVE': 0.6296740174293518}
Very involving work with developed characters {'POSITIVE': 0.8538092970848083}

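Is this okay for a spaCy model, or am I doing something wrong? What I mean is that the margin between the "positive" and "negative" labels is very narrow. Even the unambiguous "This is a bad film" gets a POSITIVE score of 0.46. "I love this movie" gets only 0.75, while "Very involving work with developed characters" gets 0.85. Meanwhile, "This movie sucked", the phrase suggested in the original spaCy usage docs, gets a POSITIVE score of 0.65!

Thanks in advance for your answers.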

2 answers:

Answer 0: (score: 0)

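Text classification returns a score for every label in the model. The closer a score is to 0, the less confident the model is that the label applies; the closer it is to 1, the more confident the model is that it does.

If the scores for your negative texts don't differ much from the scores for the others, then I'd guess you need more training data.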
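As a rough illustration of reading these scores, here is a minimal sketch that applies a cut-off to the POSITIVE values quoted in the question. The helper name `label_for` and the 0.5 threshold are assumptions made for this example, not part of spaCy or the training script.

```python
# POSITIVE scores copied from the output in the question.
scores = {
    "This movie sucked": 0.6549780368804932,
    "This is a great movie": 0.7897435426712036,
    "This movie is terrible": 0.4777064323425293,
    "This is a bad film": 0.46895521879196167,
}


def label_for(cats, threshold=0.5):
    """Map a doc.cats-style dict to a label.

    The 0.5 threshold is an assumption for this sketch; choose one that
    suits your own data.
    """
    return "POSITIVE" if cats.get("POSITIVE", 0.0) >= threshold else "NEGATIVE"


for text, score in scores.items():
    print(text, "->", label_for({"POSITIVE": score}))

# Gap between the highest and lowest score in this sample: a narrow gap
# suggests the model separates the classes poorly on such short inputs.
spread = max(scores.values()) - min(scores.values())
print("spread:", round(spread, 2))
```

With a 0.5 cut-off, "This movie sucked" would still be labelled POSITIVE, which is the narrow margin the question describes.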

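Answer 1: (score: 0)

The reviews in the training data are generally much longer than the examples above. If you try some examples from the test portion of the dataset, you'll see scores more like {'POSITIVE': 0.9939502477645874, 'NEGATIVE': 0.006049795541912317}

Also, a model trained with that example script should have both POSITIVE and NEGATIVE labels in cats, so if you only get POSITIVE, something may have gone wrong. (That said, the cats for these short examples look basically like what I'd expect.)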
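One way to check for a missing label is sketched below. The helper names `missing_labels` and `textcat_labels` are hypothetical, and the sketch assumes the trained pipeline contains a component named "textcat", as in the example script; `textcat_labels` needs spaCy installed and a saved model directory, so it is defined here but not called.

```python
def missing_labels(cats, expected=("POSITIVE", "NEGATIVE")):
    """Return the expected labels that are absent from a doc.cats-style dict."""
    return [label for label in expected if label not in cats]


def textcat_labels(model_dir):
    """List the labels the saved text categorizer knows about.

    Assumes the pipeline has a component named "textcat".
    spaCy is imported lazily so the helper above works without it.
    """
    import spacy

    nlp = spacy.load(model_dir)
    return list(nlp.get_pipe("textcat").labels)


# Output shaped like the question's: only POSITIVE is present.
print(missing_labels({"POSITIVE": 0.6549780368804932}))
# Output shaped like this answer's: both labels are present.
print(missing_labels({"POSITIVE": 0.9939502477645874,
                      "NEGATIVE": 0.006049795541912317}))
```

If `textcat_labels("train_output_orig")` also lists only POSITIVE, the model was probably trained without the NEGATIVE label, for example by an older version of the example script.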