Question

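I've started learning natural language processing, and I'm already stumbling.

I'm building my application in Node.js with the help of the NaturalNode library (Natural Node GitHub project).

Problem

I'm training my classifier with the following documents: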

/// importing package
var natural = require('natural');
var classifier = new natural.BayesClassifier();



/// training documents
classifier.addDocument("h", "greetings");
classifier.addDocument("hi", "greetings");
classifier.addDocument("hello", "greetings");
classifier.addDocument("data not working", "internet_problem");
classifier.addDocument("browser not working", "internet_problem");
classifier.addDocument("google not working", "internet_problem");
classifier.addDocument("facebook not working", "internet_problem");
classifier.addDocument("internet not working", "internet_problem");
classifier.addDocument("websites not opening", "internet_problem");
classifier.addDocument("apps not working", "internet_problem");
classifier.addDocument("call drops", "voice_problem");
classifier.addDocument("voice not clear", "voice_problem");
classifier.addDocument("call not connecting", "voice_problem");
classifier.addDocument("calls not going through", "voice_problem");
classifier.addDocument("disturbance", "voice_problem");
classifier.addDocument("bye", "close");
classifier.addDocument("thank you", "feedback_positive");
classifier.addDocument("thanks", "feedback_positive");
classifier.addDocument("shit", "feedback_negative");
classifier.addDocument("shit", "feedback_negative");
classifier.addDocument("useless", "feedback_negative");
classifier.addDocument("siebel testing", "siebel_testing");


classifier.train();


/// running classification
console.log('result for hi');
console.log(classifier.classify('hi'));
console.log('result for hii');
console.log(classifier.classify('hii'));
console.log('result for h');
console.log(classifier.classify('h'));

Output

result for hi:
greetings


result for hii:
internet_problem

result for h:
internet_problem

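As you can see, the result for the keyword hi is correct, but if I misspell hi as hii or ih, I get the wrong result. I don't understand how the classification works or how I should train the classifier. Is there a way to find out that a classification result is wrong, so that I can ask the user to enter the input again?

Any help, explanation, or anything at all is greatly appreciated. Thanks a lot in advance.

Please treat me as a noob and forgive any mistakes.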

Answer 1

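Your classifier has never seen hii or ih before, so unless natural.BayesClassifier does some preprocessing of the input, it won't know what to do with them. It will classify them using the prior probability derived from the frequencies of the individual class labels, and internet_problem is the most common label among the 22 training examples (7 of them).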
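To see why unseen words fall back to the label frequencies, here is a minimal multinomial Naive Bayes written from scratch. It is a hypothetical sketch, not natural's implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, and the helper names train and scores are assumptions, and it uses a ten-document subset of the question's training data. Because a word that never appeared in training is skipped, an input like hii or ih is scored by the log prior alone, so the most frequent label wins.

```javascript
// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Hypothetical sketch: NOT natural's implementation; the tokenizer
// (lowercase + split on whitespace) is an assumption.

// Ten-document subset of the question's training data.
const training = [
  ['h', 'greetings'], ['hi', 'greetings'], ['hello', 'greetings'],
  ['data not working', 'internet_problem'],
  ['browser not working', 'internet_problem'],
  ['google not working', 'internet_problem'],
  ['internet not working', 'internet_problem'],
  ['call drops', 'voice_problem'],
  ['voice not clear', 'voice_problem'],
  ['bye', 'close'],
];

const tokenize = (text) => text.toLowerCase().split(/\s+/).filter(Boolean);

function train(docs) {
  const docCounts = new Map();   // label -> number of documents (for the prior)
  const wordCounts = new Map();  // label -> Map(word -> count)
  const tokenTotals = new Map(); // label -> total number of tokens
  const vocab = new Set();
  for (const [text, label] of docs) {
    docCounts.set(label, (docCounts.get(label) || 0) + 1);
    if (!wordCounts.has(label)) wordCounts.set(label, new Map());
    const counts = wordCounts.get(label);
    for (const word of tokenize(text)) {
      vocab.add(word);
      counts.set(word, (counts.get(word) || 0) + 1);
      tokenTotals.set(label, (tokenTotals.get(label) || 0) + 1);
    }
  }
  return { docCounts, wordCounts, tokenTotals, vocab, total: docs.length };
}

// Returns [{ label, logScore }, ...] sorted best-first.
function scores(model, text) {
  // Words never seen in training are skipped: they carry no evidence,
  // so an input made only of unseen words is scored by the prior alone.
  const known = tokenize(text).filter((w) => model.vocab.has(w));
  const result = [];
  for (const [label, n] of model.docCounts) {
    let logScore = Math.log(n / model.total); // log prior
    const denom = (model.tokenTotals.get(label) || 0) + model.vocab.size;
    for (const word of known) {
      const count = model.wordCounts.get(label).get(word) || 0;
      logScore += Math.log((count + 1) / denom); // smoothed log likelihood
    }
    result.push({ label, logScore });
  }
  return result.sort((a, b) => b.logScore - a.logScore);
}

const model = train(training);
console.log(scores(model, 'hi')[0].label);  // greetings
console.log(scores(model, 'hii')[0].label); // internet_problem (prior only)
```

In this subset internet_problem has 4 of 10 documents, so it wins whenever the input contains no known word, mirroring the 7-of-22 situation in the question.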

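Edit 29/12/2016: As discussed in the comments, you can handle "bad" classifications by prompting the user to re-enter any input whose classification confidence is below a given minimum threshold: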

const MIN_CONFIDENCE = 0.2; // Tune this

let classLabel = null;
do {
    const userInput = getUserInput(); // Placeholder: get user input somehow
    // getClassifications returns [{ label, value }, ...], best match first
    const classifications = classifier.getClassifications(userInput);
    const bestClassification = classifications[0];
    if (bestClassification.value >= MIN_CONFIDENCE) {
        classLabel = bestClassification.label;
    }
    // Otherwise classLabel stays null and the user is prompted again
} while (classLabel === null);
// Do something with the label
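The loop above assumes a synchronous getUserInput, but reading from the terminal in Node.js is usually asynchronous. Below is a minimal sketch of the same threshold check written against an injected async ask function, so it can be exercised without a terminal. The names confidentLabel, classifyWithReprompt, and ask, and the prompt strings, are hypothetical; the only thing assumed about the classifier is that getClassifications(text) returns [{ label, value }, ...] with the best match first, which is what the answer's code relies on.

```javascript
// Hypothetical sketch: re-prompt until the best classification clears a threshold.
// Assumes classifier.getClassifications(text) returns [{ label, value }, ...]
// sorted best-first; `ask` is any async function returning the user's input.

const MIN_CONFIDENCE = 0.2; // Tune this against real classifier scores

// Returns the best label, or null when there is none or its score is too low.
function confidentLabel(classifications, minConfidence) {
  const best = classifications[0];
  if (best === undefined || best.value < minConfidence) {
    return null;
  }
  return best.label;
}

// Keeps asking until some input is classified confidently, then returns its label.
async function classifyWithReprompt(classifier, ask, minConfidence = MIN_CONFIDENCE) {
  let prompt = 'How can I help you? ';
  for (;;) {
    const text = await ask(prompt);
    const label = confidentLabel(classifier.getClassifications(text), minConfidence);
    if (label !== null) {
      return label;
    }
    prompt = "Sorry, I didn't understand that. Could you rephrase it? ";
  }
}
```

To drive it from the terminal, you could create an interface with Node's readline/promises module (Node 17 or later) and pass (q) => rl.question(q) as ask. Keeping the decision in a pure function makes the threshold easy to test and tune separately from the input handling.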

NLP: classification gives wrong results. How can I find out that an NLP classification result is wrong?
