Question

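This is my first post on Stack Overflow, so please let me know if I should be more thorough when asking questions in the future.

I am currently developing a virtual assistant app for Android in Java. It is going well so far, but I am unsure how to classify user input. So far I have integrated the Stanford NLP Parser into the program, so clause, phrase, and word-level tags can be applied to the raw text. This lets the program recognize direct questions and extract their subject simply by searching for occurrences of certain tags.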

(ROOT
  (SBARQ <--- Indicates that the sentence is a question
    (WHNP (WP Who))
    (SQ (VBD were)
      (NP (DT the) (FW samurai))) <--- Subject of question
    (. ?)))

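While this feels like a step forward, I would eventually like the assistant to classify different types of questions (weather-related, time/date-related, and so on). It should also recognize requests that are not phrased as direct questions but ask for the same information (e.g. "Can you tell me about the samurai?" instead of "Who were the samurai?"). Doing this with the Stanford NLP Parser by looking for certain tags seems like a very difficult task. Does anyone have suggestions for alternative approaches I could take?

Thank you!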

Answer 1

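In the context of virtual assistants or chatbots, this is usually called intent classification. There are many ways to do it, but generally you provide labeled examples and train a model to tell them apart. Here is some example data from a blog post on the topic: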

# 3 classes of training data
training_data = []
training_data.append({"class":"greeting", "sentence":"how are you?"})
training_data.append({"class":"greeting", "sentence":"how is your day?"})
training_data.append({"class":"greeting", "sentence":"good day"})
training_data.append({"class":"greeting", "sentence":"how is it going today?"})

training_data.append({"class":"goodbye", "sentence":"have a nice day"})
training_data.append({"class":"goodbye", "sentence":"see you later"})
training_data.append({"class":"goodbye", "sentence":"have a nice day"})
training_data.append({"class":"goodbye", "sentence":"talk to you soon"})

training_data.append({"class":"sandwich", "sentence":"make me a sandwich"})
training_data.append({"class":"sandwich", "sentence":"can you make a sandwich?"})
training_data.append({"class":"sandwich", "sentence":"having a sandwich today?"})
training_data.append({"class":"sandwich", "sentence":"what's for lunch?"})

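Although your training data will be specific to your application, in principle this is no different from automatically classifying emails or news articles.

An easy-to-use baseline algorithm for text classification is Naive Bayes. More recent approaches include Word Mover's Distance or neural networks.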
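
To make the Naive Bayes baseline concrete, here is a minimal sketch in plain Python. It assumes a shortened copy of the example data, and the helper names (`tokenize`, `train`, `classify`) are my own rather than anything from the blog post or a library. It uses add-one (Laplace) smoothing and ignores words it has never seen. Treat it as an illustration of the idea, not a production classifier.

```python
import math
import re
from collections import Counter, defaultdict

# Shortened copy of the example data so this block runs on its own.
training_data = [
    {"class": "greeting", "sentence": "how are you?"},
    {"class": "greeting", "sentence": "how is your day?"},
    {"class": "greeting", "sentence": "good day"},
    {"class": "goodbye", "sentence": "see you later"},
    {"class": "goodbye", "sentence": "talk to you soon"},
    {"class": "sandwich", "sentence": "make me a sandwich"},
    {"class": "sandwich", "sentence": "can you make a sandwich?"},
]


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z']+", sentence.lower())


def train(examples):
    """Collect per-class word counts, per-class example counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocabulary = set()
    for example in examples:
        tokens = tokenize(example["sentence"])
        word_counts[example["class"]].update(tokens)
        class_counts[example["class"]] += 1
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def classify(sentence, model):
    """Return the class with the highest log-probability under multinomial Naive Bayes."""
    word_counts, class_counts, vocabulary = model
    total_examples = sum(class_counts.values())
    # Words never seen in training carry no information here, so drop them.
    tokens = [t for t in tokenize(sentence) if t in vocabulary]
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_examples)  # class prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            # Add-one smoothing keeps unseen (class, word) pairs from zeroing the score.
            score += math.log((word_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training_data)
print(classify("make a sandwich please", model))
```

With only a handful of examples per class the predictions are fragile, so in practice you would want many more labeled sentences per intent, plus a fallback for inputs that match none of them well.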

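The part where you extract the subject is also known as slot detection, and an "intents and slots" architecture is common for assistants. Even if you want to build something from scratch, looking at the configuration screens of a chatbot platform (such as rasa) may help you understand how the training data is used.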
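
As a rough illustration of what an "intent and slots" result can look like, here is a small pattern-based sketch. The patterns, the `ask_about_topic` intent name, and the `topic` slot are all assumptions made up for this example. Real systems usually learn slots from annotated examples rather than hand-written regular expressions, so take this only as a picture of the output shape.

```python
import re

# Hypothetical patterns: each one maps a phrasing to a named "topic" slot.
TOPIC_PATTERNS = [
    re.compile(r"^(?:who|what) (?:is|are|was|were) (?P<topic>.+?)\??$", re.IGNORECASE),
    re.compile(r"^(?:can|could) you tell me about (?P<topic>.+?)\??$", re.IGNORECASE),
    re.compile(r"^tell me about (?P<topic>.+?)\??$", re.IGNORECASE),
]


def extract_topic(utterance):
    """Return the text captured as the topic slot, or None if no pattern matches."""
    text = utterance.strip()
    for pattern in TOPIC_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("topic")
    return None


def parse(utterance):
    """Bundle an intent label with its slots, the way an assistant would pass it on."""
    topic = extract_topic(utterance)
    if topic is None:
        return {"intent": None, "slots": {}}
    return {"intent": "ask_about_topic", "slots": {"topic": topic}}


print(parse("Who were the samurai?"))
print(parse("Can you tell me about the samurai?"))
```

Both phrasings from the question end up with the same intent and the same slot value, which is the point of separating "what the user wants" from "what it is about".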

Query classification for a virtual assistant in Java?
