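Is there an open-source Java library/algorithm for finding out whether a given text is a question?
I am working on a question answering system and need to analyze whether the text a user enters is a question.
I think the problem can be solved with an open-source NLP library, but it is clearly more complicated than simple part-of-speech tagging. So it would also be great if someone could describe an algorithm that uses an existing open-source NLP library.
If you know of a library/toolkit that uses data mining to solve this problem, please let me know. It will be hard to get enough data for training, but I would be able to use Stack Exchange data for that purpose.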
Answer 0: (score: 11)
In a syntactic parse of a question, the correct structure takes the following form:
(SBARQ (WH+ (W+) ...)
       (SQ ...*
           (V+) ...*)
       (?))
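So, with any available syntactic parser, a tree that has an SBARQ node with an (optional) embedded SQ indicates that the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how), and the SQ holds the inverted phrase.
For example: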
(SBARQ
  (WHNP
    (WP What))
  (SQ
    (VBZ is)
    (NP
      (DT the)
      (NN question)))
  (. ?))
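Of course, many preceding clauses will cause errors in the parse (these can be worked around), as will badly written questions. For example, the title of this post, "How to find out if a sentence is a question?", will have an SBARQ but no SQ.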
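The check described above can be sketched without committing to a particular parser. This minimal example assumes the parser's output is already available as a Penn Treebank style bracketed string; the class and method names (`QuestionTreeCheck`, `labels`, `hasLabel`, `looksLikeQuestion`) are made up for illustration and do not belong to any library. It only looks for the SBARQ label and reports separately whether an SQ is present.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: inspects a bracketed parse string produced by some parser.
class QuestionTreeCheck {

    // Collects every node label, i.e. the token that directly follows each "(".
    static List<String> labels(String tree) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < tree.length(); i++) {
            if (tree.charAt(i) != '(') continue;
            int j = i + 1;
            while (j < tree.length()) {
                char c = tree.charAt(j);
                if (Character.isWhitespace(c) || c == '(' || c == ')') break;
                j++;
            }
            if (j > i + 1) found.add(tree.substring(i + 1, j));
        }
        return found;
    }

    // True if any node in the tree carries exactly this label.
    static boolean hasLabel(String tree, String label) {
        return labels(tree).contains(label);
    }

    // Rule from the answer above: an SBARQ node marks the input as a question.
    // The embedded SQ is optional, so it is not required here.
    static boolean looksLikeQuestion(String tree) {
        return hasLabel(tree, "SBARQ");
    }

    public static void main(String[] args) {
        String tree =
            "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (DT the) (NN question))) (. ?))";
        System.out.println("question: " + looksLikeQuestion(tree));
        System.out.println("has SQ:   " + hasLabel(tree, "SQ"));
    }
}
```

In practice the bracketed string would come from whichever parser you choose, and the rule inherits that parser's mistakes, so the caveats about preceding clauses and badly written questions still apply.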
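Answer 1: (score: 1)
Many quasi-questions/requests for information are posed in the grammatical form of a statement; for example, "I would like to know who stole my bike".
I would abandon all hope of determining from its structure whether the user's input is a question, and simply assume that it is a question unless it is unambiguously not one. You could take an iterative, interactive approach so that the system can refine its "understanding" of the user's input: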
User: I would like to know how many angels fit on the head of a pin.
System: Do you have a question about angels?
User: Yes.
System: Do you want to know if angels are fit?
User: No.
System: Do you want to know if angels have heads?
User: Possibly.
System: Do you want to know if angels have pins?
User: No.
System: Do you want to know if angels are numerous?
User: No.
System: Do you want to know the dimensions of an angel?
User: Yes.
System: Do you mean angels in paintings?
User: No.
System: Do you mean angels in myth and religious writing?
User: Yes.
System: Angels are metaphysical beings.
User: I hear that Pennsylvania was founded by William Penn. Is that true?
System: Do you have a question about hearing?
User: No.
System: Do you have a question about Pennsylvania?
User: Yes.
System: Pennsylvania was founded by William Penn.
User: When?
System: 1682.
User: What does the name mean?
System: What name?
User: Pennsylvania!
System: Do you want to know the meaning of Pennsylvania?
User: Yes.
System: Pennsylvania means Penn's Woods.
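Answer 2: (score: 1)
Take a look at the Link Grammar Parser, a multilingual parser based on the concept of link pairs between related words in a sentence. It is written in C, but it also has a Java JNI interface.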