Question

We have a question-answer corpus like the one shown below:

Q: Why did Lincoln issue the Emancipation Proclamation? 
A: The goal was to weaken the rebellion, which was led and controlled by slave owners.

Q: Who is most noted for his contributions to the theory of molarity and molecular weight?  
A: Amedeo Avogadro

Q: When did he drop John from his name? 
A: upon graduating from college

Q: What do beetles eat? 
A: Some are generalists, eating both plants and animals. Other beetles are highly specialised in their diet.

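Treat the questions as queries and the answers as documents.
We have to build a system that, for a given query (semantically similar to one of the questions in the question corpus), retrieves the correct document (an answer from the answer corpus).
Can anyone suggest an algorithm or a good approach for building it?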

Answer 1

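Your question is too broad, and the task you are trying to accomplish is challenging. Still, I suggest you read IR-based Factoid Question Answering. That document refers to many state-of-the-art techniques, and reading it should give you some ideas.

Note that IR-based factoid QA and knowledge-based QA require different approaches. First, decide which type of QA system you want to build.

Finally, I think simple document matching techniques are not enough for QA. But you can try the simple Lucene-based approach suggested by @Debasis and see whether it performs well.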

Answer 2

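In Lucene, treat a question and its answer (assuming there is only one) as a single document. Lucene supports a fielded view of documents, so when constructing the document, make the question a searchable field. After retrieving the top-ranked questions for a given query question, use the get method of the Document class to return the answers.

Code skeleton (fill in the rest yourself):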

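// Skeleton written against the Lucene 5+ style API; the elided parts are left for you.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Index
IndexWriterConfig iwcfg = new IndexWriterConfig(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(...);
....
Document doc = new Document();
// TextField is analyzed (tokenized); Field.Store.YES keeps the original text retrievable.
doc.add(new TextField("FIELD_QUESTION", questionBody, Field.Store.YES));
doc.add(new TextField("FIELD_ANSWER", answerBody, Field.Store.YES));
...
...
// Search
// IndexReader is abstract, so open a DirectoryReader on the index directory.
IndexReader reader = DirectoryReader.open(...);
IndexSearcher searcher = new IndexSearcher(reader);
...
...
QueryParser parser = new QueryParser("FIELD_QUESTION", new StandardAnalyzer());
// Escape the raw question: characters such as '?' are wildcards in the query syntax.
Query q = parser.parse(QueryParser.escape(queryQuestion));
...
...
TopDocs topDocs = searcher.search(q, 10); // top-10 retrieved
// Accumulate the answers from the retrieved questions which
// are similar to the query (new) question.
StringBuilder buff = new StringBuilder();
for (ScoreDoc sd : topDocs.scoreDocs) {
    Document retrievedDoc = reader.document(sd.doc);
    buff.append(retrievedDoc.get("FIELD_ANSWER")).append("\n");
}
System.out.println("Generated answer: " + buff.toString());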
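
If you want to try the idea before setting up Lucene, here is a minimal, self-contained sketch of the same flow: match the new question against the stored questions, then return the answer stored with the best match. Note the assumptions: this does not use Lucene at all, the class name QaLookup and its methods are hypothetical names made up for illustration, and the scoring is plain Jaccard overlap of word sets rather than Lucene's ranking. It only captures lexical overlap, so it will not handle queries that are semantically similar but worded differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class (not part of Lucene): stores question/answer
// pairs and looks up the answer whose question best overlaps with the query.
final class QaLookup {
    private final List<String> questions = new ArrayList<>();
    private final List<String> answers = new ArrayList<>();

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    public static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Jaccard similarity: |intersection| / |union| of the two token sets.
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    // One question and its answer are stored together, like one document per pair.
    public void add(String question, String answer) {
        questions.add(question);
        answers.add(answer);
    }

    // Returns the answer of the most similar stored question,
    // or null when no stored question shares any word with the query.
    public String bestAnswer(String query) {
        Set<String> queryTokens = tokenize(query);
        double bestScore = 0.0;
        String best = null;
        for (int i = 0; i < questions.size(); i++) {
            double score = jaccard(queryTokens, tokenize(questions.get(i)));
            if (score > bestScore) {
                bestScore = score;
                best = answers.get(i);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        QaLookup qa = new QaLookup();
        qa.add("What do beetles eat?",
               "Some are generalists, eating both plants and animals.");
        qa.add("Who is most noted for his contributions to the theory of "
               + "molarity and molecular weight?", "Amedeo Avogadro");
        System.out.println(qa.bestAnswer("What food do beetles eat?"));
    }
}
```

Running main looks up "What food do beetles eat?" and prints the answer stored with the beetle question. A reasonable next step is to swap the Jaccard scoring for the Lucene index above and compare the results on your own corpus.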
