Generating questions from a sentence

Time: 2018-01-14 16:32:06

Tags: python algorithm nlp pattern-recognition

I am trying to create an algorithm that converts a sentence into a question. Here is the code:

from difflib import SequenceMatcher

def sentence_to_question(arg):
    # Helping (auxiliary) verbs that can be moved to the front of the sentence.
    hverbs = ["is", "have", "had", "was", "could", "would", "will", "do", "did", "should", "shall", "can", "are"]
    words = arg.split(" ")
    # Best match so far: (similarity in percent, helping verb, matched word).
    zen_sim = (0, "", "")
    for hverb in hverbs:
        for word in words:
            similarity = SequenceMatcher(None, word, hverb).ratio() * 100
            if similarity > zen_sim[0]:
                zen_sim = (similarity, hverb, word)
    if zen_sim[0] < 30:
        raise ValueError("unable to create question.")
    # Drop the matched word, lowercase the first letter, and put the verb first.
    words.remove(zen_sim[2])
    rest = " ".join(words)
    rest = rest[0].lower() + rest[1:]
    return "{0} {1}?".format(zen_sim[1].capitalize(), rest)

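Explanation: There is a prepared list of helping verbs, and every word of the sentence is compared with every helping verb. The word with the highest similarity to a helping verb is selected. The Ratcliff/Obershelp pattern recognition algorithm behind difflib.SequenceMatcher is used to compare the similarity of two strings. If the best similarity is below 30%, I assume it is very unlikely that the user merely misspelled a helping verb and very likely that the sentence contains no helping verb at all, so no question can be generated. Finally, the selected helping verb is placed at the beginning of the string. (I know the "algorithm" needs more optimization.)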
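To make the matching step concrete, here is a minimal sketch of how a single word is scored against the helping verbs. The helper names `similarity_percent` and `rank_helping_verbs` and the shortened verb list are assumptions made for illustration. They are not part of the code above, which performs the same comparison inline.

```python
from difflib import SequenceMatcher

def similarity_percent(word, hverb):
    """Similarity of two strings as computed by difflib.SequenceMatcher, in percent."""
    return SequenceMatcher(None, word, hverb).ratio() * 100

def rank_helping_verbs(word, hverbs):
    """Hypothetical helper: score `word` against every helping verb, best first."""
    scores = [(similarity_percent(word, hverb), hverb) for hverb in hverbs]
    return sorted(scores, reverse=True)

# Shortened verb list, for illustration only.
hverbs = ["is", "was", "are", "can"]
top_score, top_verb = rank_helping_verbs("waz", hverbs)[0]
print(top_verb, round(top_score, 1))  # was 66.7
```

The typo "waz" shares the two-character block "wa" with "was", so the ratio is 2 * 2 / 6, which is well above the 30% cutoff. A word that shares only a single letter with a three-letter verb scores about 33%, which is also above the cutoff, so the threshold on its own may accept weak matches.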

Test examples

>>> sentence_to_question("Euclid was a prominent mathematician")
'Was euclid a prominent mathematician?'
>>> sentence_to_question("Euclid waz a prominent mathematician") # does work with small typos
'Was euclid a prominent mathematician?'
>>> sentence_to_question("The company name, logo and slogan are vital elements of the house style of a company and important elements of corporate design")
'Are the company name, logo and slogan vital elements of the house style of a company and important elements of corporate design?'

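Now, we know that more than one question can be generated from a single sentence, but at the moment my "algorithm" can only generate one. What is the most accurate way to generate questions such as these?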

  

What are vital elements of the house style of a company and important elements of corporate design?

     

Was Euclid?

Note the difference between these two questions: the first has "what" as its initial word, but the second ...

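I am not asking for code (although I would appreciate examples). What is the most accurate way to generate multiple questions from a sentence?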

Thanks!

0 answers:

No answers