Given a word map, how to check whether two sentences are similar when the synonym relation is symmetric and transitive

Time: 2019-05-22 17:49:42

Tags: python

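Given two sentences and a word map of synonyms, I want to write a program that determines whether the sentences are similar. Two sentences are similar if they have the same number of words and every pair of corresponding words are synonyms. The program must handle both the symmetric and the transitive relations between synonyms. For example, suppose the synonym map is: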

[("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("f", "e"), ("g", "h")]

Then the sentences "a e g" and "f c h" are similar. Example:

Input S1: "a e g" S2: "f c h"
Map: [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("f", "e"), ("g", "h")]
Output: True

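Explanation: "a" and "f" are synonyms because "a" and "b" are synonyms, "f" and "e" are synonyms, and "e" and "b" are synonyms. Similarly, "e" and "c" are synonyms (through "b" and "a"), and "g" and "h" are synonyms.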

I have tried writing code for this, but I cannot work out the logic for comparing the two strings.

1 answer:

Answer 0 (score: 0)

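Here is one possibility. Since you have an equivalence relation, you can think in terms of the equivalence classes it forms. Basically, you build sets such that all words that are synonyms of one another end up in the same set. Or, if you prefer to think in graph terms, you can treat the tuples as edges of an undirected graph and find its connected components.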

You can do it like this:

def make_word_classes(synom):
    # List of classes (sets of mutually synonymous words)
    word_classes = []
    for w1, w2 in synom:
        # Start a class from the current pair
        merged = {w1, w2}
        remaining = []
        # Go through existing classes
        for wcls in word_classes:
            # If one of the words is already in the current class, absorb it
            if w1 in wcls or w2 in wcls:
                merged |= wcls
            else:
                remaining.append(wcls)
        # A pair may bridge two existing classes; both are merged here,
        # so classes stay disjoint regardless of the order of the pairs
        word_classes = remaining + [merged]
    return word_classes

synom = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("f", "e"), ("g", "h")]
word_classes = make_word_classes(synom)
print(word_classes)
# [{'a', 'c', 'b', 'd', 'f', 'e'}, {'h', 'g'}]  (element order within each set may vary)
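
The connected-components view mentioned above can also be sketched with a disjoint-set (union-find) structure. This is only an illustrative alternative, not part of the original answer: the name `build_synonym_roots` and its return format (a dict from each word to a representative word of its class) are assumptions made for this sketch. The example input is chosen so that the last pair joins two groups that were built separately.

```python
def build_synonym_roots(pairs):
    """Return a dict mapping each word to a representative of its class."""
    parent = {}

    def find(word):
        # Register unseen words as their own root
        parent.setdefault(word, word)
        # Walk up to the root, halving the path as we go
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for w1, w2 in pairs:
        root1, root2 = find(w1), find(w2)
        if root1 != root2:
            # Union: attach one root under the other
            parent[root1] = root2

    # Flatten so every word points directly at its root
    return {word: find(word) for word in parent}


# The third pair bridges the groups {a, b} and {c, d}
pairs = [("a", "b"), ("c", "d"), ("b", "c")]
roots = build_synonym_roots(pairs)
print(roots["a"] == roots["d"])
# True
```

Two words are then synonyms exactly when they map to the same representative, so the result does not depend on the order in which the pairs are listed.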

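With this, it is easy to tell whether two sentences are equivalent. You only need to check that each pair of corresponding words is either identical or belongs to the same equivalence class: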

def sentences_are_equivalent(s1, s2, word_classes):
    # Split into words
    l1 = s1.split()
    l2 = s2.split()
    # If they have different sizes they are different
    if len(l1) != len(l2):
        return False
    # Go through each pair of corresponding words
    for w1, w2 in zip(l1, l2):
        # If it is the same word then it is okay
        if w1 == w2:
            continue
        # Go through list of word classes
        for wcls in word_classes:
            # If both words are in the same class it is okay
            if w1 in wcls and w2 in wcls:
                # Continue to next pair of words
                break
        else:  # Again, this else goes with the for loop
            # If no class contains the pair of words
            return False
    return True

s1 = "a e g"
s2 = "f c h"
print(sentences_are_equivalent(s1, s2, word_classes))
# True
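
If the list of classes grows large, scanning every class for every word pair may become slow. One possible variation, sketched here under the assumption that the classes are disjoint (each word appears in at most one set), is to build a word-to-class index once and compare indices. The helper names `index_word_classes` and `sentences_match` are hypothetical and not part of the original answer.

```python
def index_word_classes(word_classes):
    """Map each word to the index of the class that contains it.

    Assumes the classes are disjoint; if a word appeared in two
    classes, the later class would silently win.
    """
    return {word: i for i, wcls in enumerate(word_classes) for word in wcls}


def sentences_match(s1, s2, class_of):
    l1, l2 = s1.split(), s2.split()
    # Sentences of different lengths cannot be similar
    if len(l1) != len(l2):
        return False
    for w1, w2 in zip(l1, l2):
        # Identical words always match, even if absent from the map
        if w1 == w2:
            continue
        c1 = class_of.get(w1)
        # Unknown words, or words in different classes, do not match
        if c1 is None or c1 != class_of.get(w2):
            return False
    return True


classes = [{"a", "b", "c", "d", "e", "f"}, {"g", "h"}]
class_of = index_word_classes(classes)
print(sentences_match("a e g", "f c h", class_of))  # True
print(sentences_match("a e g", "f c", class_of))    # False
print(sentences_match("x e g", "x c h", class_of))  # True
```

Each word comparison then costs two dictionary lookups instead of a pass over all classes.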