How to get the most likely matching line

Time: 2019-10-22 12:13:19

Tags: python shell

My text file contains the following lines:

1). please share the user manual and integration document
2). what is long code
3). what is short code
4). what are the long code and short code numbers
5). how to create group

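If I provide an input string, for example "how to create group", it must return the best-matching line from the file.

For the input "how I will create group", the best-matching line in the file is "how to create group".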

I know the following:

grep 'string pattern' file

But this only works for a single word, not for finding the closest match to a whole sentence.

1 Answer:

Answer 0: (score: 2)

The output is somewhat verbose, but you can edit it to your liking. This one uses difflib to calculate the similarity.
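For orientation, `SequenceMatcher.ratio()` returns a float between 0 and 1, computed as 2*M/T, where M is the number of matching characters and T is the combined length of both strings. A minimal sketch that recomputes the value by hand for the sample from the question (the variable names here are illustrative only):

```python
from difflib import SequenceMatcher

a = "5). how to create group"
b = "how I will create group"

matcher = SequenceMatcher(None, a, b)

# Sum the sizes of all matching blocks (the final block is a zero-length sentinel)
matched = sum(block.size for block in matcher.get_matching_blocks())
total = len(a) + len(b)

# ratio() is defined as 2 * matched / total
manual_ratio = 2.0 * matched / total
print(matched, total)                   # 17 46
print(manual_ratio == matcher.ratio())  # True
```

So a ratio of about 0.74 means roughly three quarters of the characters line up, not 0.74 percent.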

from difflib import SequenceMatcher

def get_match_ratio(sentence1, sentence2):
    return SequenceMatcher(None, sentence1, sentence2).ratio()

def match(iterable, sentence):
    """returns dictionary {iterable-element: match ratio (0.0 to 1.0) with sentence}"""
    return {element: get_match_ratio(element, sentence) for element in iterable}

def ranked_match(iterable, sentence):
    """returns list of iterable-elements sorted by match ratio with sentence, best first"""
    return [element[0] for element in sorted(
        match(iterable, sentence).items(), key=lambda x: x[1], reverse=True
        )]

# These lines come from the text file
sentences = [
    '1). please share the user manual and integration document',
    '2). what is long code',
    '3). what is short code',
    '4). what are the long code and short code numbers',
    '5). how to create group',
    ]

sample = "how I will create group"


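if __name__ == '__main__':
    while True:
        # An empty line falls back to the sample sentence; Ctrl+C quits
        sentence = input('Enter the sentence to match:\n').strip() or sample
        # Compare against what the user typed, not the hard-coded sample
        results = match(sentences, sentence)
        ranked = ranked_match(sentences, sentence)
        print("Most matching sentence: " + ranked[0])
        # Most matching sentence: 5). how to create group  (for the sample input)

        # ratio() is a fraction between 0 and 1, so format it as a percentage
        print("Match ratio: {:.1%}".format(results[ranked[0]]))
        # Match ratio: 73.9%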

        print("Ranked List: " + '; '.join(ranked))
        # Ranked List: 5). how to create group; 2). what is long code; 3). what is short code; 1). please share the user manual and integration document; 4). what are the long code and short code numbers

        print("Result Dictionary: ")
        # Result Dictionary: 

        print(results)
        # {'5). how to create group': 0.7391304347826086, '3). what is short code': 0.26666666666666666, '2). what is long code': 0.2727272727272727, '1). please share the user manual and integration document': 0.25, '4). what are the long code and short code numbers': 0.2222222222222222}
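If only the single best line is needed, the standard library's `difflib.get_close_matches` can do the ranking in one call. Below is a minimal sketch rather than part of the original answer; the helpers `load_lines` and `best_match` and the file name `questions.txt` are hypothetical. `cutoff=0.0` is set so that a line is returned even when the match is weak (the default cutoff of 0.6 would drop weak matches).

```python
from difflib import get_close_matches


def load_lines(path):
    """Read the non-empty lines of a text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def best_match(lines, query):
    """Return the line most similar to query, or None if lines is empty."""
    matches = get_close_matches(query, lines, n=1, cutoff=0.0)
    return matches[0] if matches else None


if __name__ == "__main__":
    # In practice: lines = load_lines("questions.txt")  (assumed file name)
    lines = [
        "1). please share the user manual and integration document",
        "2). what is long code",
        "3). what is short code",
        "4). what are the long code and short code numbers",
        "5). how to create group",
    ]
    print(best_match(lines, "how I will create group"))
    # 5). how to create group
```

`get_close_matches` uses `SequenceMatcher` internally, so the ranking agrees with the ratio-based approach above; raising `n` returns more candidates in best-first order.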