Approximate string matching against a provided reference list in Python

Time: 2018-08-03 07:45:58

Tags: python text match

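I want to find the most frequently occurring approximate match in a long string, with the condition that the matched term also comes from a provided list.

Example: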

# provided list 
>> jobskill = ["scrum", "customer experience improvement", "python"]

# long string 
>> jobtext = ["We are looking for Graduates in our Customer Experience department in Swindon, you will be responsible for improving customer experience and will also be working with the digital team. Send in your application by 31st December 2018", 
"If you are ScrumMaster at the top of your game with ability to communicate inspire and take people with you then there could not be a better time, we are the pioneer in digital relationship banking, and we are currently lacking talent in our Scrum team, if you are passionate about Scrum, apply to our Scrum team, knowledge with python is a plus!"]

# write a function that returns most frequent approximate match
>> mostfrequent(input = jobtext, lookup = jobskill)
# desired_output: {"customer experience improvement", "scrum"}

Any help is appreciated, thanks!

2 answers:

Answer 0: (score: 0)

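I'm not familiar with Fuzzywuzzy, which you mentioned, but you can improvise along these lines. Note that this counts case-insensitive exact occurrences of each skill, so it will not catch reworded phrases such as "improving customer experience".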

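import re

# Count case-insensitive occurrences of each skill across all job texts.
result = {}
for text in jobtext:
    for s in jobskill:
        # re.escape stops characters in a skill name from being read as regex syntax
        check = re.findall(re.escape(s), text, re.IGNORECASE)
        if check:
            # accumulate instead of overwriting the count from an earlier text
            result[s] = result.get(s, 0) + len(check)

print(result)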

Answer 1: (score: 0)

Use Fuzzywuzzy.
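
The answer names the library but gives no usage, so here is a minimal sketch of one way the idea could work. It is not the answerer's code. To stay dependency-free it uses the standard library's difflib.SequenceMatcher as a stand-in for Fuzzywuzzy's scorers; with Fuzzywuzzy installed, fuzz.ratio(window, target) / 100 could probably take the place of the SequenceMatcher call. The helper count_fuzzy, the sliding-window strategy, and the 0.8 threshold are all assumptions made for illustration. The threshold in particular would need tuning before this reproduces the desired output from the question.

```python
import re
from difflib import SequenceMatcher


def count_fuzzy(text, phrase, threshold=0.8):
    """Count word windows in `text` that approximately match `phrase`.

    Hypothetical helper: slides a window with as many words as `phrase`
    over the text and counts windows whose similarity ratio reaches
    `threshold` (an assumed cut-off, not a recommended value).
    """
    words = re.findall(r"\w+", text.lower())
    target = phrase.lower()
    n = len(target.split())
    hits = 0
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        if SequenceMatcher(None, window, target).ratio() >= threshold:
            hits += 1
    return hits


def mostfrequent(input, lookup, threshold=0.8):
    """Return, for each text, the lookup entry with the most fuzzy hits.

    The parameter name `input` mirrors the call in the question, although
    it shadows the built-in. Texts with no hit at all yield None.
    """
    best = []
    for text in input:
        counts = {s: count_fuzzy(text, s, threshold) for s in lookup}
        top = max(lookup, key=counts.get)
        best.append(top if counts[top] else None)
    return best


if __name__ == "__main__":
    sample = ["Scrum, scrums and more SCRUM; some pythn too"]
    print(mostfrequent(input=sample, lookup=["scrum", "python"]))
```

Comparing whole-word windows keeps the counting easy to reason about, but it also means a compound such as "ScrumMaster" scores low against "scrum". A partial-match scorer (Fuzzywuzzy offers fuzz.partial_ratio) or a lower threshold may suit the job-text example better.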
