Fuzzy/wildcard matching of a substring against a string

Time: 2018-09-13 21:38:25

Tags: python search match fuzzy-search

I have the following list of streets:

Fakestr. 1
Fakestr. 2
Fakestr. 3
.....
Fakestr. 11
Fakestr. 12
Fakestr. 13

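And about 20k more (I wanted to keep the example small).

Now I have another text file in which each line may or may not contain a street and house number combination.

For example: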

── Fakestreet_2-bla aha blatesttest\n
─ Fakestr._2-blablatesttest\n
Fakestreet 5_2017
── Fakestreet_2-jo-what
500000222 Fakestreet 13 .sdfs
Fakestreet_7
dsd Fakestreet 13 hae
500000 Fakestreet 12-14 d
Fakestreet 1 hey what 249

So I tried different approaches using difflib (difflib.get_close_matches, SequenceMatcher), fuzzywuzzy, etc. None of them worked the way I wanted.

My best result so far is:

import re
matchobj = re.search('Fakestr(.*)12','─ Fakestr._2-blablatesttest\n')
print(matchobj.group(0))
# --> Result: Error.
# --> But that's ok.

matchobj = re.search('Fakestr(.*)2','── Fakestreet_2-bla aha blatesttest\n')
print(matchobj.group(0))
# --> Result: Fakestreet 2
# --> That's ok

matchobj = re.search('Fakestr(.*)5','Fakestreet 5_2017')
print(matchobj.group(0))
# --> Result: Fakestreet 5
# --> That's ok

matchobj = re.search('Fakestr(.*)2','── Fakestreet_2-jo-what')
print(matchobj.group(0))
# --> Result: Fakestreet 2
# --> That's ok

matchobj = re.search('Fakestr(.*)7','── Fakestreet_7')
print(matchobj.group(0))
# --> Result: Fakestreet 7
# --> That's ok

matchobj = re.search('Fakestr(.*)5','500000 Fakestreet 1-5 .sdfs')
print(matchobj.group(0))
# --> Result: Fakestreet 1-5
# --> That would be okay; I can solve these cases later

matchobj = re.search('Fakestr(.*)5','dfsd Fakestreet 5,6 aaf')
print(matchobj.group(0))
# --> Result: Fakestreet 5
# --> That's ok

matchobj = re.search('Fakestr(.*)6','500000222 Fakestreet 5,6 .sdfs')
print(matchobj.group(0))
# --> Result: Fakestreet 5,6
# --> That's ok

matchobj = re.search('Fakestr(.*)14','Fakestreet 1  hey what 249')
print(matchobj.group(0))
# --> Result: Fakestreet 1 hey what 124
# --> That's wrong

matchobj = re.search('Fakestr(.*)1','500000222 Fakestreet 12-14 .sdfs')
print(matchobj.group(0))
# --> Result: Fakestreet 12-1
# --> That's wrong

matchobj = re.search('Fakestr(.*)1','222 Fakestreet 13 .sdfs')
print(matchobj.group(0))
# --> Result: Fakestreet 1
# --> That's wrong

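So how can I handle the last three cases? There are (only) a few constraints: a house number cannot be longer than 3 digits, and the house number usually comes after the street name.

1 answer:

Answer 0: (score: 0)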

If you also allow characters other than digits, it seems you want to change the expression to .*.*\d+
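To make the idea of constraining the wildcard concrete, here is a minimal sketch of one possible variant. It is not taken from the question or from the suggestion above. It assumes that each list entry ends in its house number, that an abbreviated street name ends in a dot, and that only a short run of non-digit characters separates the street stem from the number. The helper names pattern_for and find and the max_gap parameter are hypothetical. The filler between stem and number is restricted to non-digits, and a negative lookahead rejects a number that runs on into further digits, which is what goes wrong in the last three cases of the question.

```python
import re

def pattern_for(entry, max_gap=10):
    """Build a regex for one list entry such as 'Fakestr. 13'.

    Assumptions (not from the question): the entry ends in the house number,
    an abbreviated street name ends in a dot, and at most max_gap non-digit
    characters sit between the street stem and the number.
    """
    name, number = entry.rsplit(" ", 1)
    stem = name.rstrip(".")            # 'Fakestr.' -> 'Fakestr'
    return re.compile(
        re.escape(stem)                # literal street stem
        + r"\D{0,%d}?" % max_gap       # short filler without digits: 'eet ', '._'
        + re.escape(number)            # the house number itself
        + r"(?!\d)"                    # must not be followed by another digit
    )

def find(entry, line):
    """Return the matched text, or None if the line does not contain the entry."""
    m = pattern_for(entry).search(line)
    return m.group(0) if m else None

print(find("Fakestr. 13", "222 Fakestreet 13 .sdfs"))
print(find("Fakestr. 1", "222 Fakestreet 13 .sdfs"))
print(find("Fakestr. 1", "500000222 Fakestreet 12-14 .sdfs"))
print(find("Fakestr. 14", "Fakestreet 1  hey what 249"))
```

Because the filler may not contain digits, this sketch deliberately does not find the second number in ranges or lists such as 12-14 or 5,6. Those would need separate handling.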