I have a dictionary of synonyms:
synonym = {"this": ["this", "same"],
"all": ["all", "any", "*"],
"alluptolastyear": ["alluptolastyear", "uptolastyear"],
"dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
"dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
"yearbefore": ["yearbefore", "lastyear", "formeryear"],
"monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]}
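Each list stores the synonyms referenced by its key. I read two strings from an XML file and try to compare them.
For example:
"this" and "same" are equal (synonyms)
"all" and "nextdekad" are different
Can anyone help me compare these strings in a Pythonic way using the synonym dictionary?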
Answer 0 (score: 6)
Try this:
def are_sinonims(a, b):
    return a in synonym.get(b,[]) or b in synonym.get(a,[]) or any(a in synonym[k] and b in synonym[k] for k in synonym)
In addition, the part
a in synonym[k] and b in synonym[k] for k in synonym
can be rewritten as
a in words and b in words for words in synonym.values()
so that:
def are_sinonims(a, b):
    return a in synonym.get(b,[]) \
        or b in synonym.get(a,[]) \
        or any(a in words and b in words for words in synonym.values())
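As a quick check, the function can be run on the two pairs from the question. This is only a sketch: `synonym` is abbreviated to three entries, and the function is reduced to its `any(...)` clause, which already covers the two `get` lookups as long as every key also appears in its own list (as it does in the question's data).

```python
# Abbreviated copy of the question's dictionary (assumption: the full
# dictionary behaves the same way for these words).
synonym = {
    "this": ["this", "same"],
    "all": ["all", "any", "*"],
    "dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
}

def are_sinonims(a, b):
    # True when both words occur together in at least one synonym list.
    return any(a in words and b in words for words in synonym.values())

print(are_sinonims("this", "same"))                 # True
print(are_sinonims("all", "nextdekad"))             # False
print(are_sinonims("nextdekad", "followingdekad"))  # True
```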
Answer 1 (score: 4)
You can convert each word to a "synonym hash" (the same for two words if they are synonyms, different otherwise):
def sym_hash(word):
    for w, s in synonym.items():
        if word == w or word in s:
            return w
    return word
Then compare words using their "hashes":
def phrases_equal(p1, p2):
    # Note: zip stops at the shorter phrase, so extra trailing words are ignored.
    return all(sym_hash(a) == sym_hash(b) for a, b in zip(p1, p2))

p1 = "all your base this dekadbefore are formeryear".split()
p2 = "any your base same lastdekad are yearbefore".split()
print(phrases_equal(p1, p2))  # True
In fact, the right data structure for a synonym database seems to be a list of sets rather than a dictionary:
synonym = [
{"this", "same"},
{"all", "any", "*"},
{"alluptolastyear", "uptolastyear"},
{"dekadbefore", "lastdekad", "formerdekad", "precedingdekad"},
{"dekadafter", "nextdekad", "followingdekad"},
{"yearbefore", "lastyear", "formeryear"},
{"monthbefore", "lastmonth", "precedingmonth"}
]
In that case, sym_hash can be coded more efficiently:
def sym_hash(word):
    return next((s for s in synonym if word in s), word)
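If many lookups are needed, one option is to build a reverse index once, so that each lookup is a single dict access instead of a scan over every set. The following is a minimal sketch, assuming each word belongs to at most one set; `group_of` and `same_group` are hypothetical names, not part of the answer above, and the list is abbreviated.

```python
# Abbreviated list of synonym sets (assumption: no word appears in two sets).
synonym = [
    {"this", "same"},
    {"all", "any", "*"},
    {"dekadbefore", "lastdekad", "formerdekad", "precedingdekad"},
]

# Map every word to the index of the set it belongs to (built once).
group_of = {word: i for i, group in enumerate(synonym) for word in group}

def same_group(a, b):
    # Unknown words fall back to themselves, so they only match an identical word.
    return group_of.get(a, a) == group_of.get(b, b)

print(same_group("any", "*"))           # True
print(same_group("lastdekad", "same"))  # False
print(same_group("base", "base"))       # True
```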
Answer 2 (score: 3)
Why not:
def are_sinonims(a, b):
    return b in synonym.get(a, []) or a in synonym.get(b, [])
Edited after a comment pointed out an error.
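One thing worth checking with this version: it only looks lists up by key, so two words that are synonyms of each other but are not keys themselves do not match. A small sketch using one entry of the question's data:

```python
# One entry taken from the question's dictionary.
synonym = {"dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"]}

def are_sinonims(a, b):
    return b in synonym.get(a, []) or a in synonym.get(b, [])

print(are_sinonims("dekadbefore", "lastdekad"))  # True: "dekadbefore" is a key
print(are_sinonims("lastdekad", "formerdekad"))  # False: neither word is a key
```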
Answer 3 (score: 1)
First, create a new dict that has every synonym as a key:
word_to_word = {}
for syns in synonym.values():
    for word in syns:
        word_to_word[word] = syns
A function to compare the strings:
def are_sinomic(a, b):
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    for word_a, word_b in zip(words_a, words_b):
        if word_a != word_b and word_b not in word_to_word.get(word_a, []):
            return False
    return True
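A short usage sketch may help here. It repeats the two pieces of this answer so that it runs on its own, with an abbreviated `synonym` dict and made-up example phrases:

```python
# Abbreviated copy of the question's dictionary (assumption for illustration).
synonym = {
    "this": ["this", "same"],
    "all": ["all", "any", "*"],
    "dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
}

# Every synonym, key or not, maps to the full list it belongs to.
word_to_word = {}
for syns in synonym.values():
    for word in syns:
        word_to_word[word] = syns

def are_sinomic(a, b):
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    for word_a, word_b in zip(words_a, words_b):
        if word_a != word_b and word_b not in word_to_word.get(word_a, []):
            return False
    return True

# Non-key synonyms such as "lastdekad" and "formerdekad" match as well.
print(are_sinomic("all your base this lastdekad", "any your base same formerdekad"))  # True
print(are_sinomic("all your base", "any your"))  # False: different lengths
```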
Answer 4 (score: 0)
If you only care about whether two words are synonyms, you can build a set of 2-tuples from the permutations of the dict values:
synonym = {"this": ["this", "same"],
"all": ["all", "any", "*"],
"alluptolastyear": ["alluptolastyear", "uptolastyear"],
"dekadbefore": ["dekadbefore", "lastdekad", "formerdekad", "precedingdekad"],
"dekadafter": ["dekadafter", "nextdekad", "followingdekad"],
"yearbefore": ["yearbefore", "lastyear", "formeryear"],
"monthbefore": ["monthbefore", "lastmonth", "precedingmonth"]}
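from itertools import chain, permutations

# Every ordered pair of distinct words that share a synonym list.
synonym_set = set(chain.from_iterable(permutations(val, 2) for val in synonym.values()))

def are_synonyms(a, b):
    # permutations never yields (x, x), so identical words are handled explicitly.
    return a == b or (a, b) in synonym_set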