Question

I am trying to implement the Sultan Monolingual Aligner, using NLTK WordNet synsets to find synonyms.

I have two lists:

word1 = ['move', 'buy','learn']
word2 = ['study', 'purchase']

According to the aligner's rules, if a synonym of word1[i] matches a synonym of word2[j], then word1[i] and word2[j] are aligned.

Here is my code:

from nltk.corpus import wordnet as wn

def getSynonyms(word):
    synonymList1 = []
    wordnetSynset1 = wn.synsets(word)
    tempList1=[]
    for synset1 in wordnetSynset1:
        synLemmas = synset1.lemma_names()
        for i in xrange(len(synLemmas)):
            word = synLemmas[i].replace('_',' ')
            if word not in tempList1:
                tempList1.append(word)
    synonymList1.append(tempList1)
    return synonymList1

def cekSynonyms(word1, word2):
    newlist = []
    for i in xrange(len(word1)):
        for j in xrange(len(word2)):
            getsyn1 = getSynonyms(word1[i])
            getsyn2 = getSynonyms(word2[j])
            ds1 = [x for y in getsyn1 for x in y]
            ds2 = [x for y in getsyn2 for x in y]
            print ds1,"---align to--->",ds2,"\n"
            for k in xrange(len(ds1)):
                for l in xrange(len(ds2)):
                    if ds1[k] == ds2[l]:
                        #newsim = [ds1[k], ds2[l]]
                        newsim = [word1[i], word2[j]]
                        newlist.append(newsim)
    return newlist

word1 = ['move', 'buy','learn']
word2 = ['study', 'purchase']
print cekSynonyms(word1, word2)

Yes, I can find the synonyms of each word. Here is the output:

[u'move', u'relocation', u'motion', u'movement', u'motility', u'travel', u'go', u'locomote', u'displace', u'proceed', u'be active', u'act', u'affect', u'impress', u'strike', u'motivate', u'actuate', u'propel', u'prompt', u'incite', u'run', u'make a motion'] ---align to---> [u'survey', u'study', u'work', u'report', u'written report', u'discipline', u'subject', u'subject area', u'subject field', u'field', u'field of study', u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the books', u'meditate', u'contemplate']

[u'move', u'relocation', u'motion', u'movement', u'motility', u'travel', u'go', u'locomote', u'displace', u'proceed', u'be active', u'act', u'affect', u'impress', u'strike', u'motivate', u'actuate', u'propel', u'prompt', u'incite', u'run', u'make a motion'] ---align to---> [u'purchase', u'leverage', u'buy']

[u'bargain', u'buy', u'steal', u'purchase', u'bribe', u'corrupt', u"grease one's palms"] ---align to---> [u'survey', u'study', u'work', u'report', u'written report', u'discipline', u'subject', u'subject area', u'subject field', u'field', u'field of study', u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the books', u'meditate', u'contemplate']

[u'bargain', u'buy', u'steal', u'purchase', u'bribe', u'corrupt', u"grease one's palms"] ---align to---> [u'purchase', u'leverage', u'buy']

[u'learn', u'larn', u'acquire', u'hear', u'get word', u'get wind', u'pick up', u'find out', u'get a line', u'discover', u'see', u'memorize', u'memorise', u'con', u'study', u'read', u'take', u'teach', u'instruct', u'determine', u'check', u'ascertain', u'watch'] ---align to---> [u'survey', u'study', u'work', u'report', u'written report', u'discipline', u'subject', u'subject area', u'subject field', u'field', u'field of study', u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the books', u'meditate', u'contemplate']

[u'learn', u'larn', u'acquire', u'hear', u'get word', u'get wind', u'pick up', u'find out', u'get a line', u'discover', u'see', u'memorize', u'memorise', u'con', u'study', u'read', u'take', u'teach', u'instruct', u'determine', u'check', u'ascertain', u'watch'] ---align to---> [u'purchase', u'leverage', u'buy']

[['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]

The first 6 lines above show each word in word1 and word2 being compared through its synonyms.

The bottom line is the aligned words.

As we can see in the synonym sets, the entries of word1 and word2 that share a synonym are the aligned words.

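But why is each pair printed more than once? Like this >> [['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]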

How can I print each pair only once, without duplicates? Like this >> ['buy','purchase']

Answer 1

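You can remove duplicates from a list like this by converting it to a set, but because lists are not hashable, you have to go through tuples along the way: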

a = [['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'],
     ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]
a = [list(x) for x in set([tuple(x) for x in a])]
print(a)

This gives:

[['buy', 'purchase'], ['learn', 'study']]
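
One caveat worth sketching: a set does not remember insertion order, so the pairs may come back in a different order than they were found. If order matters, a minimal variant (dedupe_pairs is just an illustrative name, not part of the question's code) can track seen tuples while building the result:

```python
def dedupe_pairs(pairs):
    """Remove duplicate pairs while keeping first-seen order."""
    seen = set()
    unique = []
    for pair in pairs:
        key = tuple(pair)  # lists are unhashable, so use a tuple as the set key
        if key not in seen:
            seen.add(key)
            unique.append(list(pair))
    return unique

a = [['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'],
     ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]
print(dedupe_pairs(a))  # [['buy', 'purchase'], ['learn', 'study']]
```

The idea is the same as the one-liner above; the only difference is that the result list is filled in the order the pairs first appear.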

Answer 2

Based on Mr. nbubis's answer, I wrote a function that goes through tuples:

def tupleSynonyms(word1, word2):
    a = cekSynonyms(word1, word2)
    anew = [list(x) for x in set([tuple(x) for x in a])]
    return anew

print(tupleSynonyms(word1, word2))
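
As for why the duplicates appear in the first place: cekSynonyms appends a pair once for every synonym the two words share, and in the question's output buy/purchase share two synonyms while learn/study share four. A possible way to avoid the duplicates at the source is to test each pair only once with a set intersection. The following is only a sketch under assumptions: SYNONYMS and get_synonyms are hypothetical stand-ins for the WordNet lookup (filled with a subset of the synonyms printed in the question) so that it runs without NLTK, and align is an illustrative name.

```python
# Hypothetical stand-in for the WordNet lookup; the entries are a subset of
# the synonym lists printed in the question.
SYNONYMS = {
    'move': {'move', 'relocation', 'motion', 'travel', 'go'},
    'buy': {'bargain', 'buy', 'steal', 'purchase', 'bribe', 'corrupt'},
    'learn': {'learn', 'acquire', 'study', 'read', 'take', 'teach'},
    'study': {'survey', 'study', 'work', 'analyze', 'learn', 'read', 'take'},
    'purchase': {'purchase', 'leverage', 'buy'},
}

def get_synonyms(word):
    """Return the synonym set for a word (empty set if unknown)."""
    return SYNONYMS.get(word, set())

def align(words1, words2):
    """Pair up words whose synonym sets overlap, one entry per pair."""
    aligned = []
    for w1 in words1:
        for w2 in words2:
            # A single check per pair: is the intersection non-empty?
            if get_synonyms(w1) & get_synonyms(w2):
                aligned.append([w1, w2])
    return aligned

print(align(['move', 'buy', 'learn'], ['study', 'purchase']))
# [['buy', 'purchase'], ['learn', 'study']]
```

With real data, get_synonyms would presumably wrap the question's getSynonyms and return a set of its lemma names. Note that if the input lists themselves contain repeated words, the same pair can still show up more than once.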
