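Using the synset function to extend a list

Time: 2016-04-13 22:08:00

Tags: python loops wordnet

I need to iterate over a list and add each word's synonyms and hyponyms back into the list. For example: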

list_of_words = ["bird", "smart", "cool", "happy"]
list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms

I can get the synonyms and hyponyms of a single word, but I need to iterate over a list of values.

s = wordnet.synset(word)[0]

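I need to return a list that contains the original words plus the synonyms of each one.

Expected result:     list_of_words = ["bird", "smart", "cool", "happy", "hen", "rooster", ... other synonyms of bird, "smart", "intelligent", other synonyms of intelligent ... and so on]

How can I get the synset function to iterate over list_of_words and include these words in the list? I am very new to text analysis. Any help is appreciated.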

3 answers:

Answer 0: (score: 0)

Edit: the output format has been changed following the OP's comment.

Assuming you want output like this:

result = [
    ["bird", "smart", "cool", "happy"],
    [["Synonym 1 of bird...", ...], ["Synonym 1 of smart...", ...], ["Synonym 1 of cool...", ...], ["Synonym 1 of happy...", ...]],
    ...
]

New output format:

["bird", "smart", "cool", "happy", "synonym of bird", "hyponym of bird", "synonym of smart"... ]

You can iterate over the original word list as follows:

from pattern.en import wordnet

list_of_words = ["bird", "smart", "cool", "happy"]
original_length = len(list_of_words)

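# iterate over a copy of the original words, because the list grows inside the loop
for word in list_of_words[:original_length]:
    s = wordnet.synsets(word)[0]

    # append synonyms list to the result
    list_of_words.append(s.synonyms)

    # append hyponyms list to the result
    list_of_words.append(s.hyponyms())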

After the loop, you can access the lists as follows:

for index in range(original_length):
    # each word contributed two entries: its synonyms, then its hyponyms
    offset = original_length + 2 * index
    print('Displaying word %s' % list_of_words[index])
    print('Synonyms: %s' % str(list_of_words[offset]))
    print('Hyponyms: %s' % str(list_of_words[offset + 1]))
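
The index arithmetic above depends on the exact order in which items were appended. As an alternative sketch (not part of the original answer), the related words can be grouped in a dictionary keyed by the original word. The names `fake_lookup` and `group_related` are hypothetical; `fake_lookup` only stands in for the WordNet calls so that the snippet runs without the pattern package.

```python
def fake_lookup(word):
    # Hypothetical stand-in for a WordNet lookup: returns (synonyms, hyponyms).
    return [word + "_syn"], [word + "_hyp1", word + "_hyp2"]


def group_related(words, lookup):
    # Map each word to its synonyms and hyponyms instead of relying on list offsets.
    related = {}
    for word in words:
        synonyms, hyponyms = lookup(word)
        related[word] = {"synonyms": list(synonyms), "hyponyms": list(hyponyms)}
    return related


words = ["bird", "smart"]
related = group_related(words, fake_lookup)
for word in words:
    print("Displaying word %s" % word)
    print("Synonyms: %s" % related[word]["synonyms"])
    print("Hyponyms: %s" % related[word]["hyponyms"])
```

This trades the single flat list for a structure that is easier to read back; flattening it afterwards is still possible if the flat format is required.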

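Answer 1: (score: 0)

Here is a quick implementation. Don't worry too much about fakesynsets; it is just a mock of wordnet.synsets. You can skip straight to the code after that function.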

def fakesynsets(word):
    # mock of wordnet.synsets: returns one fake synset with two synonyms and two hyponyms
    from collections import namedtuple
    synset = namedtuple('synset', ['synonyms', 'hyponyms'])

    return [synset(
        synonyms=[word + 'syn' + str(ii) for ii in range(1, 3)],
        hyponyms=lambda: [word + 'hyp' + str(ii) for ii in range(1, 3)],
    )]


list_of_words = ["bird", "smart", "cool", "happy"]
list_of_words_synonyms = []
list_of_words_hypnonyms = []

for word in list_of_words:
    s = fakesynsets(word)[0]
    list_of_words_synonyms.extend(s.synonyms)
    list_of_words_hypnonyms.extend(s.hyponyms())

list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms

print(list_of_words)
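
Because the mock has the same shape as the real lookup, the loop can also be wrapped in a function that takes the lookup as a parameter. The following is a minimal sketch under that assumption; `expand_words`, `DemoSynset` and `demo_synsets` are hypothetical names, and `demo_synsets` only imitates the `synonyms` attribute and `hyponyms()` call used above.

```python
from collections import namedtuple

# Minimal stand-in with the same two members the answer's mock exposes.
DemoSynset = namedtuple("DemoSynset", ["synonyms", "hyponyms"])


def demo_synsets(word):
    # Hypothetical lookup: the word "zzz" yields no synsets, like an unknown word.
    if word == "zzz":
        return []
    return [DemoSynset(synonyms=[word + "syn1"], hyponyms=lambda: [word + "hyp1"])]


def expand_words(words, synsets_fn):
    # Return words + synonyms + hyponyms, using only the first synset of each word.
    synonyms, hyponyms = [], []
    for word in words:
        found = synsets_fn(word)
        if not found:
            continue  # skip words the lookup does not know
        synonyms.extend(found[0].synonyms)
        hyponyms.extend(found[0].hyponyms())
    return list(words) + synonyms + hyponyms


expanded = expand_words(["bird", "zzz", "cool"], demo_synsets)
print(expanded)
```

Checking for an empty result avoids an IndexError on words that have no synsets. With a real WordNet lookup the hyponyms may be synset objects rather than strings, in which case the extraction step would need adjusting.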

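Answer 2: (score: 0)

(Posting this as a new answer rather than updating my existing one, because the question has been updated a lot.)

I finally worked out what wordnet.synsets() returns by installing the "pattern" package and debugging. Here is the code that runs: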

from pattern.en import wordnet

list_of_words = [u"bird", u"smart", u"cool", u"happy"]
list_of_words_synonyms = []
list_of_words_hypnonyms = []

for word in list_of_words:
    sts = wordnet.synsets(word)
    if len(sts):
        st = sts[0]
        list_of_words_synonyms.extend(st.synonyms)
        list_of_words_hypnonyms.extend(hs.senses[0] for hs in st.hyponyms())        

list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms
print(list_of_words)

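Please note:

  1. Duplicates are not handled. If duplicates must be removed, you can use a set (the built-in set, which supersedes the deprecated sets.Set) instead of a list.
  2. Each hyponym has multiple senses. list_of_words_hypnonyms contains only the first one. To include all of them, replace the corresponding line with: list_of_words_hypnonyms.extend(sense for hs in st.hyponyms() for sense in hs.senses)
  3. A generator expression is used to add the hyponyms to list_of_words_hypnonyms.
  4. The result is: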

    [u'bird', u'smart', u'cool', u'happy', u'bird', u'smart', u'smarting', u'smartness', u'cool', u'dickeybird', u'cock', u'hen', u'nester', u'night bird', u'bird of passage', u'protoavis', u'archaeopteryx', u'Sinornis', u'Ibero-mesornis', u'archaeornis', u'ratite', u'carinate', u'passerine', u'nonpasserine bird', u'bird of prey', u'gallinaceous bird', u'parrot', u'cuculiform bird', u'coraciiform bird', u'apodiform bird', u'caprimulgiform bird', u'piciform bird', u'trogon', u'aquatic bird', u'twitterer']
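
Notes 1 and 2 can be combined in a short sketch: flatten every sense of every hyponym with a nested generator expression, then drop duplicates while keeping first-seen order. `FakeHyponym` and `unique_in_order` are hypothetical names used for illustration, and the sample data is made up; only the `senses` attribute mirrors the code above.

```python
from collections import namedtuple

# Stand-in for a hyponym synset; only the `senses` attribute is imitated.
FakeHyponym = namedtuple("FakeHyponym", ["senses"])

hyponyms = [
    FakeHyponym(senses=["cock", "rooster"]),
    FakeHyponym(senses=["hen"]),
    FakeHyponym(senses=["cock"]),
]

# First sense only, as in the answer's code.
first_senses = [hs.senses[0] for hs in hyponyms]

# All senses, flattened with a nested generator expression (note 2).
all_senses = list(sense for hs in hyponyms for sense in hs.senses)


def unique_in_order(items):
    # Remove duplicates but keep the first occurrence of each item (note 1).
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(first_senses)
print(all_senses)
print(unique_in_order(all_senses))
```

Calling set(all_senses) directly also removes duplicates, but it does not preserve the order in which the words were found.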