Python: Passing variables into the WordNet Synsets methods in NLTK

Time: 2013-01-15 11:16:57

Tags: python nltk wordnet

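I need to work on a project that requires NLTK, so I started learning Python two weeks ago, but I am struggling to understand both Python and NLTK.

From the NLTK documentation, I can understand the following code, and it works well if I manually put the words apple and pear into it.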

from nltk.corpus import wordnet as wn

apple = wn.synset('apple.n.01')
pear = wn.synset('pear.n.01')

print apple.lch_similarity(pear)

Output: 2.53897387106

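However, I need to use NLTK on lists of items. For example, I have the two lists below and I would like to compare the items in list1 with those in list2: compare word1 from list1 with every word in list2, then word2 from list1 with every word in list2, and so on until every word in list1 has been compared.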

list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]

wordFromList1 = list1[0]
wordFromList2 = list2[0]

wordFromList1 = wn.synset(wordFromList1)
wordFromList2 = wn.synset(wordFromList2)    

print wordFromList1.lch_similarity(wordFromList2)

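The above code will of course raise an error. Can anyone tell me how to pass a variable into the synset method [wn.synset(*pass_variable_in_here*)] so that I can use a double loop to get their lch_similarity values? Thank you.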

1 Answer:

Answer 0 (score: 5)

wordnet.synset expects a 3-part name string of the form: word.pos.nn

You did not specify the pos.nn part for each word in list1 and list2.

It seems reasonable to assume that all the words are nouns, so we could try appending the string '.n.01' to each string in list1 and list2:

for word1, word2 in IT.product(list1, list2):
    wordFromList1 = wordnet.synset(word1+'.n.01')
    wordFromList2 = wordnet.synset(word2+'.n.01')

That does not work, however: wordnet.synset('drinks.n.01') raises a WordNetError.
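The name-building step itself can be sketched as a small helper. This is only an illustration: `synset_name` is a hypothetical function, not part of NLTK, and the default of "first noun sense" is an assumption carried over from the reasoning above.

```python
def synset_name(word, pos="n", sense=1):
    """Build a 3-part WordNet name such as 'apple.n.01'.

    Hypothetical helper: pos defaults to noun and sense to 1,
    which is an assumption, not something WordNet guarantees.
    """
    return "{}.{}.{:02d}".format(word, pos, sense)


list1 = ["apple", "honey", "drinks", "flowers", "paper"]
names = [synset_name(word) for word in list1]
print(names)

# Each name could then be passed to wordnet.synset(name),
# which requires NLTK and the WordNet corpus to be installed.
```

Building the string this way answers the "how do I pass a variable" part of the question, but it does not guarantee that the resulting name exists in WordNet: 'drinks.n.01' is still produced here, and looking it up fails as described.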

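On the other hand, the same doc page shows that you can look up similar words using the synsets method.

For example, wordnet.synsets('drinks') returns the list:

[Synset('drink.n.01'), Synset('drink.n.02'), Synset('beverage.n.01'), Synset('drink.n.04'), Synset('swallow.n.02'), Synset('drink.v.01'), Synset('drink.v.02'), Synset('toast.v.02'), Synset('drink_in.v.01'), Synset('drink.v.05')]

So at this point, you need to give some thought to what you want the program to do. If you are okay with picking the first item in this list as a proxy for drinks, then you could use

for word1, word2 in IT.product(list1, list2):
    wordFromList1 = wordnet.synsets(word1)[0]
    wordFromList2 = wordnet.synsets(word2)[0]

which results in a program that looks like this: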

import nltk.corpus as corpus
import itertools as IT

wordnet = corpus.wordnet
list1 = ["apple", "honey", "drinks", "flowers", "paper"]
list2 = ["pear", "shell", "movie", "fire", "tree", "candle"]

for word1, word2 in IT.product(list1, list2):
    # Use the first synset returned for each word as its proxy.
    wordFromList1 = wordnet.synsets(word1)[0]
    wordFromList2 = wordnet.synsets(word2)[0]
    # Note: in NLTK 3.x `name` is a method, so write wordFromList1.name() there.
    print('{w1}, {w2}: {s}'.format(
        w1 = wordFromList1.name,
        w2 = wordFromList2.name,
        s = wordFromList1.lch_similarity(wordFromList2)))

Running it yields:

apple.n.01, pear.n.01: 2.53897387106
apple.n.01, shell.n.01: 1.07263680226
apple.n.01, movie.n.01: 1.15267950994
apple.n.01, fire.n.01: 1.07263680226
...
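Indexing with [0] assumes every word has at least one synset. A slightly more defensive version of the double loop might look like the sketch below. To keep it runnable without the WordNet data, the lookup and similarity functions are passed in as parameters. `pairwise_scores`, `fake_lookup`, `fake_similarity`, and `FAKE_SENSES` are all hypothetical names made up for this illustration, and the toy similarity is not the LCH measure.

```python
import itertools


def pairwise_scores(words1, words2, lookup, similarity):
    """Return (word1, word2, score) for every pair across the two lists.

    lookup(word) returns a list of candidate senses (possibly empty);
    similarity(a, b) scores two senses. The score is None when either
    word has no candidate sense.
    """
    results = []
    for w1, w2 in itertools.product(words1, words2):
        senses1, senses2 = lookup(w1), lookup(w2)
        if senses1 and senses2:
            # Use the first sense of each word as its proxy.
            score = similarity(senses1[0], senses2[0])
        else:
            score = None
        results.append((w1, w2, score))
    return results


# Hypothetical stand-ins so the sketch runs without WordNet data.
FAKE_SENSES = {
    "apple": ["apple.n.01"],
    "pear": ["pear.n.01"],
    "drinks": ["drink.n.01"],
}


def fake_lookup(word):
    # Unknown words get an empty list, as a real lookup might return.
    return FAKE_SENSES.get(word, [])


def fake_similarity(a, b):
    # Toy score: 1.0 for identical names, otherwise 0.0.
    return 1.0 if a == b else 0.0


for row in pairwise_scores(["apple", "qwzx"], ["pear", "apple"],
                           fake_lookup, fake_similarity):
    print(row)
```

With NLTK installed, one could pass lookup=lambda w: wordnet.synsets(w, pos=wordnet.NOUN) and similarity=lambda a, b: a.lch_similarity(b). Restricting the lookup to nouns is a design choice worth considering, since lch_similarity is only defined for two synsets with the same part of speech.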