I am trying to solve an NLP problem where I have a dictionary of words, like:
list_1 = {'phone': 'android', 'chair': 'netflit', 'charger': 'macbook', 'laptop': 'sony'}
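Now, if the input is 'phone', I can easily use the in operator to look up the description of phone and its data by key. The problem is when the input is something like 'phones' or 'Phones'.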
What I want is that when I input 'phone', I get words like:
'phone' ==> 'Phones','phones','Phone','Phone's','phone's'
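I don't know which word2vec model I could use, or which NLP module provides a solution like this.
My second question: if I give a word such as 'dog', can I get words like 'puppy', 'kitty', 'doggy', and so on?
I tried something like this, but it gives synonyms: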
from nltk.corpus import wordnet as wn
for ss in wn.synsets('phone'):  # Each synset represents a different concept.
    print(ss)
But it returns:
Synset('telephone.n.01')
Synset('phone.n.02')
Synset('earphone.n.01')
Synset('call.v.03')
Instead, I want:
'phone' ==> 'Phones','phones','Phone','Phone's','phone's'
Answer 0 (score: 4)
WordNet indexes concepts (aka Synsets), not words.
Use lemma_names() to access the root words (aka lemmas) in WordNet.
>>> from nltk.corpus import wordnet as wn
>>> for ss in wn.synsets('phone'):  # Each synset represents a different concept.
...     print(ss.lemma_names())
...
['telephone', 'phone', 'telephone_set']
['phone', 'speech_sound', 'sound']
['earphone', 'earpiece', 'headphone', 'phone']
['call', 'telephone', 'call_up', 'phone', 'ring']
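A lemma is the root form of a word and should carry no additional affixes, so you will not find the plural or other inflected forms that you listed among the words you wanted.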
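Since WordNet will not generate 'Phones' or 'phone's' from 'phone', one workaround is to go in the opposite direction and reduce the input to something that matches a dictionary key. The following is only a minimal sketch under stated assumptions: normalize and lookup are hypothetical helper names, the rules are deliberately naive (lowercasing, dropping a trailing 's, then trying without a final s), and irregular plurals are not handled. A real lemmatizer would be more robust.

```python
import re

# Dictionary from the question, with the last pair written as key: value.
list_1 = {'phone': 'android', 'chair': 'netflit', 'charger': 'macbook', 'laptop': 'sony'}


def normalize(word):
    """Hypothetical helper: lowercase the input and drop a trailing possessive 's."""
    w = word.strip().lower()
    return re.sub(r"'s$", "", w)  # phone's -> phone


def lookup(word, table):
    """Hypothetical helper: find the value for `word`, tolerating simple variants."""
    key = normalize(word)
    if key in table:
        return table[key]
    # Naive plural rule (an assumption, not a general solution): phones -> phone.
    if key.endswith('s') and key[:-1] in table:
        return table[key[:-1]]
    return None


for variant in ['Phones', 'phones', 'Phone', "Phone's", "phone's"]:
    print(variant, '->', lookup(variant, list_1))
```

Every variant listed in the question resolves to the 'phone' entry this way, and an unknown word such as 'dog' simply returns None.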
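Also, words are ambiguous and may need to be disambiguated by context or by part of speech (POS) before you can get "similar" words. For example, "phone" in its verb sense does not mean exactly the same thing as "phone" in its noun sense.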
>>> for ss in wn.synsets('phone'):  # Each synset represents a different concept.
...     print(ss.lemma_names(), '\t', ss.definition())
...
['telephone', 'phone', 'telephone_set'] electronic equipment that converts sound into electrical signals that can be transmitted over distances and then converts received signals back into sounds
['phone', 'speech_sound', 'sound'] (phonetics) an individual sound unit of speech without concern as to whether or not it is a phoneme of some language
['earphone', 'earpiece', 'headphone', 'phone'] electro-acoustic transducer for converting electric signals into sounds; it is held over or inserted into the ear
['call', 'telephone', 'call_up', 'phone', 'ring'] get or try to get into communication (with someone) by telephone
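To make the POS point concrete, here is a small sketch that groups the lemmas above by part of speech. To stay runnable without the WordNet corpus, it hard-codes the synset names and lemma lists printed earlier in this thread; similar_words is a hypothetical helper name. With NLTK and its WordNet data installed, the same restriction can be applied at query time, for example wn.synsets('phone', pos=wn.VERB).

```python
# Synset names and lemma lists copied from the outputs shown in this thread.
SYNSETS = {
    'telephone.n.01': ['telephone', 'phone', 'telephone_set'],
    'phone.n.02': ['phone', 'speech_sound', 'sound'],
    'earphone.n.01': ['earphone', 'earpiece', 'headphone', 'phone'],
    'call.v.03': ['call', 'telephone', 'call_up', 'phone', 'ring'],
}


def similar_words(word, pos, synsets=SYNSETS):
    """Hypothetical helper: lemmas sharing a synset with `word`, for one POS tag."""
    found = []
    for name, lemmas in synsets.items():
        # A synset name has the form lemma.pos.number, e.g. 'call.v.03'.
        if name.split('.')[1] != pos or word not in lemmas:
            continue
        for lemma in lemmas:
            if lemma != word and lemma not in found:
                found.append(lemma)
    return found


print('noun:', similar_words('phone', 'n'))
print('verb:', similar_words('phone', 'v'))
```

The noun and verb readings produce different candidate lists, which is why the input usually has to be disambiguated before asking for "similar" words.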