Question

I have a set of sense keys, e.g. "long%3:00:02::", from SemCor + OMSTI. How can I get their glosses? Is there a mapping file? Or can I use NLTK WordNet?

Answer 1

TL; DR

import re
from nltk.corpus import wordnet as wn

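# Sense key layout: lemma%ss_type:lex_filenum:lex_id:head_word:head_id
sense_key_regex = r"(.*)\%(.*):(.*):(.*):(.*):(.*)"
# WordNet ss_type codes mapped to the part-of-speech letters used in NLTK synset names.
synset_types = {1:'n', 2:'v', 3:'a', 4:'r', 5:'s'}

def synset_from_sense_key(sense_key):
    lemma, ss_type, lex_num, lex_id, head_word, head_id = re.match(sense_key_regex, sense_key).groups()
    # Build an NLTK synset name such as 'long.a.02' from the parsed fields.
    ss_idx = '.'.join([lemma, synset_types[int(ss_type)], lex_id])
    return wn.synset(ss_idx)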

x = "long%3:00:02::"

synset_from_sense_key(x)

In long

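NLTK has this rather obtuse function. However, it does not read from the sense key; it reads from the data_file_map (e.g. "data.adj", "data.noun", etc.): https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1355

Since NLTK already gives us an API that mere mortals can understand, we can build on it with some guidance from https://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html: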

A sense_key is represented as:

     lemma % lex_sense

where lex_sense is encoded as:

    ss_type:lex_filenum:lex_id:head_word:head_id


(yada, yada...)

The synset type is encoded as follows:
1    NOUN 
2    VERB 
3    ADJECTIVE 
4    ADVERB 
5    ADJECTIVE SATELLITE
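
The field layout quoted above can also be unpacked without a regex. The following is a minimal sketch and not part of NLTK: `SenseKey`, `SS_TYPE_TO_POS` and `parse_sense_key` are hypothetical names introduced here for illustration, and the code assumes a well-formed key with exactly five colon-separated fields after the `%`.

```python
from collections import namedtuple

# Hypothetical record type for illustration; field names follow the quoted format.
SenseKey = namedtuple(
    "SenseKey", "lemma ss_type lex_filenum lex_id head_word head_id"
)

# ss_type codes from the table above, mapped to part-of-speech letters.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}

def parse_sense_key(sense_key):
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    lemma, sep, lex_sense = sense_key.rpartition("%")
    if not sep:
        raise ValueError("missing '%' in sense key: {!r}".format(sense_key))
    fields = lex_sense.split(":")
    if len(fields) != 5:
        raise ValueError("expected 5 lex_sense fields in {!r}".format(sense_key))
    ss_type, lex_filenum, lex_id, head_word, head_id = fields
    return SenseKey(lemma, int(ss_type), lex_filenum, lex_id, head_word, head_id)

key = parse_sense_key("long%3:00:02::")
print(key.lemma, SS_TYPE_TO_POS[key.ss_type], key.lex_id)
```

Raising `ValueError` on malformed input is a design choice of this sketch; the regex version would instead fail with an `AttributeError` when `re.match` returns `None`.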

We can pull out these fields with a regex (https://regex101.com/r/9KlVK7/1/):

>>> import re
>>> sense_key_regex = r"(.*)\%(.*):(.*):(.*):(.*):(.*)"

>>> x = "long%3:00:02::" 

>>> re.match(sense_key_regex, x)
<_sre.SRE_Match object at 0x10061ad78>

>>> re.match(sense_key_regex, x).groups()
('long', '3', '00', '02', '', '')

>>> lemma, ss_type, lex_num, lex_id, head_word, head_id = re.match(sense_key_regex, x).groups()


>>> synset_types = {1:'n', 2:'v', 3:'a', 4:'r', 5:'s'}


>>> '.'.join([lemma, synset_types[int(ss_type)], lex_id])
'long.a.02'

Voilà, you get an NLTK Synset() object from the sense key =)

>>> from nltk.corpus import wordnet as wn
>>> idx = '.'.join([lemma, synset_types[int(ss_type)], lex_id])
>>> wn.synset(idx)
Synset('long.a.02')
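
Since the question asks for the gloss rather than the synset, one more step is needed. The sketch below uses NLTK's `lemma_from_key()`, `Lemma.synset()` and `Synset.definition()`, which resolve the key directly instead of rebuilding a synset name from `lex_id`. `gloss_from_sense_key` and its `wordnet` parameter are hypothetical names added here so the reader can be swapped out, and the code assumes the WordNet corpus has already been downloaded (for example with `nltk.download('wordnet')`) and that the installed NLTK version can resolve the key in question.

```python
def gloss_from_sense_key(sense_key, wordnet=None):
    """Return the gloss (definition) of the synset a sense key belongs to."""
    if wordnet is None:
        # Imported lazily so the function can be exercised with a stand-in reader.
        from nltk.corpus import wordnet
    lemma = wordnet.lemma_from_key(sense_key)
    return lemma.synset().definition()

# Usage (needs NLTK and the WordNet data):
# gloss = gloss_from_sense_key("long%3:00:02::")
```

Going through the lemma avoids relying on `lex_id` matching the sense number in NLTK's synset names, which is not guaranteed for every key.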

Answer 2

I solved this by downloading the gloss-tagged corpus from http://wordnet.princeton.edu/glosstag.shtml and using the files in WordNet-3.0\glosstag\merged to build my own mapping dict.
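
A mapping dict of this kind could be built roughly as follows. This is only a sketch under stated assumptions: the answer does not show its code, and the XML layout used here (a `synset` element containing `keys/sk` entries and a `gloss` element with `desc="orig"` wrapping an `orig` child) is an assumption about the glosstag files that should be checked against the downloaded data. `build_gloss_map` is a hypothetical helper name, and the sample document is hand-written rather than copied from the corpus.

```python
import io
import xml.etree.ElementTree as ET

def build_gloss_map(xml_source):
    """Map each sense key in one file to its synset's original gloss text."""
    gloss_by_key = {}
    # iterparse yields elements as they close, so large files are not held in memory.
    for _event, elem in ET.iterparse(xml_source):
        if elem.tag != "synset":
            continue
        # Assumed layout: <gloss desc="orig"><orig>...</orig></gloss>
        orig = elem.find("gloss[@desc='orig']/orig")
        gloss = orig.text.strip() if orig is not None and orig.text else ""
        # Assumed layout: <keys><sk>lemma%...</sk></keys>
        for sk in elem.findall("keys/sk"):
            if sk.text:
                gloss_by_key[sk.text.strip()] = gloss
        elem.clear()  # free the processed synset
    return gloss_by_key

# Tiny hand-written sample in the assumed layout (not taken from the real files).
sample = b"""<wordnet>
  <synset id="a0" ofs="0" pos="a">
    <keys><sk>example%3:00:00::</sk></keys>
    <gloss desc="orig"><orig>a made-up gloss</orig></gloss>
  </synset>
</wordnet>"""

gloss_map = build_gloss_map(io.BytesIO(sample))
print(gloss_map["example%3:00:00::"])
```

For the real data, `xml_source` would be the path to one of the files in the merged directory, and the resulting dicts from each file could be combined with `dict.update`.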

How do I get the gloss of a sense key using NLTK WordNet?
