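For various reasons, I need to iterate over all noun synsets in WordNet 3.0 and arrange them into a tree structure in my program.
But when I try this with the code listed below: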
from nltk.corpus import wordnet as wn

stack = []
duplicate_check = []

def iterate_all():
    while(stack):
        current_node = stack.pop()
        print current_node,"on top"
        for hypo in current_node.hyponyms():
            stack.append(hypo)
            duplicate_check.append(hypo)

if __name__ == "__main__":
    root = wn.synset("entity.n.01")
    stack.append(root)
    duplicate_check.append(root)
    iterate_all()
    correct_list = list(wn.all_synsets('n'))
    # print list( set(correct_list) - set(duplicate_check) )
    print len(correct_list)
    print len(duplicate_check)
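I get 96,308 records in duplicate_check and 82,115 records in correct_list. The latter, correct_list, contains the correct number of synsets, but duplicate_check does not.
After converting both lists to set and comparing their elements, I found that the code listed above loses the "instance of" relations among nouns. So could anyone tell me:
(1) In WordNet 3.0, is the "hyponyms" relation the same as "instance of"?
(2) Is there an error in my code that prevents the "instance of" synsets from being added to duplicate_check?
I really appreciate your time.
Environment: Ubuntu 14.04 + Python 2.7 + latest NLTK + WordNet 3.0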
Answer 0 (score: 0)
First, there is no need to iterate top-down from entity.n.01 to collect its hyponyms. You can simply check the root_hypernyms of every synset, bottom-up:
>>> from nltk.corpus import wordnet as wn
>>> len(set(wn.all_synsets('n')))
82115
>>> entity = wn.synset('entity.n.01')
>>> len([i for i in wn.all_synsets('n') if entity in i.root_hypernyms()])
82115
Here is the working code of Synset.root_hypernyms(), from https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L439:
def root_hypernyms(self):
    """Get the topmost hypernyms of this synset in WordNet."""
    result = []
    seen = set()
    todo = [self]
    while todo:
        next_synset = todo.pop()
        if next_synset not in seen:
            seen.add(next_synset)
            next_hypernyms = next_synset.hypernyms() + \
                next_synset.instance_hypernyms()
            if not next_hypernyms:
                result.append(next_synset)
            else:
                todo.extend(next_hypernyms)
    return result
There is another way to access hypernyms/hyponyms, but it does not seem to be as complete as NLTK's own root_hypernyms(); see "How to get all the hyponyms of a word/synset in python nltk and wordnet?":
>>> len(set([s for s in entity.closure(lambda s:s.hyponyms())]))
74373
To iterate over them one by one:
>>> for s in entity.closure(lambda s:s.hyponyms()):
...     print s
Let's try it bottom-up:
>>> from nltk.corpus import wordnet as wn
>>>
>>> synsets_with_entity_root = 0
>>> entity = wn.synset('entity.n.01')
>>>
>>> for i in wn.all_synsets('n'):
...     # Get root hypernym the hard way.
...     x = set([s for s in i.closure(lambda s:s.hypernyms())])
...     if entity in x:
...         synsets_with_entity_root +=1
...
>>> print synsets_with_entity_root
74373
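It seems we lose about 8,000 synsets when traversing the hypernym/hyponym tree this way, whether bottom-up or top-down, so let's check which ones:
from nltk.corpus import wordnet as wn

synsets_with_entity_root = 0
entity = wn.synset('entity.n.01')
for i in wn.all_synsets('n'):
    # Collect every ancestor reachable through hypernyms() alone.
    x = set([s for s in i.closure(lambda s:s.hypernyms())])
    if entity in x:
        synsets_with_entity_root += 1
    else:
        # Not reached that way: show the synset next to its actual root.
        print i, i.root_hypernyms()
You will get a list of the roughly 8,000 missing synsets. Here are the first few you will see: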
Synset('entity.n.01') [Synset('entity.n.01')]
Synset('hegira.n.01') [Synset('entity.n.01')]
Synset('underground_railroad.n.01') [Synset('entity.n.01')]
Synset('babylonian_captivity.n.01') [Synset('entity.n.01')]
Synset('creation.n.05') [Synset('entity.n.01')]
Synset('berlin_airlift.n.01') [Synset('entity.n.01')]
Synset('secession.n.02') [Synset('entity.n.01')]
Synset('human_genome_project.n.01') [Synset('entity.n.01')]
Synset('manhattan_project.n.02') [Synset('entity.n.01')]
Synset('peasant's_revolt.n.01') [Synset('entity.n.01')]
Synset('first_crusade.n.01') [Synset('entity.n.01')]
Synset('second_crusade.n.01') [Synset('entity.n.01')]
Synset('third_crusade.n.01') [Synset('entity.n.01')]
Synset('fourth_crusade.n.01') [Synset('entity.n.01')]
Synset('fifth_crusade.n.01') [Synset('entity.n.01')]
Synset('sixth_crusade.n.01') [Synset('entity.n.01')]
Synset('seventh_crusade.n.01') [Synset('entity.n.01')]
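So closure() over hypernyms() or hyponyms() alone is somewhat lossy. As the root_hypernyms() source above shows, NLTK also follows instance_hypernyms(), and the synsets in this list look like named instances (specific events such as the Crusades) that are reached only through that relation. If exact counts do not matter, closure() is still an elegant approach.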
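To tie this back to the original question, here is a minimal sketch of a top-down traversal that follows both hyponyms() and instance_hyponyms() and uses a set so that no synset is visited twice. The helper name collect_descendants and the Node stand-in class are hypothetical; they exist only so the example runs without WordNet data (on Python 2.7 or 3). NLTK Synset objects provide the same two methods, so the function should accept wn.synset('entity.n.01') as well, but I have not rerun it against WordNet 3.0, so whether it reaches all 82,115 noun synsets is something to verify.

```python
class Node(object):
    """Hypothetical stand-in for an NLTK Synset (illustration only)."""

    def __init__(self, name):
        self.name = name
        self._hyponyms = []
        self._instance_hyponyms = []

    def hyponyms(self):
        return list(self._hyponyms)

    def instance_hyponyms(self):
        return list(self._instance_hyponyms)


def collect_descendants(root):
    """Return root plus everything reachable via either hyponym relation."""
    seen = {root}
    stack = [root]
    while stack:
        current = stack.pop()
        # Follow ordinary hyponyms and instance hyponyms alike.
        for child in current.hyponyms() + current.instance_hyponyms():
            if child not in seen:  # skip synsets reached through another parent
                seen.add(child)
                stack.append(child)
    return seen


# Toy hierarchy (not real WordNet data):
# entity -> event -> crusade, with first_crusade as an instance of crusade.
entity, event, crusade = Node("entity"), Node("event"), Node("crusade")
first_crusade = Node("first_crusade")
entity._hyponyms.append(event)
event._hyponyms.append(crusade)
crusade._instance_hyponyms.append(first_crusade)

reached = collect_descendants(entity)
print(sorted(node.name for node in reached))
```

With NLTK itself, the same idea can probably be expressed as entity.closure(lambda s: s.hyponyms() + s.instance_hyponyms()); comparing the size of that set against len(set(wn.all_synsets('n'))) would confirm whether anything is still missing (closure() does not include the starting synset).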
Answer 1 (score: 0)
This code keeps the error from occurring:
from nltk.corpus import wordnet

word = 'rock'
for synset in wordnet.synsets(word, pos='n'):
    print("---------", synset.name(), "--------", synset.definition())
    # Follow the first hypernym upward until the root is reached.
    while synset.name() != 'entity.n.01':
        hyper = synset.hypernyms()
        if not hyper:
            # No plain hypernym: this sense is an instance of something.
            print("**** INSTANCE_OF sense, without any hypernyms ****")
            break
        synset = hyper[0]
        print(synset.name().split('.')[0])
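The loop above stops as soon as it meets an instance sense. A small variation, sketched here under assumptions, falls back to instance_hypernyms() when hypernyms() is empty so the walk can continue toward the root. The names path_to_root and Node are hypothetical, and the toy hierarchy is made up so the example runs without WordNet data. Like the code above, it follows only the first parent at each step.

```python
class Node(object):
    """Hypothetical stand-in for an NLTK Synset (illustration only)."""

    def __init__(self, name, hypernyms=(), instance_hypernyms=()):
        self._name = name
        self._hypernyms = list(hypernyms)
        self._instance_hypernyms = list(instance_hypernyms)

    def name(self):
        return self._name

    def hypernyms(self):
        return list(self._hypernyms)

    def instance_hypernyms(self):
        return list(self._instance_hypernyms)


def path_to_root(synset):
    """Walk upward, preferring hypernyms() and falling back to instance_hypernyms()."""
    path = [synset]
    seen = {synset}
    while True:
        parents = synset.hypernyms() or synset.instance_hypernyms()
        if not parents:
            return path  # nothing above this node: it is a root
        synset = parents[0]  # follow only the first parent
        if synset in seen:
            return path  # defensive guard against a cycle
        seen.add(synset)
        path.append(synset)


# Toy hierarchy (not real WordNet data).
entity = Node("entity")
event = Node("event", hypernyms=[entity])
crusade = Node("crusade", hypernyms=[event])
first_crusade = Node("first_crusade", instance_hypernyms=[crusade])

names = [node.name() for node in path_to_root(first_crusade)]
print(names)
```

Because NLTK Synset objects expose name(), hypernyms() and instance_hypernyms(), the same function should work on a real synset, though that is untested here.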