ImageNet index to WordNet 3.0 synsets

Time: 2017-08-22 20:27:47

Tags: wordnet imagenet

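Using the ImageNet ResNet-50 model in Caffe, a prediction gives a 1000-dimensional vector. Is there an easy way to translate the indices of this vector to WordNet 3.0 synset identifiers? For instance, that 415: 'bakery, bakeshop, bakehouse' is "n02776631"?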

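I note that a similar question, "Get ImageNet label for a specific index in the 1000-dimensional output tensor in torch", has been asked about the human-readable label associated with an index, and an answer there points to an index-to-label mapping available at this URL: {{3} }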

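From the human-readable label, I suppose the WordNet synset identifier could be found through the label-to-synset mapping on this page: https://gist.github.com/maraoz/388eddec39d60c6d52d4 but I wonder whether this has already been done?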

1 answer:

Answer 0: (score: 0)

Building the mapping from the data at https://gist.github.com/maraoz/388eddec39d60c6d52d4 and http://image-net.org/challenges/LSVRC/2015/browse-synsets seems to be straightforward:

{0: {'id': '01440764-n',
     'label': 'tench, Tinca tinca',
     'uri': 'http://wordnet-rdf.princeton.edu/wn30/01440764-n'},
 1: {'id': '01443537-n',
     'label': 'goldfish, Carassius auratus',
     'uri': 'http://wordnet-rdf.princeton.edu/wn30/01443537-n'},
 2: {'id': '01484850-n',
     'label': 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
     'uri': 'http://wordnet-rdf.princeton.edu/wn30/01484850-n'},
 ...

See https://gist.github.com/fnielsen/4a5c94eaa6dcdf29b7a62d886f540372 for the full file.
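The question asks for identifiers of the form "n02776631", whereas the mapping stores them as '02776631-n'. A minimal sketch for converting between the two spellings follows. The function names are hypothetical, and the assumption that every ID consists of exactly eight digits plus the noun marker is based only on the entries shown above.

```python
import re

# Assumed formats: the mapping stores IDs as '<8 digits>-n', while
# ImageNet WNIDs are written as 'n<8 digits>'.
_MAPPING_ID = re.compile(r'^(\d{8})-n$')
_WNID = re.compile(r'^n(\d{8})$')


def mapping_id_to_wnid(mapping_id):
    """Convert an ID such as '02776631-n' to 'n02776631'."""
    match = _MAPPING_ID.match(mapping_id)
    if match is None:
        raise ValueError('unexpected ID format: {!r}'.format(mapping_id))
    return 'n' + match.group(1)


def wnid_to_mapping_id(wnid):
    """Convert a WNID such as 'n02776631' to '02776631-n'."""
    match = _WNID.match(wnid)
    if match is None:
        raise ValueError('unexpected WNID format: {!r}'.format(wnid))
    return match.group(1) + '-n'


print(mapping_id_to_wnid('01440764-n'))  # n01440764
print(wnid_to_mapping_id('n02776631'))   # 02776631-n
```

Raising on unexpected input, rather than slicing blindly, makes a malformed entry visible instead of silently producing a wrong identifier.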

I have not thoroughly checked whether this mapping is actually correct.
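Some basic consistency checks can be automated. The sketch below is not part of the original answer: `check_mapping` is a hypothetical helper that assumes the mapping has the structure shown above (integer keys, with 'id', 'label' and 'uri' fields per entry). It checks that the indices are contiguous, that each ID is well formed and matches its URI, and that no synset ID appears twice. The last check matters because a lookup keyed by label text would assign the same synset to any two classes whose label strings happen to be identical.

```python
import re

ID_PATTERN = re.compile(r'^\d{8}-n$')


def check_mapping(index_to_synset, expected_size=1000):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    if sorted(index_to_synset) != list(range(expected_size)):
        problems.append(
            'indices are not exactly 0..{}'.format(expected_size - 1))
    seen = {}  # synset ID -> first index that used it
    for index in sorted(index_to_synset):
        entry = index_to_synset[index]
        synset_id = entry['id']
        if not ID_PATTERN.match(synset_id):
            problems.append(
                'index {}: malformed id {!r}'.format(index, synset_id))
        if not entry['uri'].endswith('/' + synset_id):
            problems.append(
                'index {}: uri does not match id'.format(index))
        if synset_id in seen:
            problems.append('indices {} and {} share id {}'.format(
                seen[synset_id], index, synset_id))
        else:
            seen[synset_id] = index
    return problems


# Two entries copied from the excerpt above.
sample = {
    0: {'id': '01440764-n',
        'label': 'tench, Tinca tinca',
        'uri': 'http://wordnet-rdf.princeton.edu/wn30/01440764-n'},
    1: {'id': '01443537-n',
        'label': 'goldfish, Carassius auratus',
        'uri': 'http://wordnet-rdf.princeton.edu/wn30/01443537-n'},
}
print(check_mapping(sample, expected_size=2))  # []
```

These checks only test internal consistency. They cannot confirm that a given index really corresponds to the class the network was trained on.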

The mapping was built with:

import ast
from lxml import html
import requests
from pprint import pprint

url_index = ('https://gist.githubusercontent.com/maraoz/'
             '388eddec39d60c6d52d4/raw/'
             '791d5b370e4e31a4e9058d49005be4888ca98472/gistfile1.txt')
url_synsets = "http://image-net.org/challenges/LSVRC/2014/browse-synsets"

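# Use .text (str) rather than .content (bytes): under Python 3,
# ast.literal_eval does not parse bytes.
index_to_label = ast.literal_eval(requests.get(url_index).text)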
elements = html.fromstring(requests.get(url_synsets).content).xpath('//a')

label_to_synset = {}
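for element in elements:
    # Some <a> elements may lack an href attribute, so default to ''.
    href = element.attrib.get('href', '')
    if href.startswith('http://imagenet.stanford.edu/synset?wnid='):
        # The prefix is 41 characters; slicing at 42 also drops the leading
        # 'n' of the WNID, e.g. 'n01440764' -> '01440764'.
        label_to_synset[element.text] = href[42:]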

index_to_synset = {
    k: {
        'id': label_to_synset[v] + '-n',
        'label': v,
        'uri': "http://wordnet-rdf.princeton.edu/wn30/{}-n".format(
            label_to_synset[v])
    }
    for k, v in index_to_label.items()}


pprint(index_to_synset)
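To connect this back to the question, the following sketch looks up the synsets for the highest-scoring entries of a prediction vector. `top_synsets` is a hypothetical helper, not part of the original answer. It assumes `scores` is an indexable sequence of numbers (a plain list here) and that the mapping has the structure printed above. The three-entry sample stands in for the full 1000-entry mapping, and the scores are made up for illustration.

```python
def top_synsets(scores, index_to_synset, k=5):
    """Return (index, wnid, label, score) for the k highest scores."""
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i], reverse=True)
    results = []
    for index in ranked[:k]:
        entry = index_to_synset[index]
        # '01440764-n' -> 'n01440764' (the form used in the question).
        wnid = 'n' + entry['id'][:-2]
        results.append((index, wnid, entry['label'], scores[index]))
    return results


# Entries copied from the excerpt of the mapping; scores are made up.
sample = {
    0: {'id': '01440764-n', 'label': 'tench, Tinca tinca'},
    1: {'id': '01443537-n', 'label': 'goldfish, Carassius auratus'},
    2: {'id': '01484850-n', 'label': 'great white shark, white shark, '
        'man-eater, man-eating shark, Carcharodon carcharias'},
}
scores = [0.1, 0.7, 0.2]

for index, wnid, label, score in top_synsets(scores, sample, k=2):
    print(index, wnid, label, score)
```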