Stanford CoreNLP - using ner and RegexNER - when they overlap, RegexNER wins over ner

Date: 2018-04-11 09:09:32

Tags: stanford-nlp named-entity-recognition ner

I'm using Stanford CoreNLP for named entity recognition. I've run into a problem when using regexner and ner together.

This is my sentence:

mi chiamo Vincenzo Monaco (translation: "my name is Vincenzo Monaco")

ner correctly labels "vincenzo monaco" as "NAME", but regexner labels "monaco" as a city.

And this is the output when I request http://localhost:9009/?properties={"annotators":"ner,regexner","outputFormat":"json"}:
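A minimal Python sketch of the same request, assuming the server from the question is listening on port 9009 (the CoreNLP server's default port is 9000). The properties travel in the URL as a JSON string, percent-encoded by `urllib.parse.urlencode`, and the text to annotate is sent as the POST body. The helper names `build_corenlp_url` and `annotate` are hypothetical, not part of CoreNLP.

```python
import json
import urllib.parse
import urllib.request


def build_corenlp_url(base_url, annotators, output_format="json"):
    """Return a CoreNLP server URL with the properties JSON percent-encoded."""
    props = {"annotators": annotators, "outputFormat": output_format}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return f"{base_url}/?{query}"


def annotate(text, base_url="http://localhost:9009", annotators="ner,regexner"):
    """POST the raw text to a running server and return the parsed JSON."""
    url = build_corenlp_url(base_url, annotators)
    req = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        # Send the body as plain text so it is not treated as form data.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URLs for both configurations in the question;
    # calling annotate() requires a CoreNLP server to be running.
    for annotators in ("ner,regexner", "ner"):
        print(build_corenlp_url("http://localhost:9009", annotators))
```

With a server running, `annotate("mi chiamo Vincenzo Monaco")` and `annotate("mi chiamo Vincenzo Monaco", annotators="ner")` return the two outputs being compared here.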


If I request http://localhost:9009/?properties={"annotators":"ner","outputFormat":"json"} (without regexner), it answers correctly:

{
"sentences": [
    {
        "index": 0,
        "tokens": [
            {
                "index": 1,
                "word": "mi",
                "originalText": "mi",
                "lemma": "mi",
                "characterOffsetBegin": 0,
                "characterOffsetEnd": 2,
                "pos": "O",
                "ner": "O"
            },
            {
                "index": 2,
                "word": "chiamo",
                "originalText": "chiamo",
                "lemma": "chiamo",
                "characterOffsetBegin": 3,
                "characterOffsetEnd": 9,
                "pos": "O",
                "ner": "O"
            },
            {
                "index": 3,
                "word": "vincenzo",
                "originalText": "vincenzo",
                "lemma": "vincenzo",
                "characterOffsetBegin": 10,
                "characterOffsetEnd": 18,
                "pos": "O",
                "ner": "NAME"
            },
            {
                "index": 4,
                "word": "monaco",
                "originalText": "monaco",
                "lemma": "monaco",
                "characterOffsetBegin": 19,
                "characterOffsetEnd": 25,
                "pos": "O",
                "ner": "CITY"
            }
        ]
    }
]

}
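To compare the two runs it helps to collapse the token list into entity mentions. This sketch assumes the JSON shape shown above (a `sentences` list whose `tokens` carry `word` and `ner`) and merges consecutive tokens that share the same non-`O` tag; the function names are hypothetical. Because it groups by tag alone, it cannot separate two adjacent entities that happen to share a tag.

```python
def token_tags(doc):
    """Return (word, ner) pairs for every token in a CoreNLP JSON document."""
    return [(tok["word"], tok["ner"])
            for sent in doc["sentences"]
            for tok in sent["tokens"]]


def entity_spans(doc):
    """Merge runs of consecutive tokens with the same non-O tag into mentions."""
    spans = []
    for sent in doc["sentences"]:
        words, current = [], "O"
        for tok in sent["tokens"]:
            tag = tok["ner"]
            if tag != "O" and tag == current:
                words.append(tok["word"])  # extend the running mention
                continue
            if current != "O":
                spans.append((" ".join(words), current))  # close previous mention
            words, current = [tok["word"]], tag
        if current != "O":
            spans.append((" ".join(words), current))  # mention ending the sentence
    return spans


if __name__ == "__main__":
    doc = {"sentences": [{"index": 0, "tokens": [
        {"word": "mi", "ner": "O"},
        {"word": "chiamo", "ner": "O"},
        {"word": "vincenzo", "ner": "NAME"},
        {"word": "monaco", "ner": "CITY"},
    ]}]}
    print(entity_spans(doc))  # [('vincenzo', 'NAME'), ('monaco', 'CITY')]
```

If both tokens were tagged `NAME`, the same call would return a single mention, `('vincenzo monaco', 'NAME')`.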

I read that ner should be placed before regexner, but as you can see nothing changes, and reversing the order doesn't help either.

1 Answer:

Answer 0 (score: 0)

monaco\tperson\tcity

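Make your regexner mapping line look like the line above (the \t separators are tab characters), and put the following in your properties file:
ner.fine.regexner.mapping = /path/to/custom/regex/file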
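The mapping file is plain text with tab-separated columns: the token pattern, the NER type to assign, and an optional comma-separated list of existing types that the entry may overwrite. The sketch below writes such lines and models the overwrite rule in simplified form: a token is relabelled only if its current tag is `O` or is listed in the third column. It is not CoreNLP's implementation and handles single-token patterns only. The answer writes the labels in lowercase; the sketch uses uppercase `PERSON`/`CITY` on the assumption that labels are compared case-sensitively against the `CITY` tag in the output above. Passing `ner.fine.regexner.mapping` per request (rather than in a server-side properties file, as the answer does) is also an assumption, and all function names and the path are hypothetical placeholders.

```python
import re


def mapping_line(pattern, ner_type, overwritable=()):
    """Format one mapping entry as tab-separated columns."""
    cols = [pattern, ner_type]
    if overwritable:
        # Third column: comma-separated types this entry is allowed to overwrite.
        cols.append(",".join(overwritable))
    return "\t".join(cols)


def write_mapping(path, entries):
    """Write (pattern, type, overwritable) entries to a file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(mapping_line(*entry) + "\n")


def apply_entry(tokens, pattern, ner_type, overwritable=(), background="O"):
    """Simplified overwrite rule (single-token patterns only): relabel a matching
    token only if its current tag is the background label or is overwritable."""
    allowed = {background, *overwritable}
    result = []
    for word, tag in tokens:
        if re.fullmatch(pattern, word) and tag in allowed:
            tag = ner_type
        result.append((word, tag))
    return result


# Hypothetical request properties: route the custom mapping through the ner
# annotator instead of a separate regexner step. The path is a placeholder.
PROPS = {
    "annotators": "ner",
    "outputFormat": "json",
    "ner.fine.regexner.mapping": "/path/to/custom/regex/file",
}

if __name__ == "__main__":
    tokens = [("mi", "O"), ("chiamo", "O"), ("vincenzo", "NAME"), ("monaco", "CITY")]
    print(mapping_line("monaco", "PERSON", ["CITY"]))
    # CITY is listed as overwritable, so monaco is relabelled PERSON.
    print(apply_entry(tokens, "monaco", "PERSON", ["CITY"]))
    # Without the third column the existing CITY tag is kept.
    print(apply_entry(tokens, "monaco", "PERSON"))
```

The third column is what decides the conflict in this model: without it, an entry can only fill in tokens still tagged `O`, whatever order the annotators run in.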