Question

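I am using the Apache OpenNLP toolkit in Java. I want to display only the named entities in a given text, such as geographical names, persons, etc. The following code snippet gives me string spans: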

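import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

// Load the pre-trained person-name model; try-with-resources closes the stream.
try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
    System.out.println("Input : Pierre Vinken is 61 years old");
    TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
    NameFinderME nameFinder = new NameFinderME(model);
    String[] sentence = new String[] {
        "Pierre", "Vinken", "is", "61", "years", "old", "."
    };

    // find() returns spans whose offsets index into the token array.
    Span[] nameSpans = nameFinder.find(sentence);
    for (Span s : nameSpans) {
        System.out.println("Name Entity : " + s.toString());
    }
} catch (IOException e) {
    e.printStackTrace();
}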

Output:

Input : Pierre Vinken is 61 years old
Name Entity : [0..2) person

How can I get the equivalent string instead of the span? Is there an API for this?

Answer 1

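Span has the method getCoveredText(CharSequence text), which returns the text a span covers, although it reads the offsets as character positions in text, while the spans returned by nameFinder.find(sentence) index tokens. In any case, I don't see why you need an API method to get the text corresponding to a span. A span explicitly provides its start (inclusive) and end (exclusive) integer offsets, so the following is sufficient: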

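// Join the tokens covered by span s; the end offset is exclusive.
StringBuilder builder = new StringBuilder();
for (int i = s.getStart(); i < s.getEnd(); i++) {
    if (builder.length() > 0) {
        builder.append(" ");
    }
    builder.append(sentence[i]);
}
String name = builder.toString();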

Answer 2

You can use the Span class itself.

The following method of the class returns the CharSequence that a Span instance covers in another CharSequence, text:

/**
 * Retrieves the string covered by the current span of the specified text.
 *
 * @param text
 *
 * @return the substring covered by the current span
 */
public CharSequence getCoveredText(CharSequence text) { ... }

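Note that this class also has two static methods that accept an array of Span and either a CharSequence or an array of tokens (String[]), and return the equivalent array of String.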
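To make the token-array variant concrete, here is a minimal sketch of what such a helper does, written against the JDK only so it runs without OpenNLP on the classpath. The class SpanTextSketch and the nested TokenSpan type are hypothetical stand-ins, not part of the library; in OpenNLP the corresponding call is Span.spansToStrings(Span[] spans, String[] tokens).

```java
import java.util.Arrays;

class SpanTextSketch {

    /** Hypothetical stand-in for opennlp.tools.util.Span: token indexes, end exclusive. */
    static final class TokenSpan {
        final int start;
        final int end;
        final String type;

        TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Joins the tokens covered by each span with single spaces, one string per span. */
    static String[] spansToStrings(TokenSpan[] spans, String[] tokens) {
        String[] texts = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            // copyOfRange takes an exclusive upper bound, matching the span's end offset.
            texts[i] = String.join(" ",
                    Arrays.copyOfRange(tokens, spans[i].start, spans[i].end));
        }
        return texts;
    }

    public static void main(String[] args) {
        String[] sentence = {"Pierre", "Vinken", "is", "61", "years", "old", "."};
        TokenSpan[] nameSpans = {new TokenSpan(0, 2, "person")};
        String[] names = spansToStrings(nameSpans, sentence);
        for (int i = 0; i < names.length; i++) {
            // prints: person : Pierre Vinken
            System.out.println(nameSpans[i].type + " : " + names[i]);
        }
    }
}
```

With the question's code, the equivalent would be String[] names = Span.spansToStrings(nameSpans, sentence); using the spans returned by nameFinder.find(sentence).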

I hope it helps...

Given a string span like [0..2), how do I find the equivalent string?
