Question

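My problem is that I want to parse a single document of text data (not a collection of documents) and extract some relevant information based on my query.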

For example, suppose I have the following text:

This is a sample document.
Name: Te
Age: 25
Email: te@gmail.com
Some text in the end of the document

I want to extract the fields (Name, Age, Email) together with their corresponding values.

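Most of the examples I have found are about searching for documents that match a query. I would appreciate it if someone could point me to the Analyzer or Query class I should look at in the Lucene library, or to any other material.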
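Because the goal is to pull values out of one document rather than to find matching documents, a plain regular-expression lookup driven by the queried field name may be enough. The following is a minimal sketch that does not use Lucene at all. It assumes each field sits on its own line in the form `Field: value`; the class `FieldExtractor` and the method `extractField` are hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractor {

    // Hypothetical helper: finds a line of the form "<field>: <value>"
    // (field name matched case-insensitively) and returns the trimmed value.
    // Assumes one field per line; [ \t]* keeps the match from running
    // into the next line when a value is empty.
    static Optional<String> extractField(String text, String field) {
        // (?i) = case-insensitive, (?m) = ^ and $ match at line boundaries.
        // Pattern.quote escapes any regex metacharacters in the field name.
        Pattern p = Pattern.compile(
                "(?im)^[ \\t]*" + Pattern.quote(field) + "[ \\t]*:[ \\t]*(.*?)[ \\t]*$");
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "This is a sample document.",
                "Name: Te",
                "Age: 25",
                "Email: te@gmail.com",
                "Some text in the end of the document");

        // The "query" here is simply the list of field names to look up.
        for (String field : new String[] {"Name", "Age", "Email", "Phone"}) {
            System.out.println(field + " -> " + extractField(text, field).orElse("(not found)"));
        }
    }
}
```

With the sample document this prints `Name -> Te`, `Age -> 25`, `Email -> te@gmail.com` and `Phone -> (not found)`. Unlike a single combined pattern, each field is looked up independently, so the fields may appear in any order.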

Answer 1

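This should get you started. It uses a regular expression in Java and assumes the document content has already been assigned to the variable text: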

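import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes are doubled because this is a Java string literal;
// \s+ between fields tolerates extra spaces and both \n and \r\n line endings.
String expr = "Name:\\s+(\\w+)\\s+Age:\\s+(\\d+)\\s+Email:\\s+([a-z0-9._%+-]+@[a-z0-9.-]+)";
Pattern r = Pattern.compile(expr, Pattern.CASE_INSENSITIVE);
Matcher m = r.matcher(text);
if (m.find()) {
    System.out.println("Name: " + m.group(1));
    System.out.println("Age: " + m.group(2));
    System.out.println("Email: " + m.group(3));
} else {
    System.out.println("Match not found");
}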
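The pattern above only matches when Name, Age and Email appear in exactly that order. If you would rather collect every labelled field at once and then query it, one option is to turn each `Key: Value` line into a map entry. This is a minimal sketch, assuming one pair per line and keys that start with a letter; `KeyValueParser` and `parseFields` are hypothetical names, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {

    // One "Key: Value" pair per line (MULTILINE makes ^ and $ match at line
    // boundaries). The key starts with a letter and may contain letters,
    // digits, '_', spaces or '-'; the value is the rest of the line, trimmed.
    // Note: any prose line with a colon after such a key will also match.
    private static final Pattern LINE = Pattern.compile(
            "^[ \\t]*([A-Za-z][\\w -]*?)[ \\t]*:[ \\t]*(.+?)[ \\t]*$", Pattern.MULTILINE);

    // Collects every "Key: Value" line into a map in document order.
    // Keys are lower-cased so lookups are case-insensitive; if a key repeats,
    // the first occurrence wins.
    static Map<String, String> parseFields(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            fields.putIfAbsent(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "This is a sample document.",
                "Name: Te",
                "Age: 25",
                "Email: te@gmail.com",
                "Some text in the end of the document");

        Map<String, String> fields = parseFields(text);
        System.out.println(fields);              // {name=Te, age=25, email=te@gmail.com}
        System.out.println(fields.get("email")); // te@gmail.com
    }
}
```

Lines without a colon, such as the first and last sentences of the sample, are simply skipped, so the surrounding free text does not interfere with the extraction.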
