>Title

00345

XYZ

MethodName: fdsafk

Date: April 23, 2012

More text, part of which contains instances of XYZ

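I started with a dictionary search for XYZ and found its positions, but I only want the first XYZ, not the others. XYZ has one distinguishing property: it always sits between the 5-digit code and the text MethodName.

I have not been able to get this to work.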

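// Mark every entry of the zip code word list as Zip
WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document{-> MARKFAST(Zip, ZipList)};

// Mark the literal text "MethodName" as Method
DECLARE Method;
"MethodName" -> Method;

// Mark every entry of the type word list as type
WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};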

Also, how can we use regular expressions in UIMA Ruta?

Answer 1

There are many ways to specify this. Here are some examples (untested):

// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};

// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # @type{-> UNMARK(type)};

// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} @Method;
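To make the idea behind the last rule concrete outside of Ruta, here is a minimal sketch in plain Python (not Ruta, and not how Ruta evaluates rules internally). It takes the span between the first 5-digit code and the text MethodName, which is why only the first XYZ is picked up. The sample text, the pattern, and the function name `first_between` are my own assumptions for illustration.

```python
import re

# Sample text modeled on the question's example.
SAMPLE = """>Title
00345
XYZ
MethodName: fdsafk
Date: April 23, 2012
More text, part of which contains instances of XYZ
"""

# A 5-digit code at the start of a line, then the shortest span up to "MethodName".
PATTERN = re.compile(
    r"^\d{5}\s+(?P<target>.+?)\s+MethodName",
    re.MULTILINE | re.DOTALL,
)


def first_between(text):
    """Return the text between the first 5-digit code and 'MethodName', or None."""
    match = PATTERN.search(text)
    return match.group("target") if match else None


print(first_between(SAMPLE))
```

The later XYZ in the last line is never considered, because the search stops at the first place where the code and MethodName enclose something.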

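There are two options for using regular expressions in UIMA Ruta:

(find) simple regular expression rules, such as "[A-Za-z]+" -> Type;
(matches) the REGEXP condition, which validates the match of a rule element, such as
ANY{REGEXP("[A-Za-z]+") -> Type};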
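If the difference between "find" and "matches" is unfamiliar, the following plain-Python sketch shows the two behaviours side by side. It is only an analogy and assumes the labels above mean what they usually mean for regex engines: "find" searches the text for every occurrence, while "matches" checks whether one given piece of text fits the pattern as a whole. The function names are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+")


def find_all(text):
    """'Find' style: collect every substring of text that the pattern matches."""
    return [m.group(0) for m in WORD.finditer(text)]


def matches_whole(candidate):
    """'Matches' style: True only if the whole candidate fits the pattern."""
    return WORD.fullmatch(candidate) is not None


print(find_all("MethodName: fdsafk 00345"))
print(matches_whole("XYZ"))
print(matches_whole("XYZ1"))
```

In this analogy, the simple regex rule plays the role of `find_all` over the document, and the REGEXP condition plays the role of `matches_whole` applied to whatever the rule element (here ANY) has matched.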

Let me know if anything is unclear, and I will extend the description.

Disclaimer: I am a developer of UIMA Ruta

Searching for an item in a text file using UIMA Ruta
