使用Ruta,获取带注释的关键字的下一行中的数据

时间:2019-11-06 09:13:46

标签: uima ruta

如何获取其他行中位于注解关键字以下/上方的数据?我可以注释关键字,但无法获取信息

示例文字:

Underwriter's Name    Appraiser's Name          Appraisal Company Name
Alice Wheaton Cooper  Bruce Banner               Stark Industries

代码

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

EXEC(PlainTextAnnotator, {Line});
ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
REMOVERETAINTYPE(WS);
Document{->FILTERTYPE(SPECIAL)};

DECLARE UnderWriterKeyword, NameKeyword, UnderWriterNameKeyword;
DECLARE UnderWriterName(String label, String value);

CW{REGEXP("\\bUnderwriter") -> UnderWriterKeyword};
CW{REGEXP("Name")->NameKeyword};
(UnderWriterKeyword SW NameKeyword){->UnderWriterNameKeyword};
ADDRETAINTYPE(SPACE);
Line{CONTAINS(UnderWriterNameKeyword)} Line -> {
    (CW SPACE)+ {-> MARK(UnderWriterName)};
    };
REMOVERETAINTYPE(SPACE)

预期输出:

Underwriter's Name: Alice Wheaton Cooper    
Appraiser's Name: Bruce Banner
Appraisal Company Name: Stark Industries

请建议是否可以在RUTA中使用?如果为true,如何获取数据?

2 个答案:

答案 0 :(得分:0)

在Ruta中查看inlined rules

您可以为一行定义条件,并在下一行内注释所需的信息:

Line{CONTAINS(UserName)} Line -> { //your logic here };

答案 1 :(得分:0)

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

DECLARE Header;
DECLARE ColumnDelimiter;
DECLARE Cell(INT column);

DECLARE Keyword (STRING label);
DECLARE Keyword UnderWriterNameKeyword, AppraiserNameLicenseKeyword,
AppraisalCompanyNameKeyword;

"Underwriter's Name" -> UnderWriterNameKeyword ( "label" = "UnderWriter
Name");
"Appraiser's Name/License" -> AppraiserNameLicenseKeyword ( "label" =
"Appraiser Name");
"Appraisal Company Name" -> AppraisalCompanyNameKeyword ( "label" =
"Appraisal Company Name");

DECLARE Entry(Keyword keyword);

EXEC(PlainTextAnnotator, {Line,Paragraph});

ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};

SPACE[3,100]{-PARTOF(ColumnDelimiter) -> ColumnDelimiter};
Line -> {ANY+{-PARTOF(Cell),-PARTOF(ColumnDelimiter) -> Cell};};
REMOVERETAINTYPE(WS);

INT index = 0;
BLOCK(structure) Line{}{
    ASSIGN(index, 0);
    Line{STARTSWITH(Paragraph) -> Header};
    c:Cell{-> c.column = index, index = index + 1};
}

Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
    # c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};

DECLARE Entity (STRING label, STRING value);
DECLARE Entity UnderWriterName, AppraiserNameLicense, AppraisalCompanyName;

FOREACH(entry) Entry{}{
    entry{ -> CREATE(UnderWriterName, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(UnderWriterNameKeyword)};};
    entry{ -> CREATE(AppraiserNameLicense, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(AppraiserNameLicenseKeyword)};};
    entry{ -> CREATE(AppraisalCompanyName, "label" = k.label, "value" =
entry.ct)}<-{k:entry.keyword{PARTOF(AppraisalCompanyNameKeyword)};};
}

最重要的规则如下:

Header<-{hc:Cell{hc.column == c.column}<-{k:Keyword;};}
    # c:@Cell{-PARTOF(Header) -> e:Entry, e.keyword = k};

它包含三个规则元素Header#Cell,并以此方式工作:

  • 该规则开始与Cell规则元素匹配,因为它已被@标记为锚点。
  • 此规则元素在所有Cell批注中或不属于其中的所有Header批注中匹配。它从满足此条件的第一个Cell注释开始,并将其称为“ c”。
  • 下一个规则元素是#,它将匹配直到下一个规则元素能够匹配。
  • 如果内联规则能够在Header注释的范围内匹配,则下一个规则元素在Header注释上匹配。内联规则在此范围内被记住为“ hc”的Cell注释上与特征column具有相同值的匹配。如果匹配包含记为“ k”的Keyword,则匹配成功。
  • 如果这三个规则元素成功匹配,则将应用操作。
  • 第一个动作是在Entry注释的跨度上创建一个称为“ e”的Cell注释。
  • 第二个操作将关键字分配给Entry功能keyword

作为摘要,该规则为不属于标题的每个Entry注释创建一个Cell注释,并分配相应列的header关键字以定义条目的类型