Need to match annotation features - UIMA Ruta

Time: 2017-03-02 07:49:58

Tags: uima ruta

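I need to match on the features of annotations and also mark the second annotation whose feature value matches. I have tried this, but I am facing two problems.

Problem 1:

The SeperateDA annotation values get reduced. I think this is due to dictRemoveWS.

Problem 2:

Only the last match is shown (due to some looping problem).

Sample file 1: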

Arash Alipour
Rahul Bhargava
Lisette I.S. Wintgens
B. Rahul
Alipour A
Ali Aldabahi
M. Naziruddin Khan
Martin J. Swaans
Naziruddin Khan

Expected output for file 1:

Rahul
Alipour
Naziruddin
Khan

Sample file 2:

M. Naziruddin Khan
Arash Alipour
Rahul Bhargava
Lisette I.S. Wintgens
Alipour A
Ali Aldabahi
M. Naziruddin Khan

Expected output for file 2:

Alipour
Naziruddin
Khan    
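
The two expected outputs above are consistent with one reading of the requirement: ignore single-letter initials, and report a word each time it repeats a word already seen earlier in the file. The following is a minimal Python sketch of that reading only, not Ruta code. The function name `repeated_words`, the letter-run tokenization, and the choice to report every later occurrence (rather than only the second) are assumptions made for illustration.

```python
import re


def repeated_words(names):
    """Return words that repeat an earlier word, in document order.

    Assumptions (not from the original post): words are runs of letters,
    single-letter words are treated as initials and skipped, and every
    occurrence after the first is reported.
    """
    seen = set()
    repeats = []
    for name in names:
        for word in re.findall(r"[A-Za-z]+", name):
            if len(word) == 1:  # skip initials such as "B" or "A"
                continue
            if word in seen:
                repeats.append(word)
            else:
                seen.add(word)
    return repeats


file1 = [
    "Arash Alipour", "Rahul Bhargava", "Lisette I.S. Wintgens",
    "B. Rahul", "Alipour A", "Ali Aldabahi", "M. Naziruddin Khan",
    "Martin J. Swaans", "Naziruddin Khan",
]
file2 = [
    "M. Naziruddin Khan", "Arash Alipour", "Rahul Bhargava",
    "Lisette I.S. Wintgens", "Alipour A", "Ali Aldabahi",
    "M. Naziruddin Khan",
]

print(repeated_words(file1))  # ['Rahul', 'Alipour', 'Naziruddin', 'Khan']
print(repeated_words(file2))  # ['Alipour', 'Naziruddin', 'Khan']
```

Both results agree with the expected outputs listed for the two sample files.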

My script:

PACKAGE uima.ruta.example;
DECLARE SINGLEINITIAL;

CW{REGEXP(".")->MARK(SINGLEINITIAL)};                   
DECLARE SeperateDA;
DECLARE DA;
"Arash Alipour"->DA;
"Lisette I.S. Wintgens"->DA;
"Alipour A"->DA;
"Rahul Bhargava"->DA;
"M. Naziruddin Khan"->DA;
"B. Rahul"->DA;
"Ali Aldabahi"->DA;
"A. S. Al Dwayyan"->DA;
"Lucas V.A. Boersma"->DA;
"Jippe C. Bal"->DA;
"Benno J.W.M. Rensing"->DA;
"Martin J. Swaans"->DA;

BLOCK(DocAuth) DA{}
{
CW{-PARTOF(SINGLEINITIAL)-> MARK(SeperateDA)};
}


DECLARE RepeatedDA(STRING auth);  
STRING MatchedAuth;
SeperateDA{->MARK(RepeatedDA),MATCHEDTEXT(MatchedAuth)}->{RepeatedDA{->RepeatedDA.auth=MatchedAuth};};
STRING auth;

FOREACH(RepAuth) RepeatedDA{}
 {  
    (da1:RepeatedDA {->UNMARK(RepeatedDA)}# da2:RepeatedDA){da1.auth != da2.auth};
 }  

I also tried something like this:

   da:RepeatedDA{->da.auth =  RepeatedDA.auth}; 
   FOREACH(RepAuth, true) RepeatedDA{}
          {
              # da:RepeatedDA{->auth =  da.auth, LOG("      auth-" +auth)};
              da:RepeatedDA {auth != da.auth-> UNMARK(da)};
          }

My goal is to remove the additional similar names from DA. For example, in the sample file above, both Rahul Bhargava and B. Rahul are in DA, but I only need Rahul Bhargava in DA.

1 answer:

Answer 0: (score: 1)

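There seems to be a problem with the logic of your rules.

In the rule

da1:RepeatedDA # da2:RepeatedDA

da2 always matches the directly following RepeatedDA / SeperateDA. Because the values of the auth feature differ, the rule applies almost every time.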
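
To make that point concrete, here is a small Python model, not Ruta code, of a rule of the form "first element, anything in between, second element with a condition". It assumes the gap stretches until a later element satisfies the condition, which is a simplification of the behaviour described above and not a statement about Ruta's matching engine. The function `matching_starts` and the sample values are hypothetical.

```python
def matching_starts(auths, condition):
    """Indices i for which some later element satisfies condition(auths[i], later).

    Simplified model (an assumption): the gap between the two rule
    elements may stretch until the condition on the second one holds.
    """
    starts = []
    for i, first in enumerate(auths):
        if any(condition(first, later) for later in auths[i + 1:]):
            starts.append(i)
    return starts


# Hypothetical sequence of auth feature values in document order.
auths = ["Rahul", "Bhargava", "Wintgens", "Rahul", "Alipour"]

different = matching_starts(auths, lambda a, b: a != b)
same = matching_starts(auths, lambda a, b: a == b)

print(different)  # [0, 1, 2, 3]
print(same)       # [0]
```

With the inequality test, every element except the last one finds a partner, so an action attached to the first element fires almost everywhere. This may be what Problem 2 is showing. With the equality test, only the element that really has a later duplicate matches.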

Try this:

DECLARE SINGLEINITIAL;

CW{REGEXP(".")->MARK(SINGLEINITIAL)};                   
DECLARE SeperateDA (STRING auth);
DECLARE DA;
"Arash Alipour"->DA;
"Lisette I.S. Wintgens"->DA;
"Alipour A"->DA;
"Rahul Bhargava"->DA;
"M. Naziruddin Khan"->DA;
"B. Rahul"->DA;
"Ali Aldabahi"->DA;
"A. S. Al Dwayyan"->DA;
"Lucas V.A. Boersma"->DA;
"Jippe C. Bal"->DA;
"Benno J.W.M. Rensing"->DA;
"Martin J. Swaans"->DA;

BLOCK(DocAuth) DA{}
{
CW{-PARTOF(SINGLEINITIAL)-> CREATE(SeperateDA, "auth" = CW.ct)};
}

DECLARE RepeatedDA;
da1:SeperateDA{-> RepeatedDA} # da2:SeperateDA{da1.auth == da2.auth};

Disclaimer: I am a developer of UIMA Ruta.