Marking abbreviations with UIMA Ruta

Time: 2016-05-23 12:54:47

Tags: uima ruta

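I am trying to mark abbreviations in some documents using UIMA Ruta. I used the simple script shown below, but it does not work for some abbreviations.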

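My algorithm is as follows:
1. Split the abbreviation into letters/digits (ATM -> A, T, M; IC3 -> I, C, 3).
2. Convert digits into repeated letters (I, C, 3 -> I, C, C, C).
3. Read the current sentence and match the letters against its words (which may or may not include stop words).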
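
To make the intended matching concrete, the three steps can be sketched in plain Python. This is only an illustration and not Ruta code; the function names (`expand_abbreviation`, `match_from`, `find_long_form`) and the small stop-word set are assumptions made for the example, and the input is assumed to be the text that precedes the bracketed abbreviation.

```python
import re

# Assumed, illustrative stop-word list; a real pipeline would supply its own.
STOP_WORDS = {"of", "and", "in", "the", "for"}


def expand_abbreviation(abbr):
    """Steps 1 and 2: split into characters, then expand a digit n
    into n copies of the preceding letter (IC3 -> I, C, C, C)."""
    letters = []
    for ch in abbr:
        if ch.isalpha():
            letters.append(ch.upper())
        elif ch.isdigit() and letters:
            # The letter is already present once, so add n - 1 more copies.
            letters.extend(letters[-1] * (int(ch) - 1))
        # Any other character (quotes, brackets) is ignored.
    return letters


def match_from(words, start, letters):
    """Step 3 helper: match letters against word initials from `start`.
    Stop words may be skipped unless their initial is the letter needed.
    Returns the exclusive end index of the match, or None."""
    i = start
    for letter in letters:
        while (i < len(words) and words[i].lower() in STOP_WORDS
               and words[i][0].upper() != letter):
            i += 1
        if i >= len(words) or words[i][0].upper() != letter:
            return None
        i += 1
    return i


def find_long_form(text_before, abbr):
    """Step 3: return the closest run of words whose initials spell the
    abbreviation, or None. Punctuation is dropped by the tokenizer."""
    words = re.findall(r"[A-Za-z]+", text_before)
    letters = expand_abbreviation(abbr)
    if not letters:
        return None
    # Try start positions from right to left, nearest to the abbreviation first.
    for start in range(len(words) - 1, -1, -1):
        if words[start][0].upper() != letters[0]:
            continue
        end = match_from(words, start, letters)
        if end is not None:
            return " ".join(words[start:end])
    return None


print(find_long_form("Internet Crime Complaint Center", "IC3"))
print(find_long_form(
    "The National Academies of Science, Engineering, and Medicine", "NAS"))
```

With these three steps exactly as written, plural forms such as RNs and forms such as CKiD, whose letters do not follow the word order, find no match in this sketch and return None; handling them would need extra rules beyond the algorithm described.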

But I don't know how to implement this in Ruta. Where can I find loops and control structures of this kind?

Sample input:

  The National Academies of Science, Engineering, and Medicine (NAS)
  registered nurses (RNs)
  Licensed practical nurses (LPNs)
  Asian/Pacific Islander Americans (APIAs)

  Crime&Investigation Network (CI) 
  Internet Crime Complaint Center (“IC3”)
  Practice Management <PM>

Script:

CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW LParen CAP RParen{-> MARK(DZC_ABBREVIATIONS, 1, 12)};
CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW{-PARTOF(DZC_ABBREVIATIONS)}  LParen CAP RParen{-PARTOF(DZC_ABBREVIATIONS) -> MARK(DZC_ABBREVIATIONS, 1, 12)};
CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (EnglishStopWord?|SPECIAL?)? CW (LParen CAP SW?  RParen){-PARTOF(DZC_ABBREVIATIONS) ->  MARK(DZC_ABBREVIATIONS, 1, 11)};

Abbreviations that were not marked:

Chronic Kidney Disease in Children (CKiD)
Society of Intercultural Education, Training, and Research (SIETAR)
The National Academies of Science, Engineering, and Medicine (NAS)
Internet Crime Complaint Center (“IC3”)

0 answers:

No answers