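I have a document that contains sections such as Assessment, HPI, ROS, and Vitals. I want to extract the notes in each section, and I am using GATE for this. I wrote a JAPE file that extracts the notes in the Assessment section. The grammar is below: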
Input: Token
Options: control=appelt debug=true
Rule: Assess
({Token.string =~"(?i)diagnose[d]?"}{Token.string=="with"} | {Token.string=~"(?i)suffering"}{Token.string=~"(?i)from"} | {Token.string=~"(?i)suffering"}{Token.string=~"(?i)with"})
(
({Token})*
):assessments
({Token.string =~"(?i)HPI"} | {Token.string =~"(?i)ROS"} | {Token.string =~"(?i)EXAM"} | {Token.string =~"(?i)VITAL[S]"} | {Token.string =~"(?i)TREATMENT[s]"} |{Token.string=~"(?i)use[d]?"}{Token.string=~"(?i)orderset[s]?"} | {Token.string=~"$"})
-->
:assessments.Assessments = {}
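When the Assessment section is at the end of the document, the notes are retrieved correctly. But when it sits between two other sections, the rule returns everything from the Assessment section to the end of the file.
I have tried using {Token.string =~ "$"} in different ways, but I cannot extract just the Assessment section regardless of where it appears in the document.
Please explain how to achieve this with a JAPE grammar.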
Answer 0: (score: 1)
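This is the expected behavior, because Appelt mode always prefers the longest overall match. The =~ operator looks for the pattern anywhere in the string, and $ matches at the end of every string, so any Token can match string =~ "$". As a result, the assessments label grabs everything up to the last token in the document.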
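As a rough analogy only, the same effect can be reproduced with a greedy quantifier in java.util.regex. This is not JAPE and Appelt matching is not implemented this way; the class name, the helper `capture`, and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyMatchAnalogy {
    // Returns what group 1 captured for the first match, or null if there is no match.
    static String capture(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample text: an assessment sentence followed by an HPI section.
        String text = "diagnosed with flu. HPI cough for three days";

        // The terminator may be a heading OR end of input, so the greedy body runs to the end.
        System.out.println(capture("diagnosed with(.*)(?:HPI|$)", text));

        // With only the heading as terminator, the body has to stop before "HPI".
        System.out.println(capture("diagnosed with(.*)HPI", text));
    }
}
```

The first call captures the whole remainder of the text, including the HPI section, because the end-of-input alternative always succeeds at the far end. The second call captures only the assessment sentence. Note that a greedy body still runs to the last heading when several are present, which is one reason to restrict what the matcher can see rather than rely on the terminator alone.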
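I would take a two-pass approach: use an initial gazetteer or JAPE phase to annotate the section headings, then a second phase whose Input line contains only those Heading annotations: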
Imports: { import static gate.Utils.*; }
Phase: AnnotateBetweenHeadings
Input: Heading
Options: control = appelt
Rule: TwoHeadings
({Heading.type == "assessments"}):h1
(({Heading})?):h2
-->
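{
  // Default to the end of the document if no heading follows
  Long endOffset = end(doc);
  AnnotationSet h2Annots = bindings.get("h2");
  if(h2Annots != null && !h2Annots.isEmpty()) {
    // Otherwise stop where the next heading starts
    endOffset = start(h2Annots);
  }
  try {
    outputAS.add(end(bindings.get("h1")), endOffset, "Assessments", featureMap());
  } catch(InvalidOffsetException e) {
    // add() declares this checked exception, so wrap it
    throw new GateRuntimeException(e);
  }
}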
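This annotates everything between the end of the assessments heading and the start of the next heading, or up to the end of the document if no heading follows.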
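To make the offset arithmetic concrete, here is a minimal plain-Java sketch that does not use GATE at all. The `Heading` class, the `spanAfter` method, and all offsets are hypothetical stand-ins for the Heading annotations and document length. It only mirrors the "end of this heading to start of the next heading, else end of document" logic.

```java
import java.util.List;

class SectionSpan {
    // Hypothetical stand-in for a GATE Heading annotation: a type plus character offsets.
    static class Heading {
        final String type;
        final long start;
        final long end;

        Heading(String type, long start, long end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Returns {start, end} of the text after the first heading of the given type,
    // or null if there is no such heading. Headings must be sorted by start offset.
    static long[] spanAfter(List<Heading> headings, String type, long docEnd) {
        for (int i = 0; i < headings.size(); i++) {
            Heading h1 = headings.get(i);
            if (!h1.type.equals(type)) {
                continue;
            }
            long end = docEnd;                      // default: run to the end of the document
            if (i + 1 < headings.size()) {
                end = headings.get(i + 1).start;    // stop where the next heading begins
            }
            return new long[] { h1.end, end };
        }
        return null;
    }

    public static void main(String[] args) {
        List<Heading> headings = List.of(
            new Heading("hpi", 0, 3),
            new Heading("assessments", 40, 50),
            new Heading("vitals", 120, 126));
        long[] span = spanAfter(headings, "assessments", 200);
        System.out.println(span[0] + ".." + span[1]); // prints 50..120
    }
}
```

Because only headings are visible to this logic, the text between them never affects where the span ends, which is the same reason the JAPE phase lists only Heading in its Input line.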
Answer 1: (score: 0)
Tyson Hamilton offers this alternative for annotating the end of the document (EOD), since $ does not work in JAPE:
Rule: DOCMARKERS
// we need to match something even though we don't use it directly
(({Token})):doc
-->
:doc{
  FeatureMap features = Factory.newFeatureMap();
  features.put("rule", ruleName());
  try {
    outputAS.add(0L, 0L, "SOD", features);
    outputAS.add(docAnnots.getDocument().getContent().size(), docAnnots.getDocument().getContent().size(), "EOD", features);
  } catch (InvalidOffsetException ioe) {
    throw new GateRuntimeException(ioe);
  }
}
I found that EOD was only recognized by later rules once I gave it some length. So I have this:
Rule: DOCMARKERS
Priority: 2
(
({Sentence}) // we need to match something even though we don't use it directly
):doc
-->
:doc{
  FeatureMap features = Factory.newFeatureMap();
  features.put("rule", "DOCMARKERS");
  try {
    outputAS.add(0L, 0L, "SOD", features);
    long docsize = docAnnots.getDocument().getContent().size();
    // The only way I could get EOD to be recognized in later rules was to
    // give it some length, hence the -2 and -1
    outputAS.add(docsize-2, docsize-1, "EOD", features);
    System.err.println("Debug: added EOD");
  } catch (InvalidOffsetException ioe) {
    throw new GateRuntimeException(ioe);
  }
}
You should then be able to change the end of your rule to:
...| {Token.string=~"$"})