Question

This is in regard to this question:

UIMA RUTA - how to do find & replace using regular expression and groups

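I'm trying to set up sofa mapping as suggested. I have an aggregate AE with a few AEs, and I'm trying to include 2 RUTA AEs/scripts in this pipeline. Both RUTA AEs (and their associated scripts) are responsible for doing a REGEXP find and replace using the Modifier. The second AE depends on the output of the first AE. I have to configure the output view of the second AE's modifier, otherwise I get a 'sofa data already set' exception.

Essentially, I'm unable to wire the output of one AE in as the input of another AE.

My setup looks like the following: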

_initialview --Input> (Normalizer1 RUTA AE) --Output> norm_1_out
norm_1_out --Input> (Normalizer2 RUTA AE) --Output> norm_2_out
norm_2_out --Input> (Other AE)

Here is the aggregate AE code:

<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="NormalizerPrepStep1">
      <import location="../../../ruta-annotators/desc/NormalizeNumbersEngine.xml"/>
    </delegateAnalysisEngine>

    <delegateAnalysisEngine key="NormalizerPrepStep2">
      <import location="../../../ruta-annotators/desc/NormalizeRangesEngine.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="Normalizer">
      <import location="../../../ruta-annotators/desc/NormalizerEngine.xml"/>
    </delegateAnalysisEngine>    
    <delegateAnalysisEngine key="SimpleAnnotator">
      <import location="../../../textanalyzer/desc/analysis_engine/SimpleAnnotator.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>RUTAAggregatePlaintextProcessor</name>
    <description>Runs the complete pipeline for annotating documents in plain text format.</description>
    <version/>
    <vendor/>
    <configurationParameters searchStrategy="language_fallback">
      <configurationParameter>
        <name>SegmentID</name>
        <description/>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
        <overrides>
          <parameter>SimpleAnnotator/SegmentID</parameter>
        </overrides>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings/>
    <flowConstraints>
      <fixedFlow>
        <node>NormalizerPrepStep1</node>
        <node>NormalizerPrepStep2</node>
        <node>Normalizer</node>
        <node>SimpleAnnotator</node>
      </fixedFlow>
    </flowConstraints>
    <typePriorities>
      <name>Ordering</name>
      <description>For subiterator</description>
      <version>1.0</version>
      <priorityList>
      </priorityList>
    </typePriorities>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs/>
        <outputs/>
        <inputSofas>
          <sofaName>norm_1_out</sofaName>
          <sofaName>norm_2_out</sofaName>
          <sofaName>normalized</sofaName>
        </inputSofas>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
  <sofaMappings>
    <sofaMapping>
      <componentKey>SimpleAnnotator</componentKey>
      <aggregateSofaName>normalized</aggregateSofaName>
    </sofaMapping>
    <sofaMapping>
      <componentKey>NormalizerPrepStep2</componentKey>
      <aggregateSofaName>norm_1_out</aggregateSofaName>
    </sofaMapping>
    <sofaMapping>
      <componentKey>Normalizer</componentKey>
      <aggregateSofaName>norm_2_out</aggregateSofaName>
    </sofaMapping>
  </sofaMappings>
</analysisEngineDescription>

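A few things to note:

All three RUTA AEs (step1, step2, normalizer) use the RUTA Modifier.
The setup above throws the exception "No sofaFS with name norm_2_out found." - this happens after step 2.
I tried switching 'norm_2_out' to 'modified' as the input sofa for the normalizer. This seems to move processing on to the next step in the pipeline (the normalizer), but it then throws the exception "Data for Sofa feature setLocalSofaData() has already been set." at org.apache.uima.ruta.engine.RutaModifier.process(RutaModifier.java:107)
I tried RUTA 2.2.0 (snapshot) with the same result.

Since I'm relatively new to both UIMA and RUTA, I'm not sure whether I'm doing something wrong or whether I'm running into a limitation.

BTW, I'm using RUTA 2.1.0

Thanks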

Answer 1

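The first thing I noticed in your example is that you have to specify the output sofas in your AAE. These are the sofas that are created within the AAE, e.g., by one of its components. Second, the sofa mappings are missing. You have to connect the output view of one AE with the input view of another AE. In your example, I only see the default input views.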
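
To illustrate the first point, here is a minimal sketch of how the capabilities section of the aggregate from the question might declare the sofas it creates. The sofa names are taken from the question. Treating all three as outputs is an assumption based on the flow described there (each normalizer step produces one of them), not a tested configuration.

```xml
<capabilities>
  <capability>
    <inputs/>
    <outputs/>
    <!-- Assumption: these views are created by delegates inside the
         aggregate, so they are declared as output sofas, not input sofas -->
    <outputSofas>
      <sofaName>norm_1_out</sofaName>
      <sofaName>norm_2_out</sofaName>
      <sofaName>normalized</sofaName>
    </outputSofas>
    <languagesSupported/>
  </capability>
</capabilities>
```

Declaring the sofas is only half of the fix. Each delegate that writes or reads one of these views still needs a matching sofa mapping.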

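I created a unit test that can serve as an example for this task.

The test is here: https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/java/org/apache/uima/ruta/engine/CascadedModifierTest.java

The resources (descriptors) used in the test are located at: https://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/test/resources/org/apache/uima/ruta/engine

Note that I removed the absolute paths in the ruta descriptors and modified the namespace of the imported scripts. For the test, they are now loaded via the classpath instead of by absolute path.

The test calls the aggregate analysis engine AAE.xml, which imports and maps five analysis engines: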

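CWEngine.xml: a simple Ruta script (CW.ruta) that replaces capitalized words: CW{->REPLACE("CW")};
ModiferCW.xml: a plain modifier
SWEngine.xml: a simple Ruta script (SW.ruta) that replaces lowercase words: SW{->REPLACE("SW")};
ModiferSW.xml: a plain modifier
SimpleEngine.xml: a simple Ruta script (Simple.ruta) that declares a new type and matches "CW" followed by "SW": DECLARE CwSw; ("CW" "SW"){-> CwSw};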

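The aggregate analysis engine defines three views: global1 (input), global2 (output) and global3 (output). The sofa mapping of the components is as follows:

global1 -> [CWEngine, ModiferCW] -> global2 -> [SWEngine, ModiferSW] -> global3 -> [SimpleEngine]

Given the text "Peter is tired." in view global1, the aggregate analysis engine creates two new views, with view global3 containing the text "CW SW SW." and one annotation of type Simple.CwSw.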
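
The mapping chain above could be expressed in the aggregate descriptor roughly as follows. This is a sketch, not a copy of the AAE.xml used in the test. It assumes that the component keys match the descriptor file names and that each modifier writes its result to a component-level sofa named "modified" (the name mentioned in the question). Check both against the actual descriptors.

```xml
<sofaMappings>
  <!-- Stage 1: script and modifier both read the aggregate input view -->
  <sofaMapping>
    <componentKey>CWEngine</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global1</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <componentKey>ModiferCW</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global1</aggregateSofaName>
  </sofaMapping>
  <!-- Assumption: the modifier's output view is named "modified" -->
  <sofaMapping>
    <componentKey>ModiferCW</componentKey>
    <componentSofaName>modified</componentSofaName>
    <aggregateSofaName>global2</aggregateSofaName>
  </sofaMapping>
  <!-- Stage 2: reads global2, writes global3 -->
  <sofaMapping>
    <componentKey>SWEngine</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global2</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <componentKey>ModiferSW</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global2</aggregateSofaName>
  </sofaMapping>
  <sofaMapping>
    <componentKey>ModiferSW</componentKey>
    <componentSofaName>modified</componentSofaName>
    <aggregateSofaName>global3</aggregateSofaName>
  </sofaMapping>
  <!-- Stage 3: the final script sees global3 as its default view -->
  <sofaMapping>
    <componentKey>SimpleEngine</componentKey>
    <componentSofaName>_InitialView</componentSofaName>
    <aggregateSofaName>global3</aggregateSofaName>
  </sofaMapping>
</sofaMappings>
```

The key point is that each modifier gets its own, distinct aggregate-level output sofa. If two modifiers end up writing to the same view, the second one tries to set sofa data that has already been set, which matches the second exception reported in the question.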

UIMA RUTA - Sofa Mapping in Aggregate Pipeline