XSLT中的Cocke-Younger-Kasami(CYK)算法并转换XML以生成新的XML文件

时间:2015-10-26 07:59:38

标签: xml xslt xpath

  

有关CYK算法XSLT的任何想法请查看链接   下面:

     

有两个输入xml如下所示我必须传递sentance.xml   xslt然后根据每个sentance中的单词我必须阅读   运行时从Rule.xml文件中生成值,然后生成新的XML   如下所示。

     

仅使用XSLT,XPath和XML,没有任何其他语言或关键字。

http://en.wikipedia.org/wiki/CYK_algorithm

  

1)sentance.xml

<?xml version="1.0" encoding="UTF-8"?>
<sentances>
  <s>dog bark</s>
  <s>cat drink milk</s>
</sentances>
  

1)sentance.xml

<?xml version="1.0" encoding="UTF-8"?>
<allrules>
<rules>
    <rule cat="s">
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vi"/>
    </rule> 
</rules>
<words>
    <word cat="vi">bark</word>
    <word cat="vt">drink</word>
    <word cat="pn">dog</word>
    <word cat="pn">cat</word>
    <word cat="pn">milk</word>
</words>
</allrules>
  

OutPut XML应如下所示:

<trees>
<tree>
    <sentace>dog bark</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">dog</word>
        </node>
        <node cat="vp">
            <word cat="vi">bark</word>
        </node>
    </node>
</tree>
<tree>
    <sentace>cat drink milk</sentace>
    <node cat="s">
        <node cat="np">
            <word cat="pn">cat</word>
        </node>
        <node cat="vp">
            <word cat="vt">drink</word>
            <node cat="np">
                <word cat="pn">milk</word>
            </node>
        </node>
    </node>
</tree>

  

是否可以实现CYK算法并使用XSLT产生上述算法   有人请帮忙...

1 个答案:

答案 0 :(得分:0)

这是一个解决方案应该接近你所要求的。您没有指定XSLT版本。我在样式表中嵌入了规则和符号,但您可以轻松调整以使它们成为外部文档。

如果您无法使用XSLT 3.0,则可以使用尾端递归替换fold-left()。

此输入文档......

<sentences>
  <sentence>dog bark</sentence>
  <sentence>cat drink milk</sentence>
</sentences>

...送到这个XSLT 3.0样式表...

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<!--
  The n'th analysis contains the CYK analysis for symbol sequences of length n.
  Let their be s symbols in the sentence.
  analysis[1] has s children.
  analysis[s] has one child.
  analysis[n] has s - n + 1 children
  The children of analysis are node and only node.
  node element represents a node in CYK analysis. This can either be a word or a string of symbols.
  The index of the node within its parent analysis corresponds to the start symbol.
    This index is equal to the index of the word within $words, of the starting word.
  node has any number of children, but the only type this can be is permutation.
  permutation represents a possible value for the node content, the competing alternatives
   being all the sibling permutations. Thus if a node has no permutations, there is no
   possiblity of a sequence of the given length being a correct grammar at that position
   in the sentence.
  Each permuation either has as children: 1 word; or 2 nodes.
  The permuations in the first row (analysis[1] are all of the word type.
  Subsequent rows have permutations of any type.
  words and permutations all have an attribute cat, which is the symbol.  
-->

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:variable name="word-count" select="count( $first-row/node)" as="xs:integer" />
    <xsl:sequence select="fold-left( 2 to $word-count, $first-row, function($a, $b) { $a, so:next-analysis(a) })
      [last()]" />
  </tree>
</xsl:template>

</xsl:stylesheet>

...将产生此输出......

<trees>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">dog</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <word cat="vp">bark</word>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>
   <tree>
      <analysis>
         <node>
            <permutation cat="s">
               <node>
                  <permutation cat="np">
                     <word cat="np">cat</word>
                  </permutation>
               </node>
               <node>
                  <permutation cat="vp">
                     <node>
                        <permutation cat="vt">
                           <word cat="vt">drink</word>
                        </permutation>
                     </node>
                     <node>
                        <permutation cat="np">
                           <word cat="np">milk</word>
                        </permutation>
                     </node>
                  </permutation>
               </node>
            </permutation>
         </node>
      </analysis>
   </tree>   
</trees>

警告Emptor

我没有测试过这个。

替代

如果您对所有排列不感兴趣,并且只想要任何(第一个)排列,那么我们可以添加一些模板并删除所有排列,但只有一个。

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:fn="http://www.w3.org/2005/xpath-functions"
  xmlns:so="http://stackoverflow.com/questions/33340967"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  version="3.0"
  exclude-result-prefixes="xsl xs fn so">

<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />

<xsl:variable name="rules" as="element(rule)*">
    <rule cat="s">
        <!-- All rules have precisely 2 children. -->
        <rulechild cat="np"/>
        <rulechild cat="vp"/>
    </rule>
    <rule cat="vp">
        <rulechild cat="vt"/>
        <rulechild cat="np"/>
    </rule>
</xsl:variable>

<xsl:variable name="words" as="element(word)+">
    <word cat="vi">bark</word>
    <word cat="vp">bark</word>
    <word cat="vt">drink</word>
    <word cat="np">dog</word>
    <word cat="np">cat</word>
    <word cat="np">milk</word>
</xsl:variable>

<xsl:function name="so:analysis-1" as="element(analysis)"> 
  <!-- Do the first row of CYK. -->
  <xsl:param name="sentence" as="xs:string" />
  <analysis>
    <xsl:analyze-string select="$sentence" regex="\w+">
      <xsl:matching-substring>
        <xsl:variable name="word" select="." />
        <node>
          <xsl:for-each select="$words[. eq $word]">
            <permutation cat="{@cat}"> 
              <word cat="{@cat}"><xsl:value-of select="$word" /></word>
            </permutation>
          </xsl:for-each>
        </node>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </analysis>
</xsl:function>

<xsl:function name="so:next-analysis" as="element(analysis)"> 
  <!-- Given the first n rows of CYK, compute the n+1'th row. -->
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
  <xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
  <xsl:variable name="seq-len"    select="$word-count - $node-count + 1" as="xs:integer" />
  <analysis>
    <xsl:for-each select="1 to $node-count">
      <xsl:variable name="index" select="." as="xs:integer" />
      <node>
        <xsl:for-each select="$rules">
          <xsl:variable name="rule" as="element(rule)" select="." />
          <xsl:for-each select="
            for $sub-a in 1 to $seq-len - 1 return $sub-a
                [$rows[$sub-a           ]/node[$index         ][permutation/@cat = $rule/rulechild[1]/@cat]]
                [$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">            
            <xsl:variable name="sub-a"    select="." as="xs:integer" />
            <permutation cat="{$rule/@cat}">
              <node>
                <xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
              </node>
              <node>
                <xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
              </node>
            </permutation>
          </xsl:for-each>     
        </xsl:for-each>   
      </node>
    </xsl:for-each>
  </analysis>
</xsl:function>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>

<xsl:template match="sentences">
  <trees>
    <xsl:apply-templates />
  </trees>
</xsl:template>

<xsl:template match="sentence">
  <tree>
    <xsl:variable name="first-row"  select="so:analysis-1(.)" />
    <xsl:apply-templates select="
       fold-left(
          2 to count( $first-row/node),
          $first-row,
          function($a, $b) { $a, so:next-analysis(a) })
       [last()]" />
  </tree>
</xsl:template>

<xsl:template match="analysis">
  <xsl:apply-templates />
</xsl:template>

<xsl:template match="node[not( fn:empty(*))]">
    <node cat="{permutation[1]/@cat}">
        <xsl:apply-templates select="permutation[1]/*"/>
    </node>
</xsl:template>

<xsl:template match="node[fn:empty(*)]">
    <node xsi:nil="true" />
</xsl:template>

</xsl:stylesheet>

替代输出

输出应该更整洁,看起来像这样......

<trees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">dog</word>
         </node>
         <node cat="vp">
            <word cat="vp">bark</word>
         </node>
      </node>
   </tree>
   <tree>
      <node cat="s">
         <node cat="np">
            <word cat="np">cat</word>
         </node>
         <node cat="vp">
            <node cat="vt">
               <word cat="vt">drink</word>
            </node>
            <node cat="np">
               <word cat="np">milk</word>
            </node>
         </node>
      </node>
   </tree>   
</trees>