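Any ideas on implementing the CYK algorithm in XSLT? See the link below.
There are two input XML files, shown below. I have to pass sentance.xml to the XSLT. Then, for the words in each sentence, the stylesheet must read the matching values from Rule.xml at run time and generate a new XML document like the one shown further down.
Only XSLT, XPath and XML may be used, with nothing from any other language.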
http://en.wikipedia.org/wiki/CYK_algorithm
1) sentance.xml
<?xml version="1.0" encoding="UTF-8"?>
<sentances>
<s>dog bark</s>
<s>cat drink milk</s>
</sentances>
2) Rule.xml
<?xml version="1.0" encoding="UTF-8"?>
<allrules>
<rules>
<rule cat="s">
<rulechild cat="np"/>
<rulechild cat="vp"/>
</rule>
<rule cat="vp">
<rulechild cat="vt"/>
<rulechild cat="np"/>
</rule>
<rule cat="vp">
<rulechild cat="vi"/>
</rule>
</rules>
<words>
<word cat="vi">bark</word>
<word cat="vt">drink</word>
<word cat="pn">dog</word>
<word cat="pn">cat</word>
<word cat="pn">milk</word>
</words>
</allrules>
The output XML should look like this:
<trees>
<tree>
<sentace>dog bark</sentace>
<node cat="s">
<node cat="np">
<word cat="pn">dog</word>
</node>
<node cat="vp">
<word cat="vi">bark</word>
</node>
</node>
</tree>
<tree>
<sentace>cat drink milk</sentace>
<node cat="s">
<node cat="np">
<word cat="pn">cat</word>
</node>
<node cat="vp">
<word cat="vt">drink</word>
<node cat="np">
<word cat="pn">milk</word>
</node>
</node>
</node>
</tree>
</trees>
Is it possible to implement the CYK algorithm in XSLT and produce the output above? Any help would be appreciated.
Answer 0 (score: 0)
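Here is a solution that should come close to what you asked for. You did not specify the XSLT version. I have embedded the rules and symbols in the stylesheet, but you can easily adapt it to read them from an external document.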
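One way to move the grammar out of the stylesheet is sketched below. It is untested and rests on assumptions: Rule.xml is taken to sit next to the stylesheet, and `$rule-doc` and `$unary-rules` are illustrative names that do not appear in the stylesheet that follows. Because the CYK step only handles rules with exactly two children, single-child rules are folded into the word list instead, which is what the embedded version does by hand with the extra `vp` entry for "bark".

```xml
<!-- Sketch: these declarations would replace the embedded $rules and $words. -->
<!-- Assumption: Rule.xml is in the same directory as the stylesheet. -->
<xsl:variable name="rule-doc" select="doc('Rule.xml')" as="document-node()" />

<!-- Binary rules feed the CYK step, which expects exactly two rulechild elements. -->
<xsl:variable name="rules" as="element(rule)*"
  select="$rule-doc/allrules/rules/rule[count(rulechild) eq 2]" />

<!-- Single-child rules (for example a vp made of one vi) are handled via the word list. -->
<xsl:variable name="unary-rules" as="element(rule)*"
  select="$rule-doc/allrules/rules/rule[count(rulechild) eq 1]" />

<!-- Each word keeps its own category and gains one extra entry per matching
     single-child rule. Chains of single-child rules are not followed. -->
<xsl:variable name="words" as="element(word)*">
  <xsl:for-each select="$rule-doc/allrules/words/word">
    <xsl:variable name="w" select="." as="element(word)" />
    <xsl:copy-of select="$w" />
    <xsl:for-each select="$unary-rules[rulechild/@cat = $w/@cat]">
      <word cat="{@cat}"><xsl:value-of select="$w" /></word>
    </xsl:for-each>
  </xsl:for-each>
</xsl:variable>
```

Note that the Rule.xml in the question tags nouns as `pn` and has no rule that produces `np`, so it would also need an `np` rule with a single `pn` child (or `np`-tagged words) before any sentence could parse.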
If you cannot use XSLT 3.0, you can replace fold-left() with tail recursion.
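A minimal sketch of that replacement, assuming the `so:analysis-1` and `so:next-analysis` functions from the stylesheet below are kept unchanged. The function name `so:all-rows` is made up for this illustration. It keeps appending rows until the last row has a single node, which is the same stopping point the fold reaches.

```xml
<!-- Hypothetical helper: builds all CYK rows by recursion instead of fold-left(). -->
<xsl:function name="so:all-rows" as="element(analysis)+">
  <xsl:param name="rows" as="element(analysis)+" />
  <xsl:sequence select="
    if (count($rows[last()]/node) le 1)
    then $rows
    else so:all-rows(($rows, so:next-analysis($rows)))" />
</xsl:function>

<!-- Usage inside the sentence template, in place of the fold-left() expression:
     <xsl:sequence select="so:all-rows(so:analysis-1(.))[last()]" />
-->
```

Each call adds exactly one row and the row width shrinks by one every time, so the recursion depth equals the number of words in the sentence.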
This input document...
<sentences>
<sentence>dog bark</sentence>
<sentence>cat drink milk</sentence>
</sentences>
...fed to this XSLT 3.0 stylesheet...
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:so="http://stackoverflow.com/questions/33340967"
version="3.0"
exclude-result-prefixes="xsl xs fn so">
<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />
<xsl:variable name="rules" as="element(rule)*">
<rule cat="s">
<!-- All rules have precisely 2 children. -->
<rulechild cat="np"/>
<rulechild cat="vp"/>
</rule>
<rule cat="vp">
<rulechild cat="vt"/>
<rulechild cat="np"/>
</rule>
</xsl:variable>
<xsl:variable name="words" as="element(word)+">
<word cat="vi">bark</word>
<word cat="vp">bark</word>
<word cat="vt">drink</word>
<word cat="np">dog</word>
<word cat="np">cat</word>
<word cat="np">milk</word>
</xsl:variable>
<!--
The n'th analysis contains the CYK analysis for symbol sequences of length n.
Let there be s symbols in the sentence.
analysis[1] has s children.
analysis[s] has one child.
analysis[n] has s - n + 1 children.
The children of analysis are node elements and nothing else.
A node element represents a node in the CYK analysis. This can either be a word or a string of symbols.
The index of the node within its parent analysis corresponds to the start symbol.
This index is equal to the position of the starting word within the sentence.
A node has any number of children, but the only type this can be is permutation.
A permutation represents a possible value for the node content, the competing alternatives
being all the sibling permutations. Thus if a node has no permutations, there is no
possibility of a sequence of the given length being a correct grammar at that position
in the sentence.
Each permutation has as children either 1 word or 2 nodes.
The permutations in the first row (analysis[1]) are all of the word type.
Subsequent rows have permutations of any type.
words and permutations all have an attribute cat, which is the symbol.
-->
<xsl:function name="so:analysis-1" as="element(analysis)">
<!-- Do the first row of CYK. -->
<xsl:param name="sentence" as="xs:string" />
<analysis>
<xsl:analyze-string select="$sentence" regex="\w+">
<xsl:matching-substring>
<xsl:variable name="word" select="." />
<node>
<xsl:for-each select="$words[. eq $word]">
<permutation cat="{@cat}">
<word cat="{@cat}"><xsl:value-of select="$word" /></word>
</permutation>
</xsl:for-each>
</node>
</xsl:matching-substring>
</xsl:analyze-string>
</analysis>
</xsl:function>
<xsl:function name="so:next-analysis" as="element(analysis)">
<!-- Given the first n rows of CYK, compute the n+1'th row. -->
<xsl:param name="rows" as="element(analysis)+" />
<xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
<xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
<xsl:variable name="seq-len" select="$word-count - $node-count + 1" as="xs:integer" />
<analysis>
<xsl:for-each select="1 to $node-count">
<xsl:variable name="index" select="." as="xs:integer" />
<node>
<xsl:for-each select="$rules">
<xsl:variable name="rule" as="element(rule)" select="." />
<xsl:for-each select="
for $sub-a in 1 to $seq-len - 1 return $sub-a
[$rows[$sub-a ]/node[$index ][permutation/@cat = $rule/rulechild[1]/@cat]]
[$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">
<xsl:variable name="sub-a" select="." as="xs:integer" />
<permutation cat="{$rule/@cat}">
<node>
<xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
</node>
<node>
<xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
</node>
</permutation>
</xsl:for-each>
</xsl:for-each>
</node>
</xsl:for-each>
</analysis>
</xsl:function>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="sentences">
<trees>
<xsl:apply-templates />
</trees>
</xsl:template>
<xsl:template match="sentence">
<tree>
<xsl:variable name="first-row" select="so:analysis-1(.)" />
<xsl:variable name="word-count" select="count( $first-row/node)" as="xs:integer" />
<xsl:sequence select="fold-left( 2 to $word-count, $first-row, function($a, $b) { $a, so:next-analysis($a) })
[last()]" />
</tree>
</xsl:template>
</xsl:stylesheet>
...will produce this output...
<trees>
<tree>
<analysis>
<node>
<permutation cat="s">
<node>
<permutation cat="np">
<word cat="np">dog</word>
</permutation>
</node>
<node>
<permutation cat="vp">
<word cat="vp">bark</word>
</permutation>
</node>
</permutation>
</node>
</analysis>
</tree>
<tree>
<analysis>
<node>
<permutation cat="s">
<node>
<permutation cat="np">
<word cat="np">cat</word>
</permutation>
</node>
<node>
<permutation cat="vp">
<node>
<permutation cat="vt">
<word cat="vt">drink</word>
</permutation>
</node>
<node>
<permutation cat="np">
<word cat="np">milk</word>
</permutation>
</node>
</permutation>
</node>
</permutation>
</node>
</analysis>
</tree>
</trees>
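I have not tested this.
If you are not interested in all the permutations and only want any one of them (the first), we can add a few templates that drop all permutations but one.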
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:so="http://stackoverflow.com/questions/33340967"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
version="3.0"
exclude-result-prefixes="xsl xs fn so">
<xsl:output encoding="utf-8" omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />
<xsl:variable name="rules" as="element(rule)*">
<rule cat="s">
<!-- All rules have precisely 2 children. -->
<rulechild cat="np"/>
<rulechild cat="vp"/>
</rule>
<rule cat="vp">
<rulechild cat="vt"/>
<rulechild cat="np"/>
</rule>
</xsl:variable>
<xsl:variable name="words" as="element(word)+">
<word cat="vi">bark</word>
<word cat="vp">bark</word>
<word cat="vt">drink</word>
<word cat="np">dog</word>
<word cat="np">cat</word>
<word cat="np">milk</word>
</xsl:variable>
<xsl:function name="so:analysis-1" as="element(analysis)">
<!-- Do the first row of CYK. -->
<xsl:param name="sentence" as="xs:string" />
<analysis>
<xsl:analyze-string select="$sentence" regex="\w+">
<xsl:matching-substring>
<xsl:variable name="word" select="." />
<node>
<xsl:for-each select="$words[. eq $word]">
<permutation cat="{@cat}">
<word cat="{@cat}"><xsl:value-of select="$word" /></word>
</permutation>
</xsl:for-each>
</node>
</xsl:matching-substring>
</xsl:analyze-string>
</analysis>
</xsl:function>
<xsl:function name="so:next-analysis" as="element(analysis)">
<!-- Given the first n rows of CYK, compute the n+1'th row. -->
<xsl:param name="rows" as="element(analysis)+" />
<xsl:variable name="word-count" select="count( $rows[1]/node)" as="xs:integer" />
<xsl:variable name="node-count" select="count( $rows[last()]/node) - 1" as="xs:integer" />
<xsl:variable name="seq-len" select="$word-count - $node-count + 1" as="xs:integer" />
<analysis>
<xsl:for-each select="1 to $node-count">
<xsl:variable name="index" select="." as="xs:integer" />
<node>
<xsl:for-each select="$rules">
<xsl:variable name="rule" as="element(rule)" select="." />
<xsl:for-each select="
for $sub-a in 1 to $seq-len - 1 return $sub-a
[$rows[$sub-a ]/node[$index ][permutation/@cat = $rule/rulechild[1]/@cat]]
[$rows[$seq-len - $sub-a]/node[$index + $sub-a][permutation/@cat = $rule/rulechild[2]/@cat]]">
<xsl:variable name="sub-a" select="." as="xs:integer" />
<permutation cat="{$rule/@cat}">
<node>
<xsl:copy-of select="$rows[$sub-a]/node[$index]/permutation[@cat eq $rule/rulechild[1]/@cat]" />
</node>
<node>
<xsl:copy-of select="$rows[$seq-len - $sub-a]/node[$index + $sub-a]/permutation[@cat eq $rule/rulechild[2]/@cat]" />
</node>
</permutation>
</xsl:for-each>
</xsl:for-each>
</node>
</xsl:for-each>
</analysis>
</xsl:function>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="sentences">
<trees>
<xsl:apply-templates />
</trees>
</xsl:template>
<xsl:template match="sentence">
<tree>
<xsl:variable name="first-row" select="so:analysis-1(.)" />
<xsl:apply-templates select="
fold-left(
2 to count( $first-row/node),
$first-row,
function($a, $b) { $a, so:next-analysis($a) })
[last()]" />
</tree>
</xsl:template>
<xsl:template match="analysis">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="node[not( fn:empty(*))]">
<node cat="{permutation[1]/@cat}">
<xsl:apply-templates select="permutation[1]/*"/>
</node>
</xsl:template>
<xsl:template match="node[fn:empty(*)]">
<node xsi:nil="true" />
</xsl:template>
</xsl:stylesheet>
The output should be tidier and look like this...
<trees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<tree>
<node cat="s">
<node cat="np">
<word cat="np">dog</word>
</node>
<node cat="vp">
<word cat="vp">bark</word>
</node>
</node>
</tree>
<tree>
<node cat="s">
<node cat="np">
<word cat="np">cat</word>
</node>
<node cat="vp">
<node cat="vt">
<word cat="vt">drink</word>
</node>
<node cat="np">
<word cat="np">milk</word>
</node>
</node>
</node>
</tree>
</trees>
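The target output in the question also carries the sentence text inside each tree, which the output above omits. A possible variant of the `sentence` template that emits it is sketched here. It is untested, and the element name `sentace` is simply copied from the question's sample output as written.

```xml
<!-- Sketch: same as the sentence template above, plus the sentence text. -->
<xsl:template match="sentence">
  <tree>
    <!-- Element name spelled as in the question's expected output. -->
    <sentace><xsl:value-of select="." /></sentace>
    <xsl:variable name="first-row" select="so:analysis-1(.)" />
    <xsl:apply-templates select="
      fold-left(
        2 to count($first-row/node),
        $first-row,
        function($a, $b) { $a, so:next-analysis($a) })
      [last()]" />
  </tree>
</xsl:template>
```

The remaining difference from the question's sample is that a word sitting directly under a two-child rule (such as "drink" under `vp`) is still wrapped in its own `node` element.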