Question

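I have an XML file, sentence.xml, in the following format:

<doc>
   <sentence id="sent_1" span="word_1..word_8"/>
   <sentence id="sent_2" span="word_9..word_15"/>
   <sentence id="sent_3" span="word_16..word_22"/>
   <sentence id="sent_4" span="word_23..word_30"/>
</doc>

This file shows that, for example, my first sentence (@id="sent_1") runs from word_1 to word_8. The second sentence (@id="sent_2") runs from word_9 to word_15, and so on.

My second XML file, verb.xml, has the following format:

<verb id="v1" span="word_3"/>
<verb id="v2" span="word_7"/>
<verb id="v3" span="word_14"/>
<verb id="v4" span="word_27"/>

This means that the first verb (@id="v1") is word_3, the second verb (@id="v2") is word_7, and so on.

If we compare the two XML files, we see that, for example, the first verb (v1) in verb.xml is word_3, which belongs to the first sentence (sent_1); the third verb (v3) is word_14, which belongs to the second sentence (sent_2); and so on.

What I want is to compare the values of the span attributes in the two files and find out which sentence each verb belongs to. For example, word_3 lies somewhere within the span word_1..word_8 (our first sentence). The output should indicate, for each verb, the sentence it belongs to.

I hope my explanation is clear. Thanks.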

Answer 1

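You need to extract the numbers from range descriptions such as word_1..word_8 and then look them up using the number extracted from a span value such as word_3. In XSLT 3 you can do this easily with a key defined on the sentence elements: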

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

  <xsl:param name="sentence-doc">
    <doc>
       <sentence id="sent_1" span="word_1..word_8"/>
       <sentence id="sent_2" span="word_9..word_15"/>
       <sentence id="sent_3" span="word_16..word_22"/>
       <sentence id="sent_4" span="word_23..word_30"/>
    </doc>
  </xsl:param>

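  <!-- Index each sentence under every word number in its span, e.g. sent_1 under 1, 2, ..., 8 -->
  <xsl:key name="ref" match="sentence"
    use="let $numbers := analyze-string(@span, 'word_([0-9]+)\.\.word_([0-9]+)')//*:group/xs:integer(.)
         return $numbers[1] to $numbers[2]"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Reduce the verb's span to its number and look that number up in the sentence document -->
  <xsl:template match="verb">
      <verb id="{@id}" span="{@span}" ref="{key('ref', @span => replace('[^0-9]+', '')=>xs:integer(), $sentence-doc)/@id}"/>
  </xsl:template>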

</xsl:stylesheet>

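For an online demo, see https://xsltfiddle.liberty-development.net/3NzcBt2. Of course, when you have two separate input documents, you can use <xsl:param name="sentence-doc" select="doc('sentence.xml')"/> instead of including the data inline, which is what I did in the online example.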
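
If it helps to check the lookup logic outside XSLT, here is a minimal Python sketch of the same idea: expand each sentence span into its word numbers, then look up the number taken from each verb's span. It is not part of the stylesheet above. The function names build_index and annotate_verbs are made up for illustration, the <doc> wrapper around the verb elements is an assumption (the question only shows the verb elements themselves), and the ref attribute name simply mirrors the template above.

```python
import re
import xml.etree.ElementTree as ET

SENTENCE_XML = """<doc>
   <sentence id="sent_1" span="word_1..word_8"/>
   <sentence id="sent_2" span="word_9..word_15"/>
   <sentence id="sent_3" span="word_16..word_22"/>
   <sentence id="sent_4" span="word_23..word_30"/>
</doc>"""

# Assumption: verb.xml wraps its verb elements in a <doc> root element.
VERB_XML = """<doc>
   <verb id="v1" span="word_3"/>
   <verb id="v2" span="word_7"/>
   <verb id="v3" span="word_14"/>
   <verb id="v4" span="word_27"/>
</doc>"""

SPAN_RE = re.compile(r"word_(\d+)\.\.word_(\d+)")


def build_index(sentence_root):
    """Map every word number covered by a sentence span to that sentence's id."""
    index = {}
    for sentence in sentence_root.iter("sentence"):
        match = SPAN_RE.fullmatch(sentence.get("span", ""))
        if match is None:
            continue  # skip spans that do not follow word_N..word_M
        start, end = int(match.group(1)), int(match.group(2))
        for number in range(start, end + 1):
            index[number] = sentence.get("id")
    return index


def annotate_verbs(verb_root, index):
    """Add a ref attribute naming the sentence that contains each verb."""
    for verb in verb_root.iter("verb"):
        # Same idea as replace('[^0-9]+', '') in the stylesheet: keep only digits.
        number = int(re.sub(r"\D+", "", verb.get("span")))
        verb.set("ref", index.get(number, ""))
    return verb_root


index = build_index(ET.fromstring(SENTENCE_XML))
result = annotate_verbs(ET.fromstring(VERB_XML), index)
refs = {verb.get("id"): verb.get("ref") for verb in result.iter("verb")}
print(ET.tostring(result, encoding="unicode"))
```

With the sample data, v1 and v2 resolve to sent_1, v3 to sent_2, and v4 to sent_4. To read the two files from disk, ET.parse('sentence.xml').getroot() and ET.parse('verb.xml').getroot() could replace the inline strings.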
