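I asked this question recently but realized I had not explained it very clearly. I have a large .csv file of invoices (8000+ lines), with multiple lines per invoice. I am parsing it into an XML structure like the following (simplified).
Input 1 - $XMLInput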
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
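</root>
Input 2 - $maxBatchSize. Description: a constant; once a batch would grow beyond this size, break to the next batch.
Input 3 - $listOfInvoices. Description: a repeating variable holding the unique invoice numbers in the document. For example: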
<root>
<row>
<invoiceNumber>1</invoiceNumber>
</row>
<row>
<invoiceNumber>2</invoiceNumber>
</row>
<row>
<invoiceNumber>3</invoiceNumber>
</row>
</root>
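To improve performance, I need to group these elements by invoiceNumber into batches of no more than X nodes each (X is a variable passed in). From there I will send each batch to a subprocess in parallel rather than processing the whole original document at once. For example, for the sample XML document above, if the batch size cannot be greater than 3, I need the following XML output:
Output 1 - $XMLOutput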
<root>
<batch>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
</batch>
<batch>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</batch>
</root>
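All rows of an invoice must be sent in the same batch. My initial XSLT (2.0) attempt is below. I tried to simulate a while loop by recursively calling a template that keeps appending invoice groups to the current node. When the maximum batch size is reached, I recursively call the batch template to create a new batch. I pass the invoice and batch counters between recursive calls.
Edit: Thanks to Ken's help, I am getting closer. What I really need is to break the batches by number of rows, not by the number of distinct invoices. Even if the following works in theory, I am not sure how to make sure an invoice number does not already exist in a preceding sibling node.
<?xml version="1.0" encoding="UTF-8"?>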
<xsl:stylesheet version="2.0" xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:variable name="batch-size" select="40" as="xs:integer"/>
<xsl:variable name="input" select="bpws:getVariableData('sortedInvoicesByBU')"/>
<xsl:key name="invoice-lines-by-invoice-number" match="row" use="invoiceNumber4z"/>
<xsl:template match="/">
<xsl:element name="batches">
<!--establish batches from possible non-contiguous invoice numbers-->
<xsl:for-each-group select="$input/*:UPSData/*:row" group-by="(position() - 1) idiv $batch-size">
<xsl:for-each select="distinct-values($input/*:UPSData/*:row/*:invoiceNumber4z)[not(.=preceding-sibling::item)]">
<xsl:element name="UPSData">
<xsl:for-each select="current()">
<xsl:for-each select="key('invoice-lines-by-invoice-number',.,$input)">
<!--copy rows as they are-->
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:for-each>
</xsl:element>
</xsl:for-each>
</xsl:for-each-group>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
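Answer 0 (score: 4)
I tell my students that one can torture a stylesheet as much as one likes to eventually get it to work, but that doesn't make it maintainable or even the right way of doing things. I hope you will accept the analysis that you are treating XSLT as an imperative programming language. That does the language an injustice, and it will only convince you that XSLT is difficult, verbose and awkward for things that are easier in C or Java.
But if you use XSLT the way it was designed, it is easier than an imperative language, and to boot it is entirely XML-based, so the stylesheet itself shows the shape of the result you want. Because it is shorter, it is easier to maintain. Once you understand the declarative instructions being used, you don't have to unravel an imperative algorithm. And an XSLT processor can optimize a declarative approach, whereas it has to plod through an imperative approach exactly as written, with no opportunity to optimize it.
In the solution below, which produces exactly your Output 1 result, note how I determine the unique invoice numbers and then filter them down to the valid ones. I then batch them according to the batch size (which is a parameter). No template calls, no counters of any kind... just a solution using the built-in facilities of XSLT 2.0.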
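Not counting the declarations of the global parameter and variable, or the comments, it is only five elements long: <root>, <xsl:for-each-group>, <batch>, <xsl:for-each> and <xsl:copy-of>.
As for why your own attempt did not work, I don't know. The approach you took doesn't "feel" like XSLT; it feels like an XSLT rendering of some procedural, imperative approach.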
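The count-based batching described above (take the distinct invoice numbers, keep the valid ones, group them by `(position() - 1) idiv $batch-size`) can be cross-checked outside XSLT. The following is a minimal Python sketch of that logic, not the answer's stylesheet. The function name `batch_by_invoice_count` and the dictionary row shape are assumptions made for illustration, and it assumes the distinct numbers are taken in first-seen order (the order returned by `distinct-values()` is implementation-dependent).

```python
from itertools import groupby


def batch_by_invoice_count(rows, valid_numbers, batch_size):
    """Group rows into batches holding at most batch_size distinct invoices."""
    valid = set(valid_numbers)
    # Distinct invoice numbers in first-seen order, kept only if valid.
    numbers = [
        n for n in dict.fromkeys(r["invoiceNumber"] for r in rows) if n in valid
    ]
    batches = []
    # enumerate() is 0-based, so i // batch_size plays the role of
    # (position() - 1) idiv $batch-size in the stylesheet.
    for _, group in groupby(enumerate(numbers), key=lambda p: p[0] // batch_size):
        wanted = {n for _, n in group}
        # Copy every row of every invoice in this group, in document order.
        batches.append([r for r in rows if r["invoiceNumber"] in wanted])
    return batches


# Hypothetical sample data mirroring the six-row input document.
rows = [
    {"invoiceNumber": str(n), "invoiceText": f"invoice {n}-{k}"}
    for n in (1, 2, 3)
    for k in (1, 2)
]
batches = batch_by_invoice_count(rows, ["1", "2", "3"], 2)
print([[r["invoiceText"] for r in b] for b in batches])
```

With a batch size of 2 the sketch puts invoices 1 and 2 in the first batch and invoice 3 in the second, the same split as Output 1.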
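I am editing this answer to add the alternative below because, since you state you have 8 million input records, I think a key lookup table would perform better than my simple variable predicate. It produces the same result with one additional XSLT instruction in the template (it could be done without adding it, but I feel this is more readable) and removes the variable that is no longer needed.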
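The difference between a predicate that rescans every row and a key lookup table can be illustrated with a small Python sketch. This is only an analogy for what `xsl:key` and `key()` provide, not a claim about how any particular XSLT processor implements them; the names `build_line_index` and `rows_for` are hypothetical.

```python
from collections import defaultdict


def build_line_index(rows):
    """Map each invoice number to its rows, in document order, in one pass."""
    index = defaultdict(list)
    for row in rows:
        index[row["invoiceNumber"]].append(row)
    return index


def rows_for(index, numbers):
    """Fetch the rows of several invoices without rescanning all rows."""
    return [row for n in numbers for row in index.get(n, [])]


# Hypothetical sample data: three invoices with two lines each.
rows = [
    {"invoiceNumber": str(n), "invoiceText": f"invoice {n}-{k}"}
    for n in (1, 2, 3)
    for k in (1, 2)
]
index = build_line_index(rows)
print([r["invoiceText"] for r in rows_for(index, ["1", "2"])])
```

The index is built once, so each batch only touches the rows it copies, whereas a filter over all rows is re-evaluated for every batch.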
t:\ftemp>type numbers.xml
<root>
<row>
<invoiceNumber>1</invoiceNumber>
</row>
<row>
<invoiceNumber>2</invoiceNumber>
</row>
<row>
<invoiceNumber>3</invoiceNumber>
</row>
</root>
t:\ftemp>type invoices.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</root>
t:\ftemp>call xslt2 invoices.xml invoices.xsl
<?xml version="1.0" encoding="UTF-8"?>
<root>
<batch>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
</batch>
<batch>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</batch>
</root>
t:\ftemp>type invoices.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:param name="batch-size" select="2"/>
<xsl:variable name="valid-numbers"
select="doc('numbers.xml')/root/row/invoiceNumber"/>
<xsl:template match="/">
<xsl:variable name="invoiceLines" select="root/row"/>
<root>
<!--establish batches from possible non-contiguous invoice numbers-->
<xsl:for-each-group group-by="(position() - 1) idiv $batch-size"
select="distinct-values($invoiceLines/invoiceNumber)[.=$valid-numbers]">
<!--create a batch using all invoice lines for all numbers in group-->
<batch>
<xsl:for-each select="$invoiceLines[invoiceNumber=current-group()]">
<!--copy rows as they are-->
<xsl:copy-of select="."/>
</xsl:for-each>
</batch>
</xsl:for-each-group>
</root>
</xsl:template>
</xsl:stylesheet>
t:\ftemp>rem Done!
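Answer 1 (score: 0)
Please do not mark this as the answer, because my earlier answer addressed the original question.
The code below answers the secondary question of how to batch by the total number of invoice lines without splitting an invoice across two batches.
I could not think of a way to do this declaratively, so the answer below is an imperative, recursive solution, but it is written so that an XSLT processor that implements tail recursion will not consume stack space. I also take advantage of native XSLT features (key tables and sequences) that are hard to emulate in other languages.
The code is quite compact, and only one place actually writes out a batch of invoices... there are no repeated batch-writing blocks. I am pleased with how it turned out.
I welcome any suggestions for improvement, or posts of alternative solutions more compact than this.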
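For readers who want to check the expected batches independently, the line-count rule can be sketched as a simple greedy loop. This is a minimal Python sketch under stated assumptions, not a translation of the stylesheet: the function name `batch_by_line_count` and the row shape are hypothetical, invoices are taken in first-seen order, and an invoice larger than the limit gets a batch of its own. Unlike the recursive template as I read it, the sketch never emits an empty batch when the very first invoice already exceeds the limit.

```python
def batch_by_line_count(rows, valid_numbers, max_lines):
    """Return lists of invoice numbers; no invoice is split across batches."""
    valid = set(valid_numbers)
    # Count lines per valid invoice, preserving first-seen order.
    line_counts = {}
    for row in rows:
        number = row["invoiceNumber"]
        if number in valid:
            line_counts[number] = line_counts.get(number, 0) + 1

    batches, current, current_lines = [], [], 0
    for number, count in line_counts.items():
        # Close the open batch if adding this invoice would exceed the limit.
        if current and current_lines + count > max_lines:
            batches.append(current)
            current, current_lines = [], 0
        current.append(number)
        current_lines += count
    if current:
        # Flush the last, possibly partial, batch.
        batches.append(current)
    return batches


# Hypothetical data with the same line counts as the sample run:
# invoices 1, 2, 3 and 5 have two lines each, invoice 4 has six.
counts = {"1": 2, "2": 2, "3": 2, "4": 6, "5": 2}
rows = [{"invoiceNumber": n} for n, c in counts.items() for _ in range(c)]
batches = batch_by_line_count(rows, list(counts), 5)
print(batches)
```

With a limit of 5 lines the sketch yields four batches (invoices 1 and 2, then 3, then 4, then 5), which matches the grouping in the transcript that follows.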
t:\ftemp>type numbers.xml
<root>
<row>
<invoiceNumber>1</invoiceNumber>
</row>
<row>
<invoiceNumber>2</invoiceNumber>
</row>
<row>
<invoiceNumber>3</invoiceNumber>
</row>
<row>
<invoiceNumber>4</invoiceNumber>
</row>
<row>
<invoiceNumber>5</invoiceNumber>
</row>
</root>
t:\ftemp>type invoices.xml
<?xml version="1.0" encoding="UTF-8"?>
<root>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-1</invoiceText>
<position>7</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-2</invoiceText>
<position>8</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-3</invoiceText>
<position>9</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-4</invoiceText>
<position>10</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-5</invoiceText>
<position>11</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-6</invoiceText>
<position>12</position>
...
</row>
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-1</invoiceText>
<position>13</position>
...
</row>
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-2</invoiceText>
<position>14</position>
...
</row>
</root>
t:\ftemp>call xslt2 invoices.xml invoices.xsl
<?xml version="1.0" encoding="UTF-8"?>
<root>
<!--Batch max lines: 5-->
<batch>
<!--invoice numbers: 1 2-->
<!--total line count: 4-->
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-1</invoiceText>
<position>1</position>
...
</row>
<row>
<invoiceNumber>1</invoiceNumber>
<invoiceText>invoice 1-2</invoiceText>
<position>2</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-1</invoiceText>
<position>3</position>
...
</row>
<row>
<invoiceNumber>2</invoiceNumber>
<invoiceText>invoice 2-2</invoiceText>
<position>4</position>
...
</row>
</batch>
<batch>
<!--invoice numbers: 3-->
<!--total line count: 2-->
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-1</invoiceText>
<position>5</position>
...
</row>
<row>
<invoiceNumber>3</invoiceNumber>
<invoiceText>invoice 3-2</invoiceText>
<position>6</position>
...
</row>
</batch>
<batch>
<!--invoice numbers: 4-->
<!--total line count: 6-->
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-1</invoiceText>
<position>7</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-2</invoiceText>
<position>8</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-3</invoiceText>
<position>9</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-4</invoiceText>
<position>10</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-5</invoiceText>
<position>11</position>
...
</row>
<row>
<invoiceNumber>4</invoiceNumber>
<invoiceText>invoice 4-6</invoiceText>
<position>12</position>
...
</row>
</batch>
<batch>
<!--invoice numbers: 5-->
<!--total line count: 2-->
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-1</invoiceText>
<position>13</position>
...
</row>
<row>
<invoiceNumber>5</invoiceNumber>
<invoiceText>invoice 5-2</invoiceText>
<position>14</position>
...
</row>
</batch>
</root>
t:\ftemp>type invoices.xsl
<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:output indent="yes"/>
<xsl:param name="batch-size" select="5"/>
<xsl:variable name="valid-numbers"
select="doc('numbers.xml')/root/row/invoiceNumber"/>
<xsl:key name="invoice-lines-by-invoice-number"
match="row" use="invoiceNumber"/>
<xsl:variable name="input" select="/"/>
<xsl:template match="/">
<root>
<xsl:text>
 </xsl:text>
<xsl:comment select="'Batch max lines:',$batch-size"/>
<xsl:text>
 </xsl:text>
<xsl:call-template name="next-batch">
<xsl:with-param name="remaining-numbers"
select="distinct-values(root/row/invoiceNumber)[.=$valid-numbers]"/>
</xsl:call-template>
</root>
</xsl:template>
<xsl:template name="next-batch">
<xsl:param name="this-batch-lines" select="0"/>
<xsl:param name="this-batch-numbers" select="()"/>
<xsl:param name="remaining-numbers" required="yes"/>
<xsl:variable name="this-invoice" select="$remaining-numbers[1]"/>
<xsl:variable name="this-invoice-lines"
select="count(key('invoice-lines-by-invoice-number',$this-invoice,$input))"/>
<xsl:choose>
<xsl:when test="not($this-invoice) and not($this-batch-lines)">
<!--nothing to clean up and nothing more to do-->
</xsl:when>
<xsl:when test="not($this-invoice) (:last invoice complete:) or
( $this-batch-lines + $this-invoice-lines > $batch-size )
(:this invoice exceeds limit:)">
<!--clean up previous unfinished batch-->
<batch>
<xsl:text>
 </xsl:text>
<xsl:comment select="'invoice numbers:',$this-batch-numbers"/>
<xsl:text>
 </xsl:text>
<xsl:comment select="'total line count:',$this-batch-lines"/>
<xsl:text>
 </xsl:text>
<xsl:copy-of select="for $num in $this-batch-numbers return
key('invoice-lines-by-invoice-number',$num,$input)"/>
</batch>
<xsl:if test="$this-invoice">
<!--continue with the next batch comprised of this invoice only-->
<xsl:call-template name="next-batch">
<xsl:with-param name="this-batch-lines"
select="$this-invoice-lines"/>
<xsl:with-param name="this-batch-numbers"
select="$this-invoice"/>
<xsl:with-param name="remaining-numbers"
select="$remaining-numbers[position()>1]"/>
</xsl:call-template>
</xsl:if>
<!--the cleaned up batch was the last batch, template recursion ends-->
</xsl:when>
<xsl:otherwise>
<!--a batch limit has not been exceeded; add this invoice to batch-->
<xsl:call-template name="next-batch">
<xsl:with-param name="this-batch-lines"
select="$this-batch-lines + $this-invoice-lines"/>
<xsl:with-param name="this-batch-numbers"
select="($this-batch-numbers,$this-invoice)"/>
<xsl:with-param name="remaining-numbers"
select="$remaining-numbers[position()>1]"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>