Question

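Please suggest how to speed up extraction of the required data by reading from selected folders only.

The current code checks every 'tx1.xml' for extraction, and there are thousands of tx1.xml files. We need to fetch data only from the journals listed in the 'Journals.txt' file.

Folder structure: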

D:\Rudramuni\XSLTPrograms\FilesFetch\Files\AJN\3456\Over\tx1.xml
D:\Rudramuni\XSLTPrograms\FilesFetch\Files\AJN\3457\Over\tx1.xml
D:\Rudramuni\XSLTPrograms\FilesFetch\Files\EB\7654\Over\tx1.xml
D:\Rudramuni\XSLTPrograms\FilesFetch\Files\CLS\1234\Over\tx1.xml <!--Not required because CLS is not mentioned in 'Journals.txt'-->

Path.txt

<path>
<a>D:\Rudramuni\XSLTPrograms\FilesFetch\Files</a>
</path>

Journals.txt

<root>
AJN
EB
</root>

Input XML (..\AJN\3457\Over\tx1.xml):

<article>
<fm>
    <title>Article One</title>
    <aug><au><fnm>Rudramuni</fnm><snm>TP</snm></au></aug>
</fm>
 </article>

With the files above, the script needs to find only three 'tx1.xml' files, because only AJN and EB are mentioned in 'Journals.txt'.

XSLT

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:variable name="varFile" select="document('Path.txt')"/><!-- Base folder of the files, as given in Path.txt -->
<xsl:variable name="varPath" select="translate($varFile/path/a, '\', '/')"/>

<xsl:variable name="varFile1" select="document('Journals.txt')"/><!-- Text file listing the journal names to fetch information for -->
<xsl:variable name="varJs"><!-- Each line of the text file is wrapped in an 'a' element -->
    <xsl:for-each select="$varFile1/root/text()">
        <xsl:for-each select="tokenize(., '\n')[normalize-space()]">
            <a><xsl:sequence select="normalize-space()"/></a>
        </xsl:for-each>
    </xsl:for-each>
</xsl:variable>

<xsl:variable name="str1" select="concat('file:///', $varPath,'/?select=tx1.xml;recurse=yes;on-error=ignore')"/>

<xsl:variable name="varFinal">
    <xsl:for-each select="$varJs/a">
        <xsl:variable name="varJName" select="."/>
        <xsl:variable name="varCollection">
            <xsl:copy-of select="collection($str1)
                [matches(document-uri(.), $varJName) and matches(document-uri(.), '[0-9][0-9][0-9][0-9]/Over/tx1.xml')]"/>
        </xsl:variable>
        <fnm><xsl:value-of select="$varCollection//*:fnm"/></fnm><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:variable>

<xsl:template match="root">
    <xsl:value-of select="$varFinal"/>
</xsl:template>
</xsl:stylesheet>

Required output

Rudramuni Kishan
Likhith

XSLT processor: Saxon9he

Answer 1

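Looking at your stylesheet, it seems that you are loading all tx1.xml files under D:\Rudramuni\XSLTPrograms\FilesFetch\Files, whereas what you want is only the files under this path that belong to journals mentioned in "Journals.txt".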

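Instead of creating one global variable to be loaded by the collection function, change it into a loop, or use apply-templates, over the parsed content of "Journals.txt", i.e. $varJs. You have already started doing this in $varFinal.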
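As a rough illustration of the apply-templates variant mentioned above, the sketch below issues one collection() call per journal, so only the folders named in Journals.txt are scanned. It is untested and assumes the Path.txt and Journals.txt layout from the question; the mode name "journal" and the variable name "uri" are made up for this example. The steps that follow take the loop route instead.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<!-- Base folder from Path.txt, with backslashes turned into forward slashes -->
<xsl:variable name="varPath" select="translate(document('Path.txt')/path/a, '\', '/')"/>

<!-- One 'a' element per non-empty line of Journals.txt -->
<xsl:variable name="varJs">
    <xsl:for-each select="tokenize(document('Journals.txt')/root, '\n')[normalize-space()]">
        <a><xsl:sequence select="normalize-space()"/></a>
    </xsl:for-each>
</xsl:variable>

<xsl:template match="root">
    <xsl:apply-templates select="$varJs/a" mode="journal"/>
</xsl:template>

<!-- Runs once per journal: the collection URI points at that journal's folder only -->
<xsl:template match="a" mode="journal">
    <xsl:variable name="uri"
        select="concat('file:///', $varPath, '/', ., '/?select=tx1.xml;recurse=yes;on-error=ignore')"/>
    <xsl:value-of select="collection($uri)//*:fnm"/>
    <xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```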

Change this:

<xsl:copy-of select="collection($str1)
     [matches(document-uri(.), $varJName) 
     and matches(document-uri(.), '[0-9][0-9][0-9][0-9]/Over/tx1.xml')]"/>

Into this:

<xsl:copy-of select="collection(f:get-path($varPath, .))" />

Add the following global variable (and remove $str1):

<xsl:variable name="collection-query" 
    select="'?select=tx1.xml;recurse=yes;on-error=ignore'"/>

Add the following function (the f and xs prefixes need namespace declarations on xsl:stylesheet):

<xsl:function name="f:get-path" as="xs:string">
    <xsl:param name="base" as="xs:string" />
    <xsl:param name="segment" as="xs:string" />
    <xsl:sequence select="concat('file:///', $base, '/', $segment, '/', $collection-query)" />
</xsl:function>

Remove the following line:

<xsl:variable name="varJName" select="."/>

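Note that I haven't tested this, because it would require me to set up a complete directory structure, but something along these lines should work. Also, building the URI in a function makes it easier to adjust it to match your requirements more closely.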

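Since you already pre-select on tx1.xml in the collection URI, and you now only select the files you actually need based on "Journals.txt", there seems to be no need for the predicate that your original code had in the xsl:copy-of statement.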
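Putting the changes together, the stylesheet might end up looking roughly like the sketch below. It is untested, like the rest of this answer, and the namespace URI bound to the f prefix (urn:local-functions) is an arbitrary placeholder chosen for this example; any non-empty namespace would do, since stylesheet functions must be in a namespace.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:local-functions"
    exclude-result-prefixes="xs f"
    version="2.0">

<!-- Base folder from Path.txt, with backslashes turned into forward slashes -->
<xsl:variable name="varFile" select="document('Path.txt')"/>
<xsl:variable name="varPath" select="translate($varFile/path/a, '\', '/')"/>

<!-- One 'a' element per non-empty line of Journals.txt -->
<xsl:variable name="varFile1" select="document('Journals.txt')"/>
<xsl:variable name="varJs">
    <xsl:for-each select="$varFile1/root/text()">
        <xsl:for-each select="tokenize(., '\n')[normalize-space()]">
            <a><xsl:sequence select="normalize-space()"/></a>
        </xsl:for-each>
    </xsl:for-each>
</xsl:variable>

<!-- Query part shared by every collection URI -->
<xsl:variable name="collection-query"
    select="'?select=tx1.xml;recurse=yes;on-error=ignore'"/>

<!-- Builds the collection URI for one journal folder below the base folder -->
<xsl:function name="f:get-path" as="xs:string">
    <xsl:param name="base" as="xs:string"/>
    <xsl:param name="segment" as="xs:string"/>
    <xsl:sequence select="concat('file:///', $base, '/', $segment, '/', $collection-query)"/>
</xsl:function>

<!-- One collection() call per journal, so unlisted folders are never read -->
<xsl:variable name="varFinal">
    <xsl:for-each select="$varJs/a">
        <xsl:variable name="varCollection">
            <xsl:copy-of select="collection(f:get-path($varPath, .))"/>
        </xsl:variable>
        <fnm><xsl:value-of select="$varCollection//*:fnm"/></fnm><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:variable>

<xsl:template match="root">
    <xsl:value-of select="$varFinal"/>
</xsl:template>
</xsl:stylesheet>
```

If tx1.xml files can also occur outside the four-digit "Over" folders, the second matches() test from the question could be kept as a predicate on the collection() call.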
