XSL transform returns no values with XPathEntityProcessor in Solr

Asked: 2013-07-26 16:11:58

Tags: xml debugging xslt solr dataimporthandler

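I am trying to use the XSL option of XPathEntityProcessor to select data from an XML transform. The configuration runs without errors, but it returns no values. At this point, just getting it to index a single word from the IUPAC entity below would be a home run. All the other entities work as expected. I want to use the XSL option so that I have full XPath available for specific selections. The data is messy and needs further processing with JavaScript or regex.

My entities look like this.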

    <document>
    <!-- this outer processor generates a list of files satisfying the conditions
         specified in the attributes -->
    <entity 
        name="f" 
        processor="FileListEntityProcessor"
        fileName=".*xml"
        newerThan="'NOW-30YEARS'"
        recursive="true"
        rootEntity="false"
        dataSource="null"
        baseDir="C:\DrugLabels\Prescription\Test"
        transformer="RegexTransformer,TemplateTransformer">

    <!-- this strips the file extension and sets the id = the file name -->     
            <field column="file" regex="^(.*)/|.xml" replaceWith="$1" name="id"/>


    <!-- this processor extracts content using Xpath from each file found -->

        <entity 
        name="DrugLabel" 
        processor="XPathEntityProcessor"
        forEach="/document"
        url="${f.fileAbsolutePath}">

            <entity 
            name="document_title" 
            processor="XPathEntityProcessor"
            transformer="script:lineToTitleCase"
            forEach="/document"
            url="${f.fileAbsolutePath}">

                <field column="title" xpath="/document/title"/>

            </entity>

            <entity 
            name="ingredients" 
            processor="XPathEntityProcessor"
            transformer="script:listToTitleCase"
            forEach="/document"
            url="${f.fileAbsolutePath}">

                <field column="generic_medicine" xpath="/document/component/structuredBody/component/section/subject/manufacturedProduct/manufacturedProduct/asEntityWithGeneric/genericMedicine/name"/>

            </entity>

            <entity 
            name="IUPAC" 
            processor="XPathEntityProcessor"
            transformer="RegexTransformer, script:debug"
            forEach="/add"
            url="${f.fileAbsolutePath}"
            xsl="C:\solr-4.3.1\example\solr\DrugLabels\conf\section-description-transform.xsl">

                <field column="chemical_name" xpath="/add/doc/field/@chemical_name" flatten="true"/>

            </entity>
        </entity>                       
    </entity>
    </document>

Transform file:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/">
            <add>
                <doc>
                    <xsl:for-each select="/document">
                        <field name="chemical_name"><xsl:value-of select="/document/component/structuredBody/component/section/text/paragraph"/></field>
                    </xsl:for-each>
                </doc>
            </add>
        </xsl:template>
    </xsl:stylesheet>

Debug script:

    function debug(row) {
        var r = row.get('chemical_name').toString;
        r = 'value: ' + r;
        row.put('chemical_name', r);
        return row;
    }

Output:

    "chemical_name": [
        "value: function toString() {/*\njava.lang.String toString()\n*/}\n"

1 answer:

Answer 0 (score: 0)

    <xsl:template match="/">
        <add>
            <xsl:for-each select="document/entity">
                <doc>
                    <xsl:for-each select=".//field">
                        <field name="{@column}"><xsl:value-of select="@xpath"/></field>
                    </xsl:for-each>
                </doc>
            </xsl:for-each>
        </add>
    </xsl:template>

There is also a similar question that may help.
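
The output shown in the question, `value: function toString() {...}`, suggests that a value did reach the script and that the debug function printed the `toString` method itself rather than its result, because `row.get('chemical_name').toString` is missing the call parentheses. A minimal sketch of a corrected debug function follows. It assumes `row` behaves like the `java.util.Map` that the script transformer receives; `makeRow` is a hypothetical stand-in that exists only so the function can be tried outside Solr.

```javascript
// Script transformer for the IUPAC entity. In Solr, `row` is a java.util.Map,
// so get() and put() are the Map methods used in the question's script.
function debug(row) {
    var value = row.get('chemical_name');
    // toString() needs parentheses; without them the expression yields the
    // method object, which is what the question's output printed.
    var text = (value == null) ? '(no value)' : value.toString();
    row.put('chemical_name', 'value: ' + text);
    return row;
}

// Hypothetical stand-in for the Map row (an assumption, not part of Solr),
// used only to exercise debug() outside the DataImportHandler.
function makeRow(initial) {
    var data = Object.assign({}, initial);
    return {
        get: function (key) {
            return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
        },
        put: function (key, val) {
            data[key] = val;
        }
    };
}

var row = makeRow({ chemical_name: 'example text' });
debug(row);
console.log(row.get('chemical_name')); // prints "value: example text"
```

Inside Solr, only the `debug` function would go in the `<script>` block. As a separate point to check, the transform writes `<field name="chemical_name">`, so an expression of the form `/add/doc/field[@name='chemical_name']` may match it more directly than `/add/doc/field/@chemical_name`, which addresses an attribute named `chemical_name`.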