XSL(T): removing duplicates from a CSV column

Time: 2017-11-28 22:23:51

Tags: xml csv xslt xpath hashmap

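I am new to XML and XSL(T), and I have been asked to work with the following XML file, which contains a professor's publication records and the data for each publication. I need to count the professor's publications per year and write the results out in CSV format.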

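I know how to use XPath to get the DTY_PUB (publication year) tag and add it to the CSV, but I am struggling to keep a year from being repeated in the column when several publications share the same year. I come from a C/C++ background, where I would solve this with a simple hashmap, and I am not sure how to do the same in XSLT. Below is a shortened version of the XML file, followed by the XSLT I started with: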

<?xml version="1.0" encoding="UTF-8"?>
<Data xmlns="http://www.digitalmeasures.com/schema/data" xmlns:dmd="http://www.digitalmeasures.com/schema/data-metadata" dmd:date="2017-10-16">
    <INTELLCONT id="151370213376" dmd:originalSource="MANAGE_DATA" dmd:lastModified="2017-10-03T11:41:47" dmd:startDate="2016-04-15" dmd:endDate="2016-04-15">
        <REFEREED>Yes</REFEREED>
        <CONTYPE>Abstract</CONTYPE>
        <CONTYPEOTHER/>
        <STATUS>Published</STATUS>
        <TITLE>Sample Title</TITLE>
        <TITLE_SECONDARY/>
        <INTELLCONT_AUTH id="151370213379">
            <FACULTY_NAME/>
            <FNAME>FN</FNAME>
            <MNAME/>
            <LNAME>LN</LNAME>
            <INSTITUTION/>
            <ROLE>Author</ROLE>
            <STUDENT_LEVEL/>
        </INTELLCONT_AUTH>
        <INTELLCONT_AUTH id="151370213380">
            <FACULTY_NAME/>
            <FNAME>FN</FNAME>
            <MNAME/>
            <LNAME>LN</LNAME>
            <INSTITUTION/>
            <ROLE>Author</ROLE>
            <STUDENT_LEVEL/>
        </INTELLCONT_AUTH>
        <INTELLCONT_AUTH id="151370213381">
            <FACULTY_NAME/>
            <FNAME>FN</FNAME>
            <MNAME/>
            <LNAME>LN</LNAME>
            <INSTITUTION/>
            <ROLE>Author</ROLE>
            <STUDENT_LEVEL/>
        </INTELLCONT_AUTH>
        <INTELLCONT_AUTH id="151370213382">
            <FACULTY_NAME/>
            <FNAME>FN</FNAME>
            <MNAME/>
            <LNAME>LN</LNAME>
            <INSTITUTION/>
            <ROLE>Author</ROLE>
            <STUDENT_LEVEL/>
        </INTELLCONT_AUTH>
        <INTELLCONT_AUTH id="151370213383">
            <FACULTY_NAME/>
            <FNAME>FN</FNAME>
            <MNAME/>
            <LNAME>LN</LNAME>
            <INSTITUTION/>
            <ROLE>Author</ROLE>
            <STUDENT_LEVEL/>
        </INTELLCONT_AUTH>
        <INTELLCONT_AUTH id="151370213377">
            <FACULTY_NAME>1898739</FACULTY_NAME>
            <FNAME>FN</FNAME>
            <MNAME>MN</MNAME>
            <LNAME>LN</LNAME>
            <INSTITUTION/>
            <ROLE>Author</ROLE>
            <STUDENT_LEVEL/>
        </INTELLCONT_AUTH>
        <PUBLISHER>2016 EPA </PUBLISHER>
        <PUBCTYST>New Orleans, LA</PUBCTYST>
        <PUBCNTRY/>
        <VOLUME/>
        <ISSUE/>
        <PAGENUM>55</PAGENUM>
        <WEB_ADDRESS/>
        <DOI/>
        <ISBNISSN/>
        <PMCID/>
        <AUDIENCE/>
        <PUBLICAVAIL/>
        <ABSTRACT/>
        <FULL_TEXT/>
        <DTM_EXPSUB/>
        <DTD_EXPSUB/>
        <DTY_EXPSUB/>
        <EXPSUB_START></EXPSUB_START>
        <EXPSUB_END></EXPSUB_END>
        <DTM_SUB/>
        <DTD_SUB/>
        <DTY_SUB/>
        <SUB_START></SUB_START>
        <SUB_END></SUB_END>
        <DTM_ACC/>
        <DTD_ACC/>
        <DTY_ACC/>
        <ACC_START></ACC_START>
        <ACC_END></ACC_END>
        <DTM_PUB>April (2nd Quarter/Spring)</DTM_PUB>
        <DTD_PUB>15</DTD_PUB>
        <DTY_PUB>2016</DTY_PUB>
        <PUB_START>2016-04-15</PUB_START>
        <PUB_END>2016-04-15</PUB_END>
        <USER_REFERENCE_CREATOR>Yes</USER_REFERENCE_CREATOR>
    </INTELLCONT>
        <INTELLCONT id="151368284160" dmd:originalSource="MANAGE_DATA" dmd:lastModified="2017-10-03T10:44:48" dmd:startDate="2017-01-01" dmd:endDate="2017-12-31">
            <REFEREED>Yes</REFEREED>
            <CONTYPE>Journal Article</CONTYPE>
            <CONTYPEOTHER/>
            <STATUS>Published</STATUS>
            <TITLE>Sample Title</TITLE>
            <TITLE_SECONDARY/>
            <INTELLCONT_AUTH id="151368284163">
                <FACULTY_NAME/>
                <FNAME>FN</FNAME>
                <MNAME/>
                <LNAME>LN</LNAME>
                <INSTITUTION/>
                <ROLE>Author</ROLE>
                <STUDENT_LEVEL/>
            </INTELLCONT_AUTH>
            <INTELLCONT_AUTH id="151368284164">
                <FACULTY_NAME/>
                <FNAME>FN</FNAME>
                <MNAME/>
                <LNAME>LN</LNAME>
                <INSTITUTION/>
                <ROLE>Author</ROLE>
                <STUDENT_LEVEL/>
            </INTELLCONT_AUTH>
            <INTELLCONT_AUTH id="151368284161">
                <FACULTY_NAME>1898739</FACULTY_NAME>
                <FNAME>FN</FNAME>
                <MNAME>MN</MNAME>
                <LNAME>LN</LNAME>
                <INSTITUTION/>
                <ROLE>Author</ROLE>
                <STUDENT_LEVEL/>
            </INTELLCONT_AUTH>
            <PUBLISHER>Public Health</PUBLISHER>
            <PUBCTYST/>
            <PUBCNTRY/>
            <VOLUME>14</VOLUME>
            <ISSUE>3</ISSUE>
            <PAGENUM>265</PAGENUM>
            <WEB_ADDRESS/>
            <DOI></DOI>
            <ISBNISSN/>
            <PMCID/>
            <AUDIENCE/>
            <PUBLICAVAIL/>
            <ABSTRACT/>
            <FULL_TEXT/>
            <DTM_EXPSUB/>
            <DTD_EXPSUB/>
            <DTY_EXPSUB/>
            <EXPSUB_START></EXPSUB_START>
            <EXPSUB_END></EXPSUB_END>
            <DTM_SUB/>
            <DTD_SUB/>
            <DTY_SUB/>
            <SUB_START></SUB_START>
            <SUB_END></SUB_END>
            <DTM_ACC/>
            <DTD_ACC/>
            <DTY_ACC/>
            <ACC_START></ACC_START>
            <ACC_END></ACC_END>
            <DTM_PUB/>
            <DTD_PUB/>
            <DTY_PUB>2017</DTY_PUB>
            <PUB_START>2017-01-01</PUB_START>
            <PUB_END>2017-12-31</PUB_END>
            <USER_REFERENCE_CREATOR>Yes</USER_REFERENCE_CREATOR>
        </INTELLCONT>
         <INTELLCONT id="151368284160" dmd:originalSource="MANAGE_DATA" dmd:lastModified="2017-10-03T10:44:48" dmd:startDate="2017-01-01" dmd:endDate="2017-12-31">
            <REFEREED>Yes</REFEREED>
            <CONTYPE>Journal Article</CONTYPE>
            <CONTYPEOTHER/>
            <STATUS>Published</STATUS>
            <TITLE>Sample Title</TITLE>
            <TITLE_SECONDARY/>
            <INTELLCONT_AUTH id="151368284163">
                <FACULTY_NAME/>
                <FNAME>FN</FNAME>
                <MNAME/>
                <LNAME>LN</LNAME>
                <INSTITUTION/>
                <ROLE>Author</ROLE>
                <STUDENT_LEVEL/>
            </INTELLCONT_AUTH>
            <INTELLCONT_AUTH id="151368284164">
                <FACULTY_NAME/>
                <FNAME>FN</FNAME>
                <MNAME/>
                <LNAME>LN</LNAME>
                <INSTITUTION/>
                <ROLE>Author</ROLE>
                <STUDENT_LEVEL/>
            </INTELLCONT_AUTH>
            <INTELLCONT_AUTH id="151368284161">
                <FACULTY_NAME>1898739</FACULTY_NAME>
                <FNAME>FN</FNAME>
                <MNAME>MN</MNAME>
                <LNAME>LN</LNAME>
                <INSTITUTION/>
                <ROLE>Author</ROLE>
                <STUDENT_LEVEL/>
            </INTELLCONT_AUTH>
            <PUBLISHER>Public Health</PUBLISHER>
            <PUBCTYST/>
            <PUBCNTRY/>
            <VOLUME>14</VOLUME>
            <ISSUE>3</ISSUE>
            <PAGENUM>265</PAGENUM>
            <WEB_ADDRESS/>
            <DOI></DOI>
            <ISBNISSN/>
            <PMCID/>
            <AUDIENCE/>
            <PUBLICAVAIL/>
            <ABSTRACT/>
            <FULL_TEXT/>
            <DTM_EXPSUB/>
            <DTD_EXPSUB/>
            <DTY_EXPSUB/>
            <EXPSUB_START></EXPSUB_START>
            <EXPSUB_END></EXPSUB_END>
            <DTM_SUB/>
            <DTD_SUB/>
            <DTY_SUB/>
            <SUB_START></SUB_START>
            <SUB_END></SUB_END>
            <DTM_ACC/>
            <DTD_ACC/>
            <DTY_ACC/>
            <ACC_START></ACC_START>
            <ACC_END></ACC_END>
            <DTM_PUB/>
            <DTD_PUB/>
            <DTY_PUB>2017</DTY_PUB>
            <PUB_START>2017-01-01</PUB_START>
            <PUB_END>2017-12-31</PUB_END>
            <USER_REFERENCE_CREATOR>Yes</USER_REFERENCE_CREATOR>
        </INTELLCONT>
</Data>

Here is the XSLT:

<xsl:output method="text" encoding="utf-8"/>
<xsl:variable name="delimiter" select="','"/>
<!-- xmlns:dm is the xmlns attribute in Data. -->

<xsl:key name="Year-Published" match="dm:INTELLCONT" use="dm:DTY_PUB"/>

<xsl:template match="/dm:Data">
    <xsl:text>Year,</xsl:text>
    <xsl:for-each select="dm:Record/dm:INTELLCONT">
        <xsl:value-of select="dm:DTY_PUB"/>
        <xsl:text>,</xsl:text>
    </xsl:for-each>

    <!-- output newline -->
    <xsl:text>&#xa;</xsl:text>
</xsl:template>

The output so far is:

Year,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016

However, I would like it to be:

Year,2017,2016

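This has to be XSLT 1.0. I know that 2.0 has xsl:for-each-group, which would handle this fairly easily, and I have not been able to figure out Muenchian grouping. If anyone could explain how to do this I would be forever grateful. Thanks in advance.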

1 answer:

Answer 0 (score: 0)

Got it working. I had not been putting single quotes around the key name when passing it to key():

<xsl:output method="text" encoding="utf-8"/>
<xsl:variable name="delimiter" select="','"/>
<!-- xmlns:dm is the xmlns attribute in Data. -->

<xsl:key name="Year-Published" match="dm:INTELLCONT" use="dm:DTY_PUB"/>

<xsl:template match="/dm:Data">
    <xsl:text>Year,</xsl:text>
    <!-- Muenchian grouping: keep only the first INTELLCONT indexed under each year -->
    <xsl:for-each select="dm:Record/dm:INTELLCONT[generate-id()=generate-id(key('Year-Published', dm:DTY_PUB)[1])]">
        <xsl:sort select="dm:DTY_PUB" order="ascending"/>
        <xsl:value-of select="dm:DTY_PUB"/>
        <xsl:text>,</xsl:text>
    </xsl:for-each>
    <xsl:text>&#xa;</xsl:text>
</xsl:template>
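
For readers who, like the asker, think of this as a hashmap problem: xsl:key builds an index from each DTY_PUB value to the INTELLCONT elements that carry it, and the generate-id() comparison keeps an element only when it is the first entry in its own bucket. The following is a minimal, self-contained sketch of that idea, not the asker's final stylesheet. It makes a few assumptions that are not in the original: the xsl:stylesheet wrapper, binding the dm prefix to the default namespace shown on the sample's Data element, the path .//dm:INTELLCONT (the shortened sample has no Record element), a descending sort, and writing the comma before each year so the row has no trailing delimiter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dm="http://www.digitalmeasures.com/schema/data">

  <xsl:output method="text" encoding="utf-8"/>

  <!-- The "hashmap": index every INTELLCONT by its publication year. -->
  <xsl:key name="Year-Published" match="dm:INTELLCONT" use="dm:DTY_PUB"/>

  <xsl:template match="/dm:Data">
    <xsl:text>Year</xsl:text>
    <!-- Assumed path: any INTELLCONT below Data. An element passes the
         predicate only if it is the first one stored under its year. -->
    <xsl:for-each select=".//dm:INTELLCONT[generate-id() = generate-id(key('Year-Published', dm:DTY_PUB)[1])]">
      <!-- Newest year first (an assumption, to match the desired row). -->
      <xsl:sort select="dm:DTY_PUB" data-type="number" order="descending"/>
      <!-- Comma goes before the value, so nothing trails the last year. -->
      <xsl:text>,</xsl:text>
      <xsl:value-of select="dm:DTY_PUB"/>
    </xsl:for-each>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against the shortened sample in the question (one 2016 record and two 2017 records), this should produce the single line Year,2017,2016. Records with an empty DTY_PUB would form a group of their own under the empty string, so a real export may need an extra predicate such as [normalize-space(dm:DTY_PUB)] to skip them.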
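
The question's actual goal was a count of publications per year, and the same key can supply it: key('Year-Published', $year) returns every INTELLCONT for that year, so count() over it gives the group size. The sketch below is only one possible layout. The one-row-per-year format, the Year,Count header, the .//dm:INTELLCONT path, and the descending sort are all assumptions rather than anything stated in the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dm="http://www.digitalmeasures.com/schema/data">

  <xsl:output method="text" encoding="utf-8"/>

  <!-- Index every INTELLCONT by its publication year. -->
  <xsl:key name="Year-Published" match="dm:INTELLCONT" use="dm:DTY_PUB"/>

  <xsl:template match="/dm:Data">
    <!-- Assumed header row. -->
    <xsl:text>Year,Count&#xa;</xsl:text>
    <!-- Visit one representative INTELLCONT per distinct year. -->
    <xsl:for-each select=".//dm:INTELLCONT[generate-id() = generate-id(key('Year-Published', dm:DTY_PUB)[1])]">
      <xsl:sort select="dm:DTY_PUB" data-type="number" order="descending"/>
      <xsl:value-of select="dm:DTY_PUB"/>
      <xsl:text>,</xsl:text>
      <!-- Size of the key bucket = number of publications in that year. -->
      <xsl:value-of select="count(key('Year-Published', dm:DTY_PUB))"/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the shortened sample this should give a header line followed by 2017,2 and 2016,1. If the years must stay in a single row as in the question, the same count() expression can be emitted in a second xsl:for-each that builds a matching row of counts.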