Parsing eCFR XML with XPath using attributes and nodes

Time: 2015-08-07 21:45:19

Tags: xpath filemaker

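This question has been heavily edited to make things clearer.

I am trying to extract data from the Electronic Code of Federal Regulations XML feed (http://www.gpo.gov/fdsys/bulkdata/CFR/2015/title-15/CFR-2015-title15-vol2.xml) and am running into problems.

Specifically, I want to grab data that is matched by a combination of node and attribute. The XML snippet below shows some of the text I want to grab. I want to get the data for every FP node where the attribute FP-2 exists. I also want to get the data for every FP node that has the attribute FP-1.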



<APPENDIX>
              <EAR>Pt. 774, Supp. 1</EAR>
              <HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
              <HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
              <HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
              <FP SOURCE="FP-2">
                <E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
              </FP>
              
              <FP SOURCE="FP-2">
                <E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="04">License Requirements</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Reason for Control:</E> NS, AT, UN</FP>
              <GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
                <BOXHD>
                  <CHED H="1">Control(s)</CHED>
                  <CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
                </BOXHD>
                <ROW>
                  <ENT I="01">NS applies to entire entry</ENT>
                  <ENT>NS Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">AT applies to entire entry</ENT>
                  <ENT>AT Column 1.</ENT>
                </ROW>
                <ROW>
                  <ENT I="01">UN applies to entire entry</ENT>
                  <ENT>See § 746.1(b) for UN controls.</ENT>
                </ROW>
              </GPOTABLE>
              <FP SOURCE="FP-1">
                <E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">LVS:</E> $3,000 for 0A018.b</FP>
              <FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
              <FP SOURCE="FP-1">
                <E T="03">GBS:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="03">CIV:</E> N/A</FP>
              <FP SOURCE="FP-1">
                <E T="04">List of Items Controlled</E>
              </FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
              <FP SOURCE="FP-1">
                <E T="03">Related Definitions:</E> N/A</FP>
              <FP>
                <E T="03">Items:</E> a. [Reserved]</FP>
              <P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
              <NOTE>
                <HD SOURCE="HED">
                  <E T="03">Note:</E>
                </HD>
                <P>
                  <E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
                </P>
                <P>
                  <E T="03">a. Ammunition crimped without a projectile (blank star);</E>
                </P>
              </NOTE>
</APPENDIX>

To complicate matters, I am trying to pull this data into FileMaker, but for this edit I will stick to plain XSL.

The following XSL grabs all FP nodes indiscriminately.

<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Changing this to use xsl:template match="FP[@SOURCE='FP-1']" lets me make the necessary match on the attribute, but I am still unclear on how to capture the data I need. Thoughts?

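2 Answers:

Answer 0: (score: 2)

A few things:

  1. Your XSLT is not actually in valid XSLT format.
  2. In XPath, to reference an attribute (i.e. SOURCE), it must be prefixed with @.
  3. Finally, there are many FP-1 and FP-2 nodes, but your setup only selects the first instance.
  4. Consider the following XSLT: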

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output version="1.0" encoding="UTF-8"/>
    
    <xsl:template match="/">
       <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
    
        <METADATA>
            <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
            <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
        </METADATA>
    
        <RESULTSET>
    
        <xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
        <ROW>
            <COL>
                <DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
            </COL>
        </ROW>
        </xsl:for-each>    
    
        <xsl:for-each select="//FP[@SOURCE = 'FP-1']/E[@T='02']">
        <ROW>
            <COL>
                <DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
            </COL>
        </ROW>
        </xsl:for-each>        
    
        </RESULTSET>
    </FMPXMLRESULT>
    
    </xsl:template>
    </xsl:stylesheet>
    

    Which outputs:

    <?xml version='1.0' encoding='UTF-8'?>
    <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
      <METADATA>
        <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
        <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
      </METADATA>
      <RESULTSET>
        <ROW>
          <COL>
            <DATA>0A002</DATA>
          </COL>
        </ROW>
        <ROW>
          <COL>
            <DATA>0A018</DATA>
          </COL>
        </ROW>
      </RESULTSET>
    </FMPXMLRESULT>
    

    Partial output for the full XML at the web link:

    <?xml version='1.0' encoding='UTF-8'?>
    <FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
      <METADATA>
        <FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
        <FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
      </METADATA>
      <RESULTSET>
        <ROW>
          <COL>
            <DATA>2A000</DATA>
          </COL>
        </ROW>
        <ROW>
          <COL>
            <DATA>0A002</DATA>
          </COL>
        </ROW>
        <ROW>
          <COL>
            <DATA>0A018</DATA>
          </COL>
        </ROW>
        <ROW>
          <COL>
            <DATA>0A521</DATA>
          </COL>
        </ROW>
        <ROW>
          <COL>
            <DATA>0A604</DATA>
          </COL>
        </ROW>
        <ROW>
          <COL>
            <DATA>0A606</DATA>
          </COL>
        </ROW>
        ...
    

    In fact, point your XSLT processor at the GPO link and all of the FP-1 and FP-2 entries are output. I just did it with Python! Nearly 3,000 rows!
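
For readers who would rather skip XSLT, the same selection can be sketched directly in Python. This is not the script mentioned above (which is not shown); it is a minimal sketch that assumes the standard library's `xml.etree.ElementTree`, a trimmed copy of the question's sample, and a hypothetical helper named `eccn_codes`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample (the first E text is shortened);
# the real file is much larger.
SAMPLE = """<APPENDIX>
  <FP SOURCE="FP-2">
    <E T="02">0A002Power generating or propulsion equipment</E>
  </FP>
  <FP SOURCE="FP-2">
    <E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
  </FP>
  <FP SOURCE="FP-1">
    <E T="04">License Requirements</E>
  </FP>
</APPENDIX>"""


def eccn_codes(root, source):
    """Hypothetical helper: first five characters of each E[@T='02']
    found under an FP element whose SOURCE attribute equals `source`."""
    path = f".//FP[@SOURCE='{source}']/E[@T='02']"
    return [(e.text or "")[:5] for e in root.findall(path)]


root = ET.fromstring(SAMPLE)
fp2_codes = eccn_codes(root, "FP-2")
fp1_codes = eccn_codes(root, "FP-1")
print(fp2_codes)
print(fp1_codes)
```

On this sample the FP-1 list comes back empty, because none of the FP-1 nodes contains an E element with T="02". That matches the XSLT output above, where only the FP-2 codes appear.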

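Answer 1: (score: 0)

Your question is still unclear. If I focus on this part:

  

I want to get the data for every FP node where the attribute FP-2 exists. I also want to get the data for every FP node that has the attribute FP-1.

then you probably want to change this: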

<xsl:for-each select="//FP">

to:

<xsl:for-each select="//FP[@SOURCE='FP-1' or @SOURCE='FP-2']">

Note that this returns the value of every FP element whose SOURCE attribute has the value 'FP-1' or 'FP-2'. I do not see any "FP node where the attribute FP-2 exists" in your input.
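
To illustrate what that predicate selects, here is a rough Python equivalent. It is only a sketch: it assumes the standard library's `xml.etree.ElementTree`, whose XPath subset has no `or`, so the attribute test is done in Python. The names `WANTED` and `fp_values` are made up for the example, and the sample is a shortened copy of the question's snippet.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's snippet: one FP-2, one FP-1, one FP
# without a SOURCE attribute.
SAMPLE = """<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E></FP>
  <FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>
  <FP><E T="03">Items:</E> a. [Reserved]</FP>
</APPENDIX>"""

WANTED = {"FP-1", "FP-2"}


def fp_values(root):
    """Return the full text of every FP whose SOURCE is in WANTED,
    similar to xsl:value-of select="." on each matched node."""
    return [
        "".join(fp.itertext()).strip()
        for fp in root.iter("FP")
        if fp.get("SOURCE") in WANTED
    ]


values = fp_values(ET.fromstring(SAMPLE))
for value in values:
    print(value)
```

The FP element without a SOURCE attribute is skipped, just as it would be by the XPath predicate.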

Also note that the // syntax is expensive in terms of processing power. You will get better performance if you use a full explicit path.
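
The difference between the two kinds of path can be sketched in Python with `xml.etree.ElementTree`. This assumes, as in the posted snippet, that the FP elements are direct children of APPENDIX. The full path from the document root down to APPENDIX in the real file is not shown in the question, so it is left out here.

```python
import xml.etree.ElementTree as ET

# Reduced sample: FP elements sit directly under APPENDIX, next to a
# nested NOTE that a descendant search also has to walk through.
SAMPLE = """<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E></FP>
  <NOTE><P><E T="03">a. Ammunition crimped without a projectile (blank star);</E></P></NOTE>
  <FP SOURCE="FP-1"><E T="04">License Requirements</E></FP>
</APPENDIX>"""

root = ET.fromstring(SAMPLE)  # root is the APPENDIX element

# Descendant search (the "//" style): considers every element below root.
anywhere = root.findall(".//FP[@SOURCE='FP-1']")

# Explicit path: considers only the direct children of APPENDIX.
direct = root.findall("FP[@SOURCE='FP-1']")

# Both queries return the same element objects for this sample.
same_nodes = len(anywhere) == len(direct) and all(
    a is d for a, d in zip(anywhere, direct)
)
print(len(direct), same_nodes)
```

Both forms find the same node here. The explicit form simply gives the processor less of the tree to inspect, and how much that matters depends on the processor and the size of the file.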