I can't extract node text using XPath

Time: 2018-06-08 13:47:01

Tags: xpath azure-data-lake u-sql

I have an XML file like this (test.xml):

<?xml version="1.0" encoding="ISO-8859-1"?>
<s2xResponse>
  <s2xData>
    <Name>This is the name</Name>
    <InfocomData>
      <DateOfUpdate day="07" month="02" year="2018">20180207</DateOfUpdate>
      <CompanyName>MY COMPANY</CompanyName>
      <TaxCode FlagCheck="0">XXXYYYWWWZZZ</TaxCode>
    </InfocomData>
    <AssessmentSummary>
      <Rating Code="2">Rating Description for Code 2</Rating>
    </AssessmentSummary>
    <AssessmentData>
      <SectorialDistribution>
        <CompaniesNumber>11650</CompaniesNumber>
        <ScoreDistribution />
        <CervedScoreDistribution>
          <DistributionData>
            <Rating Code="1">SICUREZZA</Rating>
            <Percentage>1.91</Percentage>
          </DistributionData>
          <DistributionData>
            <Rating Code="2">SOLVIBILITA' ELEVATA</Rating>
            <Percentage>35.56</Percentage>
          </DistributionData>
        </CervedScoreDistribution>
      </SectorialDistribution>
    </AssessmentData>
  </s2xData>
</s2xResponse>

I am trying to get the text of the "Name" node ("This is the name") with a U-SQL script that uses XmlExtractor. Here is the code I am using:

USE TestXML; // It contains the registered assembly

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@xml = EXTRACT xml_text string
       FROM "textxpath/test.xml"
       USING Extractors.Text(rowDelimiter: "^", quoting: false);

@xml_cleaned =
    SELECT
        xml_text.Replace("\r\n", "").Replace("\t", "    ") AS xml_text
    FROM @xml;

@values =
    SELECT Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text, "s2xResponse/s2xData/Name")[1] AS value
    FROM @xml_cleaned;


OUTPUT @values TO @"outputs/test_xpath.txt" USING Outputters.Text(quoting: false);

But I get this runtime error:

Execution failed with error '1_SV1_Extract Error : '{"diagnosticCode":195887116,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXPRESSIONEVALUATION","message":"Error while evaluating expression Microsoft.Analytics.Samples.Formats.Xml.XPath.Evaluate(xml_text.Replace(\"\r\n\", \"\").Replace(\"\t\", \" \"), \"s2xResponse/s2xData/Name\")[1]","description":"Inner exception from user expression: Index was out of range. Must be non-negative and less than the size of the collection.

Even if I use a zero index ([0]) on the result of Evaluate, I get the same error.

What is wrong with my query?

2 answers:

Answer 0: (score: 2)

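The problem here is that you are applying the subscript [1] to the result of XPath.Evaluate, which I believe returns the Name nodes. You are applying the [1] subscript in code rather than in the XPath, so the subscript is probably zero-based, not 1-based as it is in XPath, hence the Index out of range error.

Here is a solution: apply the subscript operator inside the XPath (where it is still 1-based), and select text() there as well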

 .Evaluate("s2xResponse/s2xData/Name[1]/text()")
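To see the indexing difference in isolation, here is a small sketch in Python rather than U-SQL, using the standard-library xml.etree.ElementTree. That module supports only a subset of XPath, so the sketch reads .text instead of selecting text(). The trimmed XML string and the variable names are illustrative assumptions and not part of the original script.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of test.xml, just enough to show the indexing rules.
XML = """<s2xResponse>
  <s2xData>
    <Name>This is the name</Name>
  </s2xData>
</s2xResponse>"""

root = ET.fromstring(XML)  # root is the <s2xResponse> element

# Index applied in host-language code: collections are zero-based.
names = root.findall("s2xData/Name")
first_by_code = names[0].text

# Index applied inside the path: XPath positions are one-based.
first_by_xpath = root.find("s2xData/Name[1]").text

# With a single match, index 1 in code is already past the end.
try:
    names[1]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first_by_code, first_by_xpath, out_of_range)
```

Both lookups reach the same node; only the position of the subscript (in code or in the path) decides whether counting starts at 0 or at 1.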

Answer 1: (score: 1)

Is there a particular reason you want to use the XPath.Evaluate method? I got this to work using XmlDomExtractor, which would let you extract multiple values from the XML, for example:

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @inputFile string = "/input/input100.xml";

@input =
    EXTRACT Name string
    FROM @inputFile
    USING new Microsoft.Analytics.Samples.Formats.Xml.XmlDomExtractor(
        rowPath : "/s2xResponse",
        columnPaths : new SQL.MAP<string, string>{
            { "s2xData/Name", "Name" },
        }
    );

@output = SELECT * FROM @input;
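The rowPath / columnPaths idea from this answer can be mimicked outside U-SQL to show why it scales to several values. The following is a rough Python sketch, not XmlDomExtractor's actual implementation: extract_rows is a hypothetical helper, the XML is a trimmed copy of test.xml, and the second column (CompanyName) is added purely for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of test.xml with two values worth extracting.
XML = """<s2xResponse>
  <s2xData>
    <Name>This is the name</Name>
    <InfocomData>
      <CompanyName>MY COMPANY</CompanyName>
    </InfocomData>
  </s2xData>
</s2xResponse>"""

def extract_rows(xml_text, row_path, column_paths):
    """Hypothetical helper: build one dict per element matched by row_path."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall(row_path):
        # findtext returns None when a column path matches nothing.
        rows.append({col: row.findtext(path) for path, col in column_paths.items()})
    return rows

rows = extract_rows(
    XML,
    row_path=".",  # the root element itself, standing in for "/s2xResponse"
    column_paths={
        "s2xData/Name": "Name",
        "s2xData/InfocomData/CompanyName": "CompanyName",
    },
)
print(rows)
```

Each extra value is one more entry in the path-to-column map, with no indexing into a result collection.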