XPath help needed for XML output

Time: 2018-06-26 10:13:50

Tags: xml xpath

I am trying to get the DataTable headers using XPath.

My output should be:

  

ItemNum | Item | ResultCode | Status | ExtBackLinks | RefDomains | AnalysisResUnitsCost | ACRank | ItemType | IndexedURLs | GetTopBackLinksAnalysisResUnitsCost | DownloadBacklinksAnalysisResUnitsCost | DownloadRefDomainBacklinksAnalysisResUnitsCost | RefIPs | RefSubNets | RefDomainsEDU | ExtBackLinksEDU | RefDomainsGOV | ExtBackLinksGOV | RefDomainsEDU_Exact | ExtBackLinksEDU_Exact | RefDomainsGOV_Exact | ExtBackLinksGOV_Exact | CrawledFlag | LastCrawlDate | LastCrawlResult | RedirectFlag | FinalRedirectResult | OutDomainsExternal | OutLinksExternal | OutLinksInternal | OutLinksPages | LastSeen | Title | RedirectTo | Language | LanguageDesc | LanguageConfidence | LanguagePageRatios | LanguageTotalPages | RefLanguage | RefLanguageDesc | RefLanguageConfidence | RefLanguagePageRatios | RefLanguageTotalPages | CrawledURLs | RootDomainIPAddress | TotalNonUniqueLinks | NonUniqueLinkTypeHomepages | NonUniqueLinkTypeIndirect | NonUniqueLinkTypeDeleted | NonUniqueLinkTypeNoFollow | NonUniqueLinkTypeProtocolHTTPS | NonUniqueLinkTypeFrame | NonUniqueLinkTypeImageLink | NonUniqueLinkTypeRedirect | NonUniqueLinkTypeTextLink | RefDomainTypeLive | RefDomainTypeFollow | RefDomainTypeHomepageLink | RefDomainTypeDirect | RefDomainTypeProtocolHTTPS | CitationFlow | TrustFlow | TrustMetric | TopicalTrustFlow_Topic_0 | TopicalTrustFlow_Value_0 | TopicalTrustFlow_Topic_1 | TopicalTrust_Flow_Value_1 | TopicalTrustFlow_Value_1

This is the raw XML:

<Result Code="OK" ErrorMessage="" FullError="">
<GlobalVars FirstBackLinkDate="2012-09-21" IndexBuildDate="2018-05-24 19:47:18" IndexType="0" MostRecentBackLinkDate="2018-04-23" QueriedRootDomains="1" QueriedSubDomains="0" QueriedURLs="0" QueriedURLsMayExist="0" ServerBuild="2018-06-11 13:52:01" ServerName="BRUNO28" ServerVersion="1.0.6736.23160" UniqueIndexID="20180524194718-HISTORICAL"/>
<DataTables Count="1">
<DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status|ExtBackLinks|RefDomains|AnalysisResUnitsCost|ACRank|ItemType|IndexedURLs|GetTopBackLinksAnalysisResUnitsCost|DownloadBacklinksAnalysisResUnitsCost|DownloadRefDomainBacklinksAnalysisResUnitsCost|RefIPs|RefSubNets|RefDomainsEDU|ExtBackLinksEDU|RefDomainsGOV|ExtBackLinksGOV|RefDomainsEDU_Exact|ExtBackLinksEDU_Exact|RefDomainsGOV_Exact|ExtBackLinksGOV_Exact|CrawledFlag|LastCrawlDate|LastCrawlResult|RedirectFlag|FinalRedirectResult|OutDomainsExternal|OutLinksExternal|OutLinksInternal|OutLinksPages|LastSeen|Title|RedirectTo|Language|LanguageDesc|LanguageConfidence|LanguagePageRatios|LanguageTotalPages|RefLanguage|RefLanguageDesc|RefLanguageConfidence|RefLanguagePageRatios|RefLanguageTotalPages|CrawledURLs|RootDomainIPAddress|TotalNonUniqueLinks|NonUniqueLinkTypeHomepages|NonUniqueLinkTypeIndirect|NonUniqueLinkTypeDeleted|NonUniqueLinkTypeNoFollow|NonUniqueLinkTypeProtocolHTTPS|NonUniqueLinkTypeFrame|NonUniqueLinkTypeImageLink|NonUniqueLinkTypeRedirect|NonUniqueLinkTypeTextLink|RefDomainTypeLive|RefDomainTypeFollow|RefDomainTypeHomepageLink|RefDomainTypeDirect|RefDomainTypeProtocolHTTPS|CitationFlow|TrustFlow|TrustMetric|TopicalTrustFlow_Topic_0|TopicalTrustFlow_Value_0|TopicalTrustFlow_Topic_1|TopicalTrustFlow_Value_1|TopicalTrustFlow_Topic_2|TopicalTrustFlow_Value_2" MaxTopicsRootDomain="30" MaxTopicsSubDomain="20" MaxTopicsURL="10" TopicsCount="3">
<Row>
0|nu.nl|OK|Found|508322106|165344|508322106|-1|1|4149991|5000|512472097|3356880|59147|26204|233|3613|43|308|73|1757|4|12|False| | |True| |5|10|44|1722150| |NU - Het laatste nieuws het eerst op NU.nl|https://www.nu.nl/|nl|Dutch/Flemish|92|99.9|482980|nl,en,de|Dutch/Flemish,English,German|87,93,58|96.5,3.1,0.1|76319583|1915923|52.85.201.19|611833777|15034990|53120677|444371798|95283418|52384870|388104|53497551|5655999|552292123|102171|115787|21952|150164|49554|76|70|70|News/Breaking News|69|Sports/Resources|45|Arts/Radio|43
</Row>
</DataTable>
</DataTables>
</Result>

When I use this XPath command in Google Sheets:

=importxml("http://enterprise.majesticseo.com/api_command?privatekey=xxx&accessToken=xxx&cmd=GetIndexItemInfo&item0=nu.nl&items=1","//DataTable")

I get the row results. That is great, but I also need the header names in the first row of the sheet.

1 answer:

Answer 0: (score: 3)

An introduction to XPath :-)

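With //DataTable you get the complete node of any <DataTable> anywhere in your XML (no namespaces are involved here).
As a rule of thumb, it is better to be as specific as possible (so rather use /Result/DataTables/DataTable). But this is not the answer to your question...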
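
To make the difference between the two styles concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree, which supports only a subset of XPath and evaluates paths relative to the root element, so /Result/DataTables/DataTable becomes DataTables/DataTable. The XML is a shortened stand-in for the response in the question, not the real payload.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the API response (most attributes left out).
doc = ET.fromstring(
    '<Result Code="OK">'
    '<DataTables Count="1">'
    '<DataTable Name="Results" Headers="ItemNum|Item|ResultCode"/>'
    '</DataTables>'
    '</Result>'
)

# Counterpart of //DataTable: search the whole tree at any depth.
anywhere = doc.findall(".//DataTable")

# Counterpart of /Result/DataTables/DataTable: spell out each step
# below the root element <Result>.
specific = doc.findall("DataTables/DataTable")

print(len(anywhere), len(specific))
```

Both calls find the same single element here; the explicit path simply cannot match a <DataTable> that turns up somewhere unexpected.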

Imagine an XML like this:

<root>
  <innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>
  <innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>
</root>

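With /root/innerNode you get both <innerNode> elements with all their content.

With /root/innerNode[(b/text())[1]="bbb"] you get just the one <innerNode> where the text() of <b> is "bbb".

With /root/innerNode[@attr="1"] you get just the one <innerNode> where the attribute attr has the value "1".

All three XPath samples bring back the whole node, including child nodes, attributes and so on.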
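
The three selections can be tried out with a short Python sketch. It assumes the standard library's xml.etree.ElementTree, which has no text() function, so the text test is written as the predicate [b='bbb'] instead; paths are relative to the <root> element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root>'
    '<innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>'
    '<innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>'
    '</root>'
)

# /root/innerNode: every <innerNode> below the root.
all_nodes = root.findall("innerNode")

# Filter on the text of the child <b>.
by_text = root.findall("innerNode[b='bbb']")

# Filter on the value of the attribute attr.
by_attr = root.findall("innerNode[@attr='1']")

# Each result is a whole element, children and attributes included.
for label, nodes in [("all", all_nodes), ("by text", by_text), ("by attr", by_attr)]:
    print(label, [ET.tostring(node, encoding="unicode") for node in nodes])
```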

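If you need only the value of an attribute, you have to ask for it:

(/root/innerNode/@attr)[2]

...returns the attribute value of the second <innerNode> (actually of the second occurrence)

/root/innerNode[(b/text())[1]="Some b content"]/@attr

...returns the attribute value of the <innerNode> whose <b> has a text() of "Some b content"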
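
A rough Python equivalent of these two attribute lookups, again assuming the standard library's xml.etree.ElementTree. That module cannot end a path in @attr, so the sketch selects the element first and then reads the attribute with get().

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root>'
    '<innerNode attr="1"><a>Some a content</a><b>Some b content</b></innerNode>'
    '<innerNode attr="2"><a>aaa</a><b>bbb</b></innerNode>'
    '</root>'
)

# (/root/innerNode/@attr)[2]: collect every attr value, take the second.
# XPath positions start at 1, Python list indexes at 0.
attr_values = [node.get("attr") for node in root.findall("innerNode")]
second_attr = attr_values[1]

# /root/innerNode[(b/text())[1]="Some b content"]/@attr
matching = root.find("innerNode[b='Some b content']")
attr_of_match = matching.get("attr")

print(second_attr, attr_of_match)
```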

Back to your question

You want to read the attribute Headers of the element <DataTable>, which is located in /Result/DataTables. Just use

/Result/DataTables/DataTable/@Headers
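
As a sketch of what reading that attribute gives you, the Python below pulls Headers from the <DataTable> under /Result/DataTables and pairs the names with the values in <Row>. It assumes the standard library's xml.etree.ElementTree and a heavily shortened copy of the response (four columns only); splitting on "|" is an assumption based on how the sample data is delimited.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the API response: four columns instead of all of them.
xml_text = """<Result Code="OK" ErrorMessage="" FullError="">
<DataTables Count="1">
<DataTable Name="Results" RowsCount="1" Headers="ItemNum|Item|ResultCode|Status">
<Row>
0|nu.nl|OK|Found
</Row>
</DataTable>
</DataTables>
</Result>"""

result = ET.fromstring(xml_text)

# Element at /Result/DataTables/DataTable (path relative to <Result>).
table = result.find("DataTables/DataTable")

# The Headers attribute is one pipe-delimited string.
headers = table.get("Headers").split("|")

# The row uses the same delimiter; strip the surrounding line breaks first.
values = table.findtext("Row").strip().split("|")

print(dict(zip(headers, values)))
```

In Google Sheets the XPath expression takes the place of "//DataTable" as the second argument of importxml; the result is still a single pipe-delimited string that has to be split into cells.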