Getting specific data from an XML document using PHP

Time: 2016-04-16 11:42:37

Tags: php xml xpath xml-parsing

I have an XML document, shown below:

<?xml version="1.0" encoding="UTF-8"?>
<nlmSearchResult>
  <term>cancer</term>
  <file>viv_iUjELT</file>
  <server>pvlbsrch16</server>
  <count>341</count>
  <retstart>0</retstart>
  <retmax>10</retmax>
  <list num="341" start="0" per="10">
    <document rank="9" url="https://www.nlm.nih.gov/medlineplus/cancer.html">
      <content name="title">&lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Carcinoma&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Malignancy&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Neoplasms&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Oncology&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Tumor&lt;/span&gt;&lt;/span&gt;</content>
    </document>
    <document rank="0" url="https://www.nlm.nih.gov/medlineplus/throatcancer.html">
      <content name="title">Throat &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Hypopharyngeal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Laryngeal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Laryngopharyngeal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Nasopharyngeal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Oropharyngeal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Pharyngeal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
    </document>
    <document rank="1" url="https://www.nlm.nih.gov/medlineplus/intestinalcancer.html">
      <content name="title">Intestinal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Gastrointestinal Stromal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Tumors&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Small Intestine &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;Cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Duodenal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Ileal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;cancer&lt;/span&gt;&lt;/span&gt;</content>
      <content name="altTitle">Jejunal &lt;span class="qt0"&gt;&lt;span class="qt1"&gt;cancer&lt;/span&gt;&lt;/span&gt;</content>
    </document>
  </list>
</nlmSearchResult>

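I want to read the values of the content tags by their name attribute. This sample contains three document tags, but the number of document tags varies from query to query. I use this script to get the name="title" and name="altTitle" values: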
$myXMLData = file_get_contents('https://wsearch.nlm.nih.gov/ws/query?db=healthTopics&term=cancer');

$xml = new SimpleXMLElement($myXMLData);

// Getting titles: run the XPath query once, then iterate over the matches
$titles = $xml->xpath("//nlmSearchResult/list/document/content[@name='title']");
foreach ($titles as $title) {
    echo $title . '<br />';
}

// Getting alternative titles (this collects them across ALL documents)
$alts = $xml->xpath("//nlmSearchResult/list/document/content[@name='altTitle']");
foreach ($alts as $alt) {
    echo $alt . '<br />';
}

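Each document has a different number of alternative titles, so from the flat list above I cannot tell which alternative titles belong to which document. What I want is to get the titles grouped by document tag, for example document[0] -> content[@name='altTitle']. How can I achieve this? Any help would be appreciated.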
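A minimal sketch of one way to group the titles by document, assuming PHP 7 or later with the SimpleXML extension. The idea is to select the document elements first and then run a relative XPath query (no leading slash) on each one, so only that document's content children match. The function name `titlesByDocument`, the result array keys, and the shortened inline sample are hypothetical and only for illustration. The use of `strip_tags()` is also an assumption, made because the titles carry escaped highlighting span markup.

```php
<?php
// Hypothetical helper (not from the original post): returns one entry per
// <document>, holding its url, its title, and its own alternative titles.
function titlesByDocument(string $xmlString): array
{
    $xml = new SimpleXMLElement($xmlString);
    $result = [];

    // Select every <document> element, in document order.
    foreach ($xml->xpath('/nlmSearchResult/list/document') as $document) {
        // A path without a leading slash is evaluated relative to $document,
        // so only this document's <content> children can match.
        $title = $document->xpath("content[@name='title']");
        $alts  = $document->xpath("content[@name='altTitle']");

        $result[] = [
            'url'       => (string) $document['url'],
            // strip_tags() removes the highlighting <span> markup that the
            // feed stores as escaped text inside each <content> element.
            'title'     => isset($title[0]) ? strip_tags((string) $title[0]) : '',
            'altTitles' => array_map(function ($node) {
                return strip_tags((string) $node);
            }, $alts),
        ];
    }

    return $result;
}

// Shortened sample in the same shape as the XML in the question.
$sample = <<<'XML'
<nlmSearchResult>
  <list num="2" start="0" per="10">
    <document rank="0" url="https://www.nlm.nih.gov/medlineplus/cancer.html">
      <content name="title">&lt;span class="qt0"&gt;Cancer&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;Carcinoma&lt;/span&gt;</content>
      <content name="altTitle">&lt;span class="qt0"&gt;Tumor&lt;/span&gt;</content>
    </document>
    <document rank="1" url="https://www.nlm.nih.gov/medlineplus/throatcancer.html">
      <content name="title">Throat &lt;span class="qt0"&gt;Cancer&lt;/span&gt;</content>
      <content name="altTitle">Laryngeal &lt;span class="qt0"&gt;Cancer&lt;/span&gt;</content>
    </document>
  </list>
</nlmSearchResult>
XML;

// Print each document's title followed by its own alternative titles.
foreach (titlesByDocument($sample) as $index => $doc) {
    echo "document[$index]: {$doc['title']}\n";
    foreach ($doc['altTitles'] as $alt) {
        echo "  altTitle: $alt\n";
    }
}
```

To run this against the live feed, the string returned by `file_get_contents()` in the script above could be passed in place of `$sample`. Note that a query starting with `//` searches the whole document even when it is called on a single document element, which is why the inner queries here use relative paths.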

0 answers:

No answers yet