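Using PHP and DOM, how can I get PLACE, ADDRESS, LOCALITY, REGION, POSTAL CODE and COUNTRY from the following markup (part of a web page)?
I have already written some code that extracts other content. This is what I have so far.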
$dochtml = new DOMDocument();
$dochtml->loadHTMLfile('');
$xpath = new DOMXpath($dochtml);
$descr = $xpath->query('//div[@class="description"]')->item(0);
print_r($descr->nodeValue);
$abbr = $dochtml->getElementsByTagName("abbr")->item(0);
$title = $abbr->getAttribute("title");
echo $title;
And this is the HTML I want to extract the values from.
<div class="vcard location p">
<div class="fn org">
<a href="link here">PLACE</a>
</div>
<div class="adr">
<div class="street-address">ADDRESS<br></div>
<div>
<span class="locality">LOCALITY</span>,
<span class="region">REGION</span>
<span class="postal-code">POSTAL CODE</span>,
<span class="country-name">COUNTRY</span>
</div>
</div>
</div>
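Update
I have a small problem with the following. The page contains many <abbr> tags, but the two I want, dtstart and dtend, are inside #eventDetailInfo. Unfortunately, not every page has an abbr tag with class=dtend there, so my code picks up the first one from "Related events" instead. So my question is: how can I restrict the lookup to this specific ID?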
Answer 0 (score: 3)
After reading the DOMXPath documentation, I suggest the solution outlined below.
Getting elements by class
$nodes = $xpath->query('//div[contains(@class, "street-address")]');
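Note that contains(@class, "street-address") is a substring test, so it would also match a class such as street-address-extra. Below is a minimal sketch of a stricter whole-token match. The helper name classToken and the sample markup are made up for illustration and are not part of DOMXPath; the helper assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper: builds an XPath predicate that matches a class name
// as a whole space-separated token rather than as a substring.
function classToken(string $class): string
{
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $class
    );
}

// Made-up sample markup: the first div only shares a prefix with the class we want.
$html = '<div class="street-address-extra">WRONG</div>'
      . '<div class="adr street-address">ADDRESS</div>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// The substring test matches both divs; the token test matches only the second.
$loose  = $xpath->query('//div[contains(@class, "street-address")]');
$strict = $xpath->query('//div[' . classToken('street-address') . ']');

echo $loose->length, "\n";               // 2
echo $strict->item(0)->nodeValue, "\n";  // ADDRESS
```

For the markup in the question the simpler contains() test is enough; the stricter form only matters when class names share prefixes.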
Getting an element by ID
// query() returns a DOMNodeList, so take the first (and only) match
$node = $xpath->query('//div[@id="someid"]')->item(0);
Solution
To extract your values, you can use something like this (working example):
<?php
$html = '<div class="vcard location p">
<div class="fn org">
<a href="link here">PLACE</a>
</div>
<div class="adr">
<div class="street-address">ADDRESS<br></div>
<div>
<span class="locality">LOCALITY</span>,
<span class="region">REGION</span>
<span class="postal-code">POSTAL CODE</span>,
<span class="country-name">COUNTRY</span>
</div>
</div>
<div id="eventDetailInfo">
<div class="p">
<div><abbr class="dtstart" title="2012-07-16T21:00:00">Monday, July 16th, 2012</abbr></div>
<div><abbr class="dtend" title="2012-08-16T21:00:00">Monday, August 16th, 2012</abbr></div>
</div>
</div>
</div>';
$document = new DOMDocument();
$document->loadHTML($html);
$xPath = new DOMXpath($document);
function extractNodeValue($query, $xPath, $attribute = null) {
$node = $xPath->query("//{$query}")->item(0);
if (!$node) {
return null;
}
return $attribute ? $node->getAttribute($attribute) : $node->nodeValue;
}
$place = extractNodeValue('div[contains(@class, "fn")]/a', $xPath);
$address = extractNodeValue('div[contains(@class, "street-address")]',$xPath);
$locality = extractNodeValue('span[contains(@class, "locality")]',$xPath);
$region = extractNodeValue('span[contains(@class, "region")]', $xPath);
$postalCode = extractNodeValue('span[contains(@class, "postal-code")]', $xPath);
$countryName = extractNodeValue('span[contains(@class, "country-name")]', $xPath);
$start = extractNodeValue('div[@id="eventDetailInfo"]/div/div/abbr[contains(@class, "dtstart")]', $xPath, 'title');
$end = extractNodeValue('div[@id="eventDetailInfo"]/div/div/abbr[contains(@class, "dtend")]', $xPath, 'title');
var_dump($place, $address, $locality, $region, $postalCode, $countryName, $start, $end);
Output:
string(5) "PLACE"
string(7) "ADDRESS"
string(8) "LOCALITY"
string(6) "REGION"
string(11) "POSTAL CODE"
string(7) "COUNTRY"
string(19) "2012-07-16T21:00:00"
string(19) "2012-08-16T21:00:00"
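For the question's update, another option is to fetch the #eventDetailInfo element once and run relative queries against it, so that abbr tags elsewhere on the page (for example under "Related events") are never considered. The sketch below relies on the optional second argument of DOMXPath::query, which is the context node. The relatedEvents block and the scopedTitle helper are assumptions made up for illustration, and the nullable type hints need PHP 7.1 or later.

```php
<?php
// Made-up fragment: an unrelated abbr appears BEFORE the block we care about,
// and the event block has no dtend element at all.
$html = '<div id="relatedEvents">
    <abbr class="dtstart" title="2011-01-01T10:00:00">Some other event</abbr>
</div>
<div id="eventDetailInfo">
    <div><abbr class="dtstart" title="2012-07-16T21:00:00">Monday, July 16th, 2012</abbr></div>
</div>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Hypothetical helper: returns the title of the first matching abbr
// below $context, or null when the context or the abbr is missing.
function scopedTitle(DOMXPath $xpath, ?DOMNode $context, string $class): ?string
{
    if ($context === null) {
        return null;
    }
    // The leading "." makes the query relative to the context node.
    $node = $xpath->query('.//abbr[contains(@class, "' . $class . '")]', $context)->item(0);
    return $node instanceof DOMElement ? $node->getAttribute('title') : null;
}

$info  = $xpath->query('//div[@id="eventDetailInfo"]')->item(0);
$start = scopedTitle($xpath, $info, 'dtstart');
$end   = scopedTitle($xpath, $info, 'dtend');

var_dump($start, $end);
```

Here $start is "2012-07-16T21:00:00" and $end is NULL, because the lookup never leaves #eventDetailInfo. Using .//abbr rather than a fixed div/div/abbr path also keeps the query working if the nesting depth inside the block changes.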
Answer 1 (score: 0)
Your code is almost there:
<?php
$dochtml = new DOMDocument();
$dochtml->loadHTML('<div class="vcard location p">
<div class="fn org">
<a href="link here">PLACE</a>
</div>
<div class="adr">
<div class="street-address">ADDRESS<br></div>
<div>
<span class="locality">LOCALITY</span>,
<span class="region">REGION</span>
<span class="postal-code">POSTAL CODE</span>,
<span class="country-name">COUNTRY</span>
</div>
</div>
</div>');
$xpath = new DOMXpath($dochtml);
$place = $xpath->query('//div[@class="fn org"]/a')->item(0)->nodeValue;
$address = $xpath->query('//div[@class="street-address"]')->item(0)->nodeValue;
$locality = $xpath->query('//span[@class="locality"]')->item(0)->nodeValue;
$region = $xpath->query('//span[@class="region"]')->item(0)->nodeValue;
$postalCode = $xpath->query('//span[@class="postal-code"]')->item(0)->nodeValue;
$countryName = $xpath->query('//span[@class="country-name"]')->item(0)->nodeValue;
Live code available here.
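One caveat with the chained calls above: item(0) returns null when nothing matches, and reading nodeValue on null then emits a warning or notice. A sketch of a more defensive variant, using the same queries but collecting them in a loop. The sample fragment deliberately omits the postal code and country, and the $queries and $result names are only illustrative.

```php
<?php
// Fragment without postal-code and country-name, to show the missing case.
$html = '<div class="vcard location p">
    <div class="fn org"><a href="link here">PLACE</a></div>
    <div class="adr">
        <div class="street-address">ADDRESS<br></div>
        <div>
            <span class="locality">LOCALITY</span>,
            <span class="region">REGION</span>
        </div>
    </div>
</div>';

$dochtml = new DOMDocument();
$dochtml->loadHTML($html);
$xpath = new DOMXPath($dochtml);

// Field name => XPath query, same expressions as in the answer above.
$queries = [
    'place'       => '//div[@class="fn org"]/a',
    'address'     => '//div[@class="street-address"]',
    'locality'    => '//span[@class="locality"]',
    'region'      => '//span[@class="region"]',
    'postalCode'  => '//span[@class="postal-code"]',
    'countryName' => '//span[@class="country-name"]',
];

$result = [];
foreach ($queries as $field => $query) {
    $node = $xpath->query($query)->item(0);
    // Store null instead of dereferencing a missing node.
    $result[$field] = $node !== null ? trim($node->nodeValue) : null;
}

print_r($result);
```

Fields that are absent from the page end up as null in $result, so the caller can decide how to handle them.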
Answer 2 (score: -1)
If you are familiar with CSS selectors, use PHPQuery or a similar library.