I'm trying to print the content of every element with itemprop="price" from a set of links, but it isn't working and I can't figure out why. Here is the code:
<?php
error_reporting(0);
ini_set('display_errors', 0);
$doc = new DOMDocument();
$allscan = array(
    'http://www.mobile54.co.il/30786',
    'http://www.mobile54.co.il/35873',
    'http://www.mobile54.co.il/34722'
);
$alllinks = array();
$html = file_get_contents($allscan[0]);
$doc->loadHTML($html);
$href = $doc->getElementsByTagName('a');
for ($j = 0; $j < count($allscan); $j++) {
    $html = file_get_contents($allscan[$j]);
    $doc->loadHTML($html);
    $href = $doc->getElementsByTagName('a');
    for ($i = 0; $i < $href->length; $i++) {
        $link = $href->item($i)->getAttribute("href");
        $lin = preg_replace('/\s+/', '', 'http://www.mobile54.co.il' . $link . "<br />");
        if (strpos($link, 'items/') && !strpos($link, '#techDetailsAName')) {
            if (!in_array($lin, $alllinks)) {
                $alllinks[] = $lin;
            }
        }
    }
}
for ($i = 0; $i < count($alllinks); $i++) {
    echo $alllinks[$i];
}
for ($i = 0; $i < count($alllinks); $i++) {
    $lin = "$alllinks[$i]";
    $html = file_get_contents($lin);
    $doc->loadHTML('<?xml encoding="UTF-8"?>' . $html);
    $span = $doc->getElementsByTagName('span');
    for ($j = 0; $j < $span->length; $j++) {
        $attr = $span->item($j)->getAttribute('itemprop');
        if ($attr == "price") {
            echo $span->item($j)->textContent . "<br />";
        }
    }
}
?>
When I paste "someurl" in place of $lin it works, but with the variable it doesn't. I also tried $html = file_get_contents($alllinks[$i]); but that didn't work either, and I don't know why.
Answer 0 (score: 0)
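I think your problem is most likely that, for some reason, you are appending <br /> to the end of each URL, so file_get_contents() is later asked to fetch an address that ends in markup. Beyond that, XPath offers plenty of room to improve the code. (Note also that you can pass a URL directly to the DomDocument object.)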
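To make the diagnosis concrete, here is a minimal sketch of what happens to one link. The path /items/12345 is a made-up placeholder, not a real page on the site; the point is only that the whitespace-stripping preg_replace() leaves the tag glued to the URL.

```php
<?php
// Hypothetical relative link, as getAttribute("href") might return it
// (the trailing newline stands in for stray whitespace in the markup).
$base = 'http://www.mobile54.co.il';
$link = "/items/12345\n";

// What the question's code does: the <br /> becomes part of the stored URL.
$broken = preg_replace('/\s+/', '', $base . $link . "<br />");

// Keep the stored URL clean; add markup only when printing.
$clean = $base . trim($link);

echo $broken, "\n"; // http://www.mobile54.co.il/items/12345<br/>
echo $clean, "\n";  // http://www.mobile54.co.il/items/12345
```

Storing $clean in the array and writing echo $clean . "<br />"; at output time keeps the display the same while leaving the value usable for file_get_contents().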
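First, we extract all the <a> elements with matching attribute values. We collect their URLs, then on each linked page search for elements whose itemprop attribute matches exactly, and read their text content.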
<?php
$url = "http://www.mobile54.co.il/30786";
$prices = [];
$hrefs = [];
$combined = [];
$dom = new DOMDocument;
// collect libxml warnings about malformed HTML instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
// get <a> elements with href containing items/ but not #techDetailsAName
$nodes = $xpath->query("//a[contains(@href, 'items/') and not(contains(@href, '#techDetailsAName'))]/@href");
foreach ($nodes as $node) {
    $hrefs[] = trim($node->value);
}
// now you have a list of relative URLs; make them absolute and fetch each one
foreach ($hrefs as $k => $href) {
    $hrefs[$k] = "http://www.mobile54.co.il$href";
    $dom->loadHTMLFile($hrefs[$k]);
    $xpath = new DOMXPath($dom);
    // get any element with itemprop of price
    $nodes = $xpath->query("//*[@itemprop='price']");
    // guard against pages that have no price element
    $prices[$k] = $nodes->length ? $nodes->item(0)->textContent : null;
}
// now you have $hrefs and $prices, combine them:
foreach ($hrefs as $k => $v) {
    $combined[$k] = [$v, $prices[$k]];
}
print_r($combined);
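
If you want to check the two XPath expressions without hitting the site, a small offline sketch like the following may help. The HTML snippet is invented for illustration and is only an assumption about what the real pages roughly look like.

```php
<?php
// Made-up markup standing in for a listing page plus a product page.
$html = <<<HTML
<html><body>
  <a href="/items/111">Phone A</a>
  <a href="/items/111#techDetailsAName">Specs</a>
  <a href="/about">About</a>
  <span itemprop="price">1,299</span>
  <span itemprop="name">Phone A</span>
</body></html>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same link filter as above: href contains items/ but not #techDetailsAName.
$hrefs = [];
foreach ($xpath->query("//a[contains(@href, 'items/') and not(contains(@href, '#techDetailsAName'))]/@href") as $attr) {
    $hrefs[] = trim($attr->value);
}

// Same price filter as above: any element whose itemprop is exactly "price".
$prices = [];
foreach ($xpath->query("//*[@itemprop='price']") as $node) {
    $prices[] = trim($node->textContent);
}

print_r($hrefs);  // one entry: /items/111
print_r($prices); // one entry: 1,299
```

Only the first link survives the filter, and only the span marked as the price is picked up, which is the behaviour the full script relies on.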