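I am trying to parse an HTML file.
The idea is to fetch the spans with the title and desc classes and read their contents inside each div that has the attribute class='thebest'.
Here is my code: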
<?php
$example=<<<KFIR
<html>
<head>
<title>test</title>
</head>
<body>
<div class="a">moshe1
<div class="aa">haim</div>
</div>
<div class="a">moshe2</div>
<div class="b">moshe3</div>
<div class="thebest">
<span class="title">title1</span>
<span class="desc">desc1</span>
</div>
<div class="thebest">
<span class="title">title2</span>
<span class="desc">desc2</span>
</div>
</body>
</html>
KFIR;
$doc = new DOMDocument();
@$doc->loadHTML($example);
$xpath = new DOMXPath($doc);
$expression="//div[@class='thebest']";
$arts = $xpath->query($expression);
foreach ($arts as $art) {
$arts2=$xpath->query("//span[@class='title']",$art);
echo $arts2->item(0)->nodeValue;
$arts2=$xpath->query("//span[@class='desc']",$art);
echo $arts2->item(0)->nodeValue;
}
echo "done";
The expected result is:
title1desc1title2desc2done
The result I get is:
title1desc1title1desc1done
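Answer 0 (score: 10)
Make the queries relative: start them with a dot (e.g. ".//…"). An expression that begins with // is evaluated from the document root even when a context node is passed, which is why every iteration returned the spans of the first div.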
foreach ($arts as $art) {
// Note: single slash (direct child)
$titles = $xpath->query("./span[@class='title']", $art);
if ($titles->length > 0) {
$title = $titles->item(0)->nodeValue;
echo $title;
}
$descs = $xpath->query("./span[@class='desc']", $art);
if ($descs->length > 0) {
$desc = $descs->item(0)->nodeValue;
echo $desc;
}
}
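To make the difference concrete, here is a minimal sketch that runs three variants of the same query against one context node. The sample markup and the variable names ($absolute, $descendants, $children) are made up for illustration and are not from the question; the second span is deliberately nested in a p element so that ./ and .// behave differently.

```php
<?php
// Hypothetical sample markup: the second span is nested one level deeper.
$html = '<div class="thebest"><span class="title">title1</span></div>'
      . '<div class="thebest"><p><span class="title">title2</span></p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Use the second div as the context node.
$second = $xpath->query("//div[@class='thebest']")->item(1);

// "//" starts at the document root, so the context node is ignored.
$absolute = $xpath->query("//span[@class='title']", $second);
// ".//" searches all descendants of the context node.
$descendants = $xpath->query(".//span[@class='title']", $second);
// "./" matches only direct children of the context node.
$children = $xpath->query("./span[@class='title']", $second);

echo $absolute->length, "\n";                 // 2 (both spans in the document)
echo $absolute->item(0)->nodeValue, "\n";     // title1
echo $descendants->item(0)->nodeValue, "\n";  // title2
echo $children->length, "\n";                 // 0 (the span is a grandchild)
```

With the markup from the question, where the spans are direct children of the div, both ./ and .// would find them; .// is the safer choice if the nesting might change.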
Answer 1 (score: 1)
Try textContent:
foreach ($arts as $art) {
echo $art->textContent;
}
textContent returns the text content of this node and its descendants.
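One caveat: textContent also includes the whitespace between the child elements, so the raw output will contain the newlines from the markup. The following is a rough sketch, assuming markup shaped like the question's, that strips whitespace before printing. The $output variable and the preg_replace step are additions for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample mirroring the structure of the question's markup.
$html = <<<HTML
<div class="thebest">
<span class="title">title1</span>
<span class="desc">desc1</span>
</div>
<div class="thebest">
<span class="title">title2</span>
<span class="desc">desc2</span>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$output = '';
foreach ($xpath->query("//div[@class='thebest']") as $art) {
    // textContent keeps the newlines between the spans.
    // Removing all whitespace is fine here, but it would also remove
    // spaces inside real titles or descriptions.
    $output .= preg_replace('/\s+/', '', $art->textContent);
}
echo $output . "done"; // title1desc1title2desc2done
```

If the titles or descriptions can contain spaces, querying the spans individually is likely the better fit, since it avoids touching the text itself.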
Alternatively, change the XPath to:
$expression="//div[@class='thebest']/span[@class='title' or @class='desc']";
$arts = $xpath->query($expression);
foreach ($arts as $art) {
echo $art->nodeValue;
}
This selects the span children of each thebest div whose class is either title or desc.