Given this $html:
$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a> (skip)
// etc
';
I have a few phrases in $array:
$array = array(
'(target1)',
'(target2)'
);
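How can I use DOMDocument to walk through $html, find every term from $array, and get the content of the <a> tag that precedes it?
So that I end up with the following result: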
$results = array(
array(
'text' => 'Test 1',
'needle' => 'target1'
),
array(
'text' => 'Test 2',
'needle' => 'target1'
)
);
With the following code, I managed to get the content of all <a> tags in $html:
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//a');
$el_array = array();
if ($elements->length > 0) {
foreach($elements as $n) {
$node = trim(strip_tags($n->nodeValue));
if (!empty($node)) {
$el_array[] = $node;
}
}
if (!empty($el_array) && is_array($el_array)) {
print_r($el_array);
}
}
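But I haven't found a way to get the target words so that I can check whether there is a match.
Answer 0 (score: 3)
You can build a dynamic XPath query using contains() and the following-sibling axis.
For the two needles above, the XPath expression would be: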
//a/following-sibling::text()[contains(., '(target1)') or contains(., '(target2)')]
For example:
$array = array(
'(target1)',
'(target2)'
);
$contains = implode(" or ", array_map(function($x) {
return "contains(., '$x')";
}, $array));
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//a/following-sibling::text()[$contains]");
$results = [];
foreach ($elements as $element) {
$results[] = [$element->previousSibling->nodeValue, trim($element->nodeValue)];
}
print_r($results);
Result:
Array
(
[0] => Array
(
[0] => Test 1
[1] => (target1)
)
[1] => Array
(
[0] => Test 2
[1] => (target1)
)
)
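To get the `text`/`needle` shape asked for in the question, the loop above can be adapted. The following is only a sketch: the function name `findAnchorsBeforeNeedles` is made up for illustration, it assumes the needle sits in the text node directly after the `<a>`, and it assumes no needle contains a single quote (which would break the XPath string literal).

```php
<?php
// Hypothetical helper, not part of the original answer.
function findAnchorsBeforeNeedles(string $html, array $needles): array
{
    if (empty($needles)) {
        return []; // an empty predicate would be invalid XPath
    }

    // Builds: contains(., '(target1)') or contains(., '(target2)')
    // Assumption: needles contain no single quotes.
    $contains = implode(' or ', array_map(function ($needle) {
        return "contains(., '$needle')";
    }, $needles));

    $doc = new DOMDocument();
    // @ hides parser warnings about imperfect markup.
    @$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
    $xpath = new DOMXPath($doc);

    // Take the node directly after each <a>, keep it only if it is a
    // text node that contains at least one needle.
    $query = '//a/following-sibling::node()[1][self::text()][' . $contains . ']';

    $results = [];
    foreach ($xpath->query($query) as $textNode) {
        foreach ($needles as $needle) {
            // Work out which needle matched this text node.
            if (strpos($textNode->nodeValue, $needle) !== false) {
                $results[] = [
                    'text'   => trim($textNode->previousSibling->nodeValue),
                    'needle' => $needle,
                ];
                break;
            }
        }
    }
    return $results;
}

$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a> (skip)
';
print_r(findAnchorsBeforeNeedles($html, ['(target1)', '(target2)']));
```

Restricting the match to `node()[1]` keeps a needle from being paired with an element other than the `<a>` right before it. Note that `needle` here keeps the parentheses from `$array`; strip them with `trim($needle, '()')` if the bare word is wanted.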
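Answer 1 (score: 1)
You can walk the parsed DOM and remember the value of each anchor as you encounter it. For every node, check whether its trimmed value is in the array ((target1), (target2)); if it is, store the current node value and the last anchor text in $result.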
<?php
$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a> (skip)
// etc
';
$array = array(
'(target1)',
'(target2)'
);
$result = array();
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);
// showDOMNode() fills the global $result and returns nothing
showDOMNode($doc,$array);
print_r($result);
function showDOMNode(DOMNode $domNode,$array,$oldval=false) {
global $result;
foreach ($domNode->childNodes as $node){
$nodename = $node->nodeName;
$nodevalue = $node->nodeValue;
if($nodename == "a"){
$oldval = $nodevalue;
}
if(in_array(trim ($nodevalue),$array)){
$tmp = array(
"text"=> $oldval,
"needle" => trim($nodevalue)
);
$result[] = $tmp;
}
if($node->hasChildNodes()) {
showDOMNode($node,$array,$oldval);
}
}
}
It outputs:
Array (
[0] => Array ( [text] => Test 1 [needle] => (target1) )
[1] => Array ( [text] => Test 2 [needle] => (target1) )
)
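The same idea can be written without a global variable by starting from each `<a>` and inspecting its `nextSibling`. This is a sketch under the assumption that the needle immediately follows the anchor as a text node; `collectMatches` is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper, not part of the original answer.
function collectMatches(string $html, array $needles): array
{
    $doc = new DOMDocument();
    // @ hides parser warnings about imperfect markup.
    @$doc->loadHTML('<?xml encoding="utf-8" ?>' . $html);

    $results = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $next = $anchor->nextSibling;
        // Skip anchors that are not directly followed by a text node.
        if (!($next instanceof DOMText)) {
            continue;
        }
        $candidate = trim($next->nodeValue);
        // Exact match against the needle list, as in the code above.
        if (in_array($candidate, $needles, true)) {
            $results[] = [
                'text'   => trim($anchor->textContent),
                'needle' => $candidate,
            ];
        }
    }
    return $results;
}

$html = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a> (skip)
';
print_r(collectMatches($html, ['(target1)', '(target2)']));
```

Returning the array instead of writing to a global makes the function easier to reuse and test. Because the comparison is an exact match, text such as `(target1) and more` would not count; swap `in_array` for a `strpos` check if partial matches are wanted.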
Answer 2 (score: 0)
Sorry, I couldn't find a solution using DOMDocument as you asked :/
I think this should do it instead:
$html = '
<p>random</p>
<a href="page1.php">Test 1</a> (target1)
<br>
<a href="page2.htm">Test 2</a> (target1)
<br>
<a href="page3.html">Test 3</a> (skip)
// etc
';
$array = array(
'(target1)',
'(target2)'
);
#Explode HTML into new lines, to run through each line
$lines = explode("\n", $html);
foreach ($lines as $line){
#Find pattern from $array, and if match, use preg_match_all to find the text in the a-tag
if(str_replace($array, '', $line) != $line){
preg_match_all('/<a href="[^"]*">(.*?)<\/a>/s', $line, $matches);
print_r($matches[1]);
}
}
Output:
Array
(
[0] => Test 1
)
Array
(
[0] => Test 2
)
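If a regex approach is acceptable, the needle can be captured together with the anchor text so the result has the shape requested in the question. This is a rough sketch with a hypothetical function name, `matchAnchorsWithRegex`; it assumes the anchor contains plain text only (no nested tags) and that only whitespace separates `</a>` from the needle. Regexes remain fragile against real-world HTML.

```php
<?php
// Hypothetical helper, not part of the original answer.
function matchAnchorsWithRegex(string $html, array $needles): array
{
    if (empty($needles)) {
        return [];
    }

    // Escape each needle for use in a pattern, then join as alternatives.
    $alternatives = implode('|', array_map(function ($needle) {
        return preg_quote($needle, '/');
    }, $needles));

    // Group 1: anchor text (no nested tags assumed).
    // Group 2: the needle that follows the closing tag.
    $pattern = '/<a\b[^>]*>([^<]*)<\/a>\s*(' . $alternatives . ')/';

    $results = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $results[] = ['text' => trim($match[1]), 'needle' => $match[2]];
        }
    }
    return $results;
}

$html = '
<p>random</p>
<a href="page1.php">Test 1</a> (target1)
<br>
<a href="page2.htm">Test 2</a> (target1)
<br>
<a href="page3.html">Test 3</a> (skip)
';
print_r(matchAnchorsWithRegex($html, ['(target1)', '(target2)']));
```

Using `[^<]*` for the anchor text keeps a match from spilling across a non-matching anchor into the next one, which a lazy `.*?` with the `s` flag could do.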