Question

I have this HTML in $string:

$string = '<p>random</p>
<a href="">Test 1</a> (target1)
<br>
<a href="">Test 2</a>  (target1)
<br>
<a href="">Test 3</a> (skip)
// etc
';

I have a few phrases in $array:

$array = array(
    '(target1)',
    '(target2)'
);

How can I search $string for every term in $array and get the content of the <a> tag that precedes each one?

So that I end up with the following result:

$results = array(
    array(
        'text' => 'Test 1',
        'needle' => 'target1'
    ),
    array(
        'text' => 'Test 2',
        'needle' => 'target1'
    )
);

Answer 1

I'll give the answer in JavaScript, but PHP can do the same thing.

You can search for the strings in the array one at a time and stop once you reach the end of the array.

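var target1Match = s.match(/<.+?>(.+?)<\/.+?> *\(target1\)/);
// target1Match is now ['<a href="">Test 1</a> (target1)', 'Test 1']
target1Match = target1Match[1];

var target2Match = s.match(/<.+?>(.+?)<\/.+?> *\(target2\)/);
// If s contained '<a href="">Test 2</a> (target2)', target2Match would be
// ['<a href="">Test 2</a> (target2)', 'Test 2']. match() returns null when
// nothing matches (as with the sample string), so guard before indexing.
target2Match = target2Match ? target2Match[1] : null;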

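In practice you would build the regular expression from variables holding target1 and target2.

Matching multiple targets and a specific tag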
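
Building the pattern from a variable could look roughly like the sketch below. The helper names `escapeRegExp` and `textBefore` are made up for illustration, and the pattern is narrowed to `<a>` tags whose content holds no nested markup, which is an assumption beyond the generic pattern above.

```javascript
// Escape characters that are special in a regular expression,
// so that a needle such as "(target1)" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text of the first <a> tag directly followed by `target`,
// or null when no such tag exists.
function textBefore(s, target) {
  const pattern = new RegExp(
    "<a\\b[^>]*>([^<]*)</a>\\s*" + escapeRegExp(target)
  );
  const match = s.match(pattern);
  return match ? match[1] : null;
}

// Sample input; unlike the question's string, it contains "(target2)".
const s = [
  "<p>random</p>",
  '<a href="">Test 1</a> (target1)',
  "<br>",
  '<a href="">Test 2</a>  (target2)',
  "<br>",
  '<a href="">Test 3</a> (skip)',
].join("\n");

console.log(textBefore(s, "(target1)")); // Test 1
console.log(textBefore(s, "(target2)")); // Test 2
console.log(textBefore(s, "(target3)")); // null
```

Escaping matters here because the needles contain parentheses, which would otherwise be read as a capture group.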

s.match(/<a.+?>(.+?)<\/a> *\((target1|target2)\)/);
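
To collect every occurrence in the shape the question asks for, one option is a global alternation combined with `matchAll()`. This is only a sketch: `collectMatches` and `escapeRegExp` are hypothetical helper names, and it assumes the `<a>` content has no nested tags.

```javascript
// Escape regex metacharacters so each needle is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find every <a> tag directly followed by one of `targets` and
// return an array of { text, needle } objects in document order.
function collectMatches(s, targets) {
  if (targets.length === 0) {
    return []; // an empty alternation would match after every <a> tag
  }
  const alternation = targets.map(escapeRegExp).join("|");
  const pattern = new RegExp(
    "<a\\b[^>]*>([^<]*)</a>\\s*(" + alternation + ")",
    "g" // matchAll() requires the global flag
  );
  return Array.from(s.matchAll(pattern), (m) => ({
    text: m[1],
    needle: m[2],
  }));
}

// The question's sample string.
const s = [
  "<p>random</p>",
  '<a href="">Test 1</a> (target1)',
  "<br>",
  '<a href="">Test 2</a>  (target1)',
  "<br>",
  '<a href="">Test 3</a> (skip)',
].join("\n");

const results = collectMatches(s, ["(target1)", "(target2)"]);
console.log(results);
// [ { text: 'Test 1', needle: '(target1)' },
//   { text: 'Test 2', needle: '(target1)' } ]
```

A single pass keeps the results in document order, whereas searching per needle groups them by needle.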

Answer 2

Use preg_match_all():

// Assuming your HTML as $str, your terms as $terms
$results = [];

foreach ($terms as $t) {

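    // Get the content of the <a> tag preceding the term.
    // [^>]* and .*? keep the match inside a single tag; passing '/' to
    // preg_quote() also escapes the delimiter if a term contains it.
    preg_match_all('/<a\b[^>]*>(.*?)<\/a>\s+' . preg_quote($t, '/') . '/', $str, $matches);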

    // Then insert into your result array
    foreach ($matches[1] as $m) {
        $results[] = [
            'text'   => $m,
            'needle' => $t
        ];
    }
}

Output:

// echo '<pre>' . print_r($results, true) . '</pre>';
Array
(
    [0] => Array
        (
            [text] => Test 1
            [needle] => (target1)
        )

    [1] => Array
        (
            [text] => Test 2
            [needle] => (target1)
        )

)

See also: preg_quote()

Answer 3

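I'm in the JayBlanchard camp. This solution properly uses DOMDocument and XPath, with a dynamically generated query, to target the <a> tags that are immediately followed by text containing one of the qualifying needles.

For the sample needles, this is the generated query: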

//a[following-sibling::text()[1][contains(.,'(target1)') or contains(.,'(target2)')]]

Code: (Demo)

$html = '<p>random</p>
<a href="">Test 1</a> (skip)
<br>
<a href="">Test 2</a> (target1)
<br>
<a href="">Test 3</a>  (target1)
<br>
<a href="">Test 4</a> (skip)
<br>
<a href="">Test 5</a>  (target2)
<br>
<a href="">Test 6</a> (skip)
';

$needles = [
    '(target1)',
    '(target2)'
];

$contains = array_reduce($needles, function($carry, $needle) {
    return $carry .= ($carry !== null ? ' or ' : '') . "contains(.,'$needle')";
});

$matches = [];

$dom=new DOMDocument; 
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[following-sibling::text()[1][$contains]]") as $node) {
    $matches[] = ["text" => $node->nodeValue, "needle" => trim($node->nextSibling->nodeValue)];
}
var_export($matches);

Output:

array (
  0 => 
  array (
    'text' => 'Test 2',
    'needle' => '(target1)',
  ),
  1 => 
  array (
    'text' => 'Test 3',
    'needle' => '(target1)',
  ),
  2 => 
  array (
    'text' => 'Test 5',
    'needle' => '(target2)',
  ),
)

Get an element's content if specific text exists next to it
