Question

I'm looking for a way, in PHP, to extract the text of an <a> element that has no class or id but sits inside a <div> that does have a class.

Here is the HTML to crawl:

<div class="myclass">
    <a href="/to">value to crawl</a>
</div>

Here is the line of PHP I tried, without success:

preg_match_all('<div class=\"myclass\"><a>(.*)<\/a><\/div>', $myhtml, $match);

Thanks for your response :)

Answer 1

A parser is a better solution:

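// Sample markup from the question.
$html = '<div class="myclass">
    <a href="/to">value to crawl</a>
</div>';
$dom = new DOMDocument;
$dom->loadHTML($html); // the fragment gets wrapped in <html><body>...</body></html>
$xpath = new DOMXPath($dom);
// With no context node the query is relative to the root element (<html>),
// so "*" steps through <body> to reach the <div>.
$a_s = $xpath->query('*/div[contains(@class, \'myclass\')]/a');
foreach($a_s as $a) {
    // Only print links that have neither a class nor an id.
    if(empty($a->getAttribute('class')) && empty($a->getAttribute('id'))) {
        echo $a->nodeValue;
    } else {
        echo 'not';
    }
}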

https://3v4l.org/YmCAv

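As for why your regex fails:

<a> does not occur in your string; the tag there is <a href="/to">.
Regular expressions need delimiters in PHP.
>< does not occur in your string either; the tags are separated by whitespace.
Forward slashes and double quotes have no special meaning in a regex, so they need no escaping unless they are already in use (for example, as the delimiter). (In the answer below I use the forward slash as the delimiter, so I leave it escaped.)

So, corrected, your regex would be: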

/<div class="myclass">\s*<a.*?>(.*?)<\/a>\s*<\/div>/

Demo: https://regex101.com/r/0tfwDu/1/
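
For reference, here is a minimal sketch of how the corrected pattern might be used in the original preg_match_all call. $myhtml and $match are the names from the question; $pattern and $count are names picked here purely for illustration, and the markup is assumed to be exactly the sample shown in the question.

```php
<?php
// Sample markup copied from the question.
$myhtml = '<div class="myclass">
    <a href="/to">value to crawl</a>
</div>';

// Corrected pattern, with "/" as the delimiter. Inside a single-quoted
// PHP string the backslashes in \s and \/ reach the regex engine unchanged.
$pattern = '/<div class="myclass">\s*<a.*?>(.*?)<\/a>\s*<\/div>/';

// preg_match_all returns the number of full matches;
// $match[1] collects the text captured by the first group.
$count = preg_match_all($pattern, $myhtml, $match);

echo $match[1][0]; // prints "value to crawl"
```

This only holds while the markup keeps this exact shape. An extra attribute on the <div>, or a different attribute order, would already break the match, which is why the parser approach is the sturdier choice.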
