Question

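I want to get only the "clean" version of a URL, without any parameters. In other words, if the URL contains a question mark, remove it and everything after it.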

Here is my current line:

preg_match_all('/<a(.*?)href=("|\'|)(.*?)("|\'| )(.*?)>/s',$content,$ahref);

To be clearer, I expect this URL (for example):

/go/page/mobile_download_apps.html?&who=r,6GDewh28SCW3/fUSqmWqR_E9ljkcH1DheIMqgbiHjlX3OBDbskcuCZ22iDvk0zeZR7BEthcEaXGFWaQ4Burmd4eKuhMpqojjDE6BrCiUtLClkT32CejpMIdnqVOUmWBD

to become:

/go/page/mobile_download_apps.html

Answer 1

Use DOMDocument, strpos, and substr:

$dom = new DOMDocument;
$dom->loadHTML($content);

$linkNodeList = $dom->getElementsByTagName('a');

foreach($linkNodeList as $linkNode) {
    $href = $linkNode->getAttribute('href');

    if ( false !== ($offset = strpos($href, '?')) )
        $linkNode->setAttribute('href', substr($href, 0, $offset));
}

$newContent = $dom->saveHTML();

Or with explode:

$linkNode->setAttribute('href', explode('?', $href)[0]);
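For reference, the approach above can be put together into a self-contained sketch. The sample `$content` string and the `$cleanHrefs` array are assumptions added for illustration; they are not part of the original question.

```php
<?php
// Hypothetical sample input, made up for this sketch.
$content = '<p><a href="/go/page/mobile_download_apps.html?who=r,6GD">Apps</a> '
         . '<a href="/plain.html">Plain</a></p>';

$dom = new DOMDocument;
// The libxml flags silence warnings that imperfect real-world markup may trigger.
$dom->loadHTML($content, LIBXML_NOERROR | LIBXML_NOWARNING);

$cleanHrefs = [];
foreach ($dom->getElementsByTagName('a') as $linkNode) {
    $href = $linkNode->getAttribute('href');
    // Keep everything before the first '?', or the whole href if there is none.
    $clean = explode('?', $href, 2)[0];
    $linkNode->setAttribute('href', $clean);
    $cleanHrefs[] = $clean;
}

print_r($cleanHrefs); // the two hrefs, without any query string
```

Passing a limit of 2 to explode avoids splitting the string more often than needed when the query itself contains further question marks.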

Answer 2

Do you mean this behavior?

<a\s+href\s*=\s*"\K[^"?]+


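// The pattern matches the part of each href before any '?', so collect those
// matches; replacing them with '' would delete the clean part and keep the query.
preg_match_all('/<a\s+href\s*=\s*"\K[^"?]+/i', $text, $result);
// $result[0] holds the clean URLs.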
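To strip the query string in place, one possible variant moves `\K` to just after the path, so that only the `?` and what follows it are replaced. This is a sketch under the same assumptions as the pattern above (href is the first attribute and is double-quoted); the sample `$text` is made up for illustration.

```php
<?php
// Hypothetical sample input, made up for this sketch.
$text = '<a href="/go/page/mobile_download_apps.html?&who=r,6GD">Apps</a>';

// Match the href value up to the first '?', reset the match start with \K,
// then consume the query string so that only it gets replaced.
$result = preg_replace('/<a\s+href\s*=\s*"[^"?]*\K\?[^"]*/i', '', $text);

echo $result, PHP_EOL;
// <a href="/go/page/mobile_download_apps.html">Apps</a>
```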

Answer 3

As noted in the comments, you shouldn't use regular expressions to process markup; you should use a parser. Still, here you go:

<a[^>]+href=("|')([^"'?]*)[^"']*\1[^>]*>

Demo: https://regex101.com/r/tV5pP8/3
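A minimal sketch of how this pattern might be used from PHP: the second capture group holds the part of each href before any `?`. The sample `$content` is an assumption for illustration only.

```php
<?php
// Hypothetical sample input with both quote styles, made up for this sketch.
$content = '<a class="x" href=\'/a.html?x=1\'>A</a> <a id="b" href="/b.html">B</a>';

// Group 1 is the opening quote (reused via \1), group 2 is the clean URL.
preg_match_all('/<a[^>]+href=("|\')([^"\'?]*)[^"\']*\1[^>]*>/', $content, $matches);

print_r($matches[2]); // '/a.html' and '/b.html'
```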

Answer 4

Oops... a lack of concentration on my part :).

I solved it myself... (it was easy)

Here is the final line:

preg_match_all('/<a(.*?)href=("|\'|)(.*?)(\?|"|\'| )(.*?)>/s',$content,$ahref);
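As a quick check of that line, here is a small sketch: with `\?` added to the fourth group, the lazy third group stops at the first question mark, so `$ahref[3]` should contain the clean URLs. The sample `$content` is an assumption for illustration.

```php
<?php
// Hypothetical sample input, made up for this sketch.
$content = '<a href="/go/page/mobile_download_apps.html?&who=r,6GD">Apps</a> '
         . '<a href="/plain.html">Plain</a>';

preg_match_all('/<a(.*?)href=("|\'|)(.*?)(\?|"|\'| )(.*?)>/s', $content, $ahref);

// Group 3 ends at the first '?', quote, or space after the href value starts.
print_r($ahref[3]);
```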

PHP preg_match_all: dynamically removing URL parameters