Question

Is there a way to use preg_replace() to add a string like "utm=some&medium=stuff" to the end of every URL found in $html_text?

$html_text = 'Lorem ipsum <a href="http://www.me.com">dolor sit</a> amet, 
              <a href="http://www.me.com/page.php?id=10">consectetur</a> elit.';

So the result should be

href="http://www.me.com" ›››››
href="http://www.me.com?utm=some&medium=stuff"

href="http://www.me.com/page.php?id=10" ›››››
href="http://www.me.com/page.php?id=10&utm=some&medium=stuff"

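So if the URL already contains a question mark (the second URL), an ampersand "&" should be placed in front of "utm=some..." instead of a question mark "?".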

Finally, it should only change URLs on the domain me.com.

Answer 1

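This is a bit tricky, but the following code should work as long as all your URLs are enclosed in quotes (single or double). It also handles fragment identifiers (such as #section-2).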

$url_modifier = 'utm=some&medium=stuff';
// Escape the domain for use in a pattern delimited by '#'
$url_modifier_domain = preg_quote('www.me.com', '#');

$html_text = preg_replace_callback(
              '#((?:https?:)?//'.$url_modifier_domain.'(/[^\'"\#]*)?)(?=[\'"\#])#i',
              function ($matches) use ($url_modifier) {
                // No path at all: add "/" before the query string
                if (!isset($matches[2])) return $matches[1]."/?$url_modifier";
                $q = strpos($matches[2], '?');
                // No query string yet
                if ($q === false) return $matches[1]."?$url_modifier";
                // URL ends with a bare "?"
                if ($q === strlen($matches[2]) - 1) return $matches[1].$url_modifier;
                // Existing query string: append with "&"
                return $matches[1]."&$url_modifier";
              },
              $html_text);

Input:

<a href="http://www.me.com">Lorem</a>
<a href="http://www.me.com/">ipsum</a>
<a href="http://www.me.com/#section-2">dolor</a>
<a href="http://www.me.com/path-to-somewhere/file.php">sit</a>
<a href="http://www.me.com/?">amet</a>,
<a href="http://www.me.com/?foo=bar">consectetur</a>
<a href="http://www.me.com/?foo=bar#section-3">elit</a>.

Output:

<a href="http://www.me.com/?utm=some&medium=stuff">Lorem</a>
<a href="http://www.me.com/?utm=some&medium=stuff">ipsum</a>
<a href="http://www.me.com/?utm=some&medium=stuff#section-2">dolor</a>
<a href="http://www.me.com/path-to-somewhere/file.php?utm=some&medium=stuff">sit</a>
<a href="http://www.me.com/?utm=some&medium=stuff">amet</a>,
<a href="http://www.me.com/?foo=bar&utm=some&medium=stuff">consectetur</a>
<a href="http://www.me.com/?foo=bar&utm=some&medium=stuff#section-3">elit</a>.
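The fragment handling in the output above can also be sketched without a regex: split the URL at the first "#", append to the part before it, then put the fragment back. This is only an illustration; the function name append_before_fragment is hypothetical and not part of the answer's code.

```php
<?php
// Hypothetical helper: append $query to $url while keeping any "#fragment" at the end.
function append_before_fragment(string $url, string $query): string
{
    // Split off the fragment (everything from the first "#"), if there is one.
    $hashPos  = strpos($url, '#');
    $base     = ($hashPos === false) ? $url : substr($url, 0, $hashPos);
    $fragment = ($hashPos === false) ? ''   : substr($url, $hashPos);

    if (strpos($base, '?') === false) {
        $base .= '?' . $query;      // no query string yet
    } elseif (substr($base, -1) === '?') {
        $base .= $query;            // bare trailing "?"
    } else {
        $base .= '&' . $query;      // extend the existing query string
    }

    return $base . $fragment;
}

echo append_before_fragment('http://www.me.com/?foo=bar#section-3', 'utm=some&medium=stuff'), "\n";
```

Unlike the callback above, this sketch does not add a "/" when the URL has no path, and it leaves finding the URLs in the HTML to the caller.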

Answer 2

You can achieve this with preg_replace, using two patterns and two replacements:

<?php
$add = "utm=some&medium=stuff";
$patterns = array(
                '/(https?:\/\/(?:www\.)?me\.com(?=[^"]*\?)[^"]*)/',  # positive lookahead: the URL contains a "?" before its closing quote
                '/(https?:\/\/(?:www\.)?me\.com(?![^"]*\?)[^"]*)/'   # negative lookahead: the URL contains no "?" before its closing quote
        );
$replacements = array(
                    '$1&'.$add, # replacement when the first pattern matches
                    '$1?'.$add  # replacement when the second pattern matches
            );
$str = 'Lorem ipsum <a href="http://www.me.com">dolor sit</a> amet, <a href="http://www.me.com/page.php?id=10">consectetur</a> elit.';
$str = preg_replace($patterns, $replacements, $str);
echo $str;

/* Output:
Lorem ipsum <a href="http://www.me.com?utm=some&medium=stuff">dolor sit</a> amet, <a href="http://www.me.com/page.php?id=10&utm=some&medium=stuff">consectetur</a> elit.
*/
?>
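One thing to watch with lookaheads like these: a lookahead written as `.*?\?` can scan past the closing quote, so a "?" in a later URL affects an earlier one. The sketch below compares that with a lookahead bounded by `[^"]*`; the variable names are made up for the illustration.

```php
<?php
$str = '<a href="http://www.me.com">a</a> <a href="http://www.me.com/page.php?id=10">b</a>';

// Unbounded: ".*?" may run past the closing quote and find the "?" of the second URL.
$unbounded = '/https?:\/\/www\.me\.com(?=.*?\?)[^"]*/';
// Bounded: [^"]* cannot cross the closing quote of the current attribute value.
$bounded   = '/https?:\/\/www\.me\.com(?=[^"]*\?)[^"]*/';

preg_match($unbounded, $str, $m1);
preg_match($bounded, $str, $m2);

echo $m1[0], "\n"; // http://www.me.com  (first URL, although it has no "?")
echo $m2[0], "\n"; // http://www.me.com/page.php?id=10
```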

I liked the other answers that use a DOM solution, so I measured how long each snippet takes on the following input:

<a href="http://www.me.com">Lorem</a>
<a href="http://www.me.com/">ipsum</a>
<a href="http://www.me.com/#section-2">dolor</a>
<a href="http://www.me.com/path-to-somewhere/file.php">sit</a>
<a href="http://www.me.com/?">amet</a>,
<a href="http://www.me.com/?foo=bar">consectetur</a>
<a href="http://www.me.com/?foo=bar#section-3">elit</a>.

Using microtime:

$ts = microtime(true);
// codes
printf("%.10f\n", microtime(true) - $ts);

You can see the results below (in seconds, as returned by microtime(true)):

@squeamish ossifrage:  0.0001089573
@Cobra_Fast:           0.0003509521
@Emissary:             0.0094890594
@Me:                   0.0000669956

I found this interesting: the regex approach holds up well.
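A single run measured this way is quite noisy at these time scales. As a rough sketch, the measurement can be repeated many times and averaged. The helper name time_snippet, the iteration count, and the sample regex are assumptions for illustration; hrtime() needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call over $iterations runs.
function time_snippet(callable $snippet, int $iterations = 1000): float
{
    $start = hrtime(true);          // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $snippet();
    }
    return (hrtime(true) - $start) / $iterations / 1e9;
}

$html = '<a href="http://www.me.com/?foo=bar">consectetur</a>';

// Time a stand-in snippet; swap in any of the answers' code here.
$avg = time_snippet(function () use ($html) {
    preg_replace('/(href="[^"?]*\?[^"]*)"/', '$1&utm=some&medium=stuff"', $html);
});

printf("%.10f\n", $avg);
```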

Answer 3

This is a trivial task using DOMDocument:

$html_text = 'Lorem ipsum <a href="http://www.me.com">dolor sit</a> amet, <a href="http://www.me.com/page.php?id=10">consectetur</a> elit.';

$html = new DOMDocument();
$html->loadHtml($html_text);

foreach ($html->getElementsByTagName('a') as $element)
{
    $href = $element->getAttribute('href');
    if (!empty($href)) // only edit the attribute if it's set
    {
        // check if we need to append with ? or &
        if (strpos($href, '?') === false)
            $href .= '?';
        else
            $href .= '&';

        // append querystring
        $href .= 'utm=some&medium=stuff';

        // set attribute
        $element->setAttribute('href', $href);
    }
}

// output altered code
echo $html->C14N();

Fiddle: http://phpfiddle.org/lite/code/wvq-ujk
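Note that the loop above changes every link, whereas the question only wants links on me.com. A minimal sketch of a host check with parse_url that could guard the append step; the function name is_target_host and the choice to accept both me.com and www.me.com are assumptions.

```php
<?php
// Hypothetical guard: true only when $href has a host that is in $hosts.
function is_target_host(string $href, array $hosts = ['me.com', 'www.me.com']): bool
{
    // parse_url returns null when the URL has no host and false when it is malformed.
    $host = parse_url($href, PHP_URL_HOST);
    return is_string($host) && in_array(strtolower($host), $hosts, true);
}

var_dump(is_target_host('http://www.me.com/page.php?id=10')); // bool(true)
var_dump(is_target_host('http://notme.com'));                 // bool(false)
var_dump(is_target_host('local.php'));                        // bool(false), no host
```

Inside the foreach, the attribute would then only be rewritten when is_target_host($href) returns true.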

Answer 4

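If you want to abstract all the nasty parsing out of your script, you can always use a DOM parser, of which there are many available. For this example I chose Simple HTML-DOM because it is the only one I'm really familiar with (it is admittedly not the most efficient library, but you aren't doing anything intensive).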

include 'simple_html_dom.php';
$html = str_get_html($htmlString);

foreach($html->find('a') as $a){
    $url = strtolower($a->href);
    if( strpos($url, 'http://me.com')     === 0 ||
        strpos($url, 'http://www.me.com') === 0 ||
        strpos($url, 'http://') !== 0 // local url
    ){
        $url = explode('?', $url, 2);
        if(count($url)<2) $qry = array();
        else parse_str($url[1], $qry);
        $qry = array_merge($qry, array(
            'utm'    => 'some',
            'medium' => 'stuff'
        ));
        $parts = array();
        foreach($qry as $key => $val)
            $parts[] = "{$key}={$val}";
        $a->href = sprintf("%s?%s", $url[0], implode('&', $parts));
    }
}

echo $html;

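In this example I have assumed that me.com is your website and that local paths should qualify too. I have also assumed that the query strings consist only of simple key: value pairs. In its current form, if a URL already has one of your query parameters, that parameter is overwritten. If you want to keep the existing value instead, swap the order of the arguments in the array_merge call.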
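The effect of the argument order in array_merge can be seen in isolation. A short sketch, using http_build_query (which also URL-encodes keys and values) in place of the manual loop; the variable names are illustrative.

```php
<?php
// Existing query string of a link, parsed into an array.
parse_str('id=10&utm=bla', $existing);
$extra = ['utm' => 'some', 'medium' => 'stuff'];

// Later arrays win on duplicate string keys, so $extra overwrites utm=bla.
$overwrite = http_build_query(array_merge($existing, $extra));
echo $overwrite, "\n"; // id=10&utm=some&medium=stuff

// Swapped order: the existing utm value is kept.
$keep = http_build_query(array_merge($extra, $existing));
echo $keep, "\n";      // utm=bla&medium=stuff&id=10
```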

Input:

<a href="http://me.com/">test</a> 
<a href="http://WWW.me.com/">test</a> 
<a href="local.me.com.php">test</a> 
<a href="http://notme.com">test</a> 
http://me.com/not-a-link
<a href="http://me.com/?id=10&utm=bla">test</a>

Output:

<a href="http://me.com/?utm=some&medium=stuff">test</a> 
<a href="http://www.me.com/?utm=some&medium=stuff">test</a> 
<a href="local.me.com.php?utm=some&medium=stuff">test</a> 
<a href="http://notme.com">test</a> 
http://me.com/not-a-link 
<a href="http://me.com/?id=10&utm=some&medium=stuff">test</a>

Answer 5

If you have problems with DOMDocument and UTF-8, try the following:

$html_text = '<p>This is a text with special chars ÄÖÜ <a 
href="http://example.com/This-is-my-Page" 
target="_self">here</a>.</p>';
$html_text .= '<p>continue</p>';

$html = new DOMDocument('1.0', 'utf-8');

// Set charset-header for DOMDocument
$html_prepared = '<html>'
  . '<head>'
  . '<meta http-equiv="content-type" content="text/html; charset=UTF-8">'
  . '</head>'
  . '<body>'
  . '<div>' . $html_text . '</div>'
  . '</body>';


$html->loadHtml($html_prepared);


foreach ($html->getElementsByTagName('a') as $element)
{
    $href = $element->getAttribute('href');
    if (!empty($href)) // only edit the attribute if it's set
    {
        // check if we need to append with ? or &
        if (strpos($href, '?') === false)
            $href .= '?';
        else
            $href .= '&';

        // append querystring
        $href .= 'utm=some&medium=stuff';

        // set attribute
        $element->setAttribute('href', $href);
    }
}


// 1) Remove doctype-declaration
$html->removeChild($html->firstChild);
// 2) Remove head
$html->firstChild->removeChild($html->firstChild->firstChild);
// 3) Only keep body's first Child
$html->replaceChild($html->firstChild->firstChild->firstChild, $html->firstChild);

print $html->saveHTML();

PHP: change URLs in HTML

5 answers:
