Remove parts of the links in a string

Date: 2017-04-11 01:48:36

Tags: php regex preg-replace

How can I change every link in a string like this:

...<p><a href="https://www.somesite.com/url?q=http://www.someothersite.se/&amp;q1=xxx&q2=xxx">Some text</a>...

into this:

...<p><a href="http://www.someothersite.se/">Some text</a>...

The "..." stands for a lot of other code. The string also contains several links like this one, and all of them follow the same pattern.

2 answers:

Answer 0 (score: 1)

Working solution:

$regex = <<<'EOF'
%(<[aA]\s[^>]*href=['"])([^"']+url\?q=([A-Za-z]+:\/{2}[^"'&]+)[^"']*)(["'][^>]*>)%i
EOF;

$replacement = '$1$3$4';

$html = <<<'EOF'
...<p><a href="https://www.somesite.com/url?q=http://www.secondsite.se/&amp;q1=xxx&q2=xxx">Some text</a>...
...<p><a class="lnk" href="https://www.somesite.com/url?q=http://www.thirdsite.se" id="lnk">Some text</a>...
...<p><a class="lnk2" href="https://www.somesite.com/">Some text</a>...
EOF;

$new_html = preg_replace($regex, $replacement, $html);
echo $new_html;

The regex explained:

(                     - Group 1 (the <a tag from its start up to the href attribute)
  <[aA]\s             - match <a or <A followed by a whitespace character
  [^>]*               - match anything except >, so attributes before href (class, id, etc.) are included
  href=['"]           - match href= followed by ' or "
)                     - end of group 1
(                     - Group 2 (the whole value of the href attribute)
    [^"']+            - everything that is not ' or "
    url\?q=           - the literal text url?q=
    (                 - Group 3 (the URL we actually want)
        [A-Za-z]+:\/{2} - the URL's scheme: http://, https://, ftp://, etc. ([A-z] would also match [ \ ] ^ _ `)
        [^"'&]+       - anything except ', " or &; these characters mark the end of the URL we want
    )                 - end of group 3
    [^"']*            - anything except " or '; this is the rest of the href value
)                     - end of group 2
(                     - Group 4 (the rest of the tag)
    ["']              - the " or ' that closes the href value
    [^>]*             - anything except >, i.e. the remaining attributes
    >                 - finally, the closing >
)                     - end of group 4

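We then replace the whole match with groups 1, 3 and 4 only, which drops everything in the href value except the target URL.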
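For readability, the same pattern can also be written with PCRE's x (extended) modifier, which ignores whitespace outside character classes and treats # as the start of a comment, so the explanation above can live inside the regex itself. This is a sketch assuming PHP 7 or later; it uses [A-Za-z] for the scheme because [A-z] also matches the ASCII characters between Z and a. The sample links are the ones from the answer, wrapped in <p> elements.

```php
<?php
// Same pattern as the working solution, written with the x (extended) modifier:
// whitespace outside character classes is ignored and # starts a comment.
// Nowdoc (<<<'EOF') means PHP does not interpret $ or backslashes in the pattern.
$regex = <<<'EOF'
%
(<[aA]\s[^>]*href=['"])         # group 1: <a, attributes before href, href= and the quote
(                               # group 2: the whole href value
    [^"']+ url\?q=              #   everything up to and including url?q=
    ( [A-Za-z]+:/{2} [^"'&]+ )  #   group 3: target URL, ends at a quote or &
    [^"']*                      #   the remaining query parameters
)
(["'][^>]*>)                    # group 4: closing quote and the rest of the tag
%ix
EOF;

$html = <<<'EOF'
<p><a href="https://www.somesite.com/url?q=http://www.secondsite.se/&amp;q1=xxx&q2=xxx">Some text</a></p>
<p><a class="lnk" href="https://www.somesite.com/url?q=http://www.thirdsite.se" id="lnk">Some text</a></p>
<p><a class="lnk2" href="https://www.somesite.com/">Some text</a></p>
EOF;

// Keep groups 1, 3 and 4; everything else in the href value is dropped.
$new_html = preg_replace($regex, '$1$3$4', $html);
echo $new_html, "\n";
// Expected output:
// <p><a href="http://www.secondsite.se/">Some text</a></p>
// <p><a class="lnk" href="http://www.thirdsite.se" id="lnk">Some text</a></p>
// <p><a class="lnk2" href="https://www.somesite.com/">Some text</a></p>
```

The third link has no url?q= part, so the pattern does not match it and it is left unchanged.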

Answer 1 (score: 0)

I think you could use something like this:

// Only matches <a> tags whose sole attribute is href; other links are left unchanged.
$new_html = preg_replace('%<a href="[^"]*?\?q=([^"&]*)[^"]*">(.*?)</a>%i', '<a href="$1">$2</a>', $html);
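Both answers assume the target URL appears literally right after url?q=. If some links carry it percent-encoded (for example q=http%3A%2F%2F...), or put q after other parameters, a sturdier option is to let a regex find only the href values and hand each one to PHP's parse_url() and parse_str(), which split the query string and decode it. This is a sketch assuming PHP 7.4+ (for the arrow function); unwrap_redirect() and unwrap_links() are hypothetical names, and the check that the path ends in /url mirrors the url?q= part of the patterns above. It is still regex-based, not a full HTML parser.

```php
<?php
// Hypothetical helper: if $href looks like ".../url?q=TARGET&...", return TARGET
// (decoded, then re-escaped for an HTML attribute); otherwise return $href unchanged.
function unwrap_redirect(string $href): string
{
    // Attribute values may contain entities such as &amp;; decode them first.
    $raw = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $path = parse_url($raw, PHP_URL_PATH);
    $query = parse_url($raw, PHP_URL_QUERY);
    if (!is_string($path) || substr($path, -4) !== '/url' || !is_string($query)) {
        return $href;
    }
    parse_str($query, $params);   // splits on & and percent-decodes each value
    $target = $params['q'] ?? null;
    // Like the answers' patterns, require an absolute URL such as http://...
    if (!is_string($target) || !preg_match('~^[a-z][a-z0-9+.-]*://~i', $target)) {
        return $href;
    }
    return htmlspecialchars($target, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Hypothetical helper: pass every quoted href value through unwrap_redirect().
// Returns null if PCRE fails.
function unwrap_links(string $html): ?string
{
    return preg_replace_callback(
        '~(\bhref\s*=\s*)(["\'])(.*?)\2~i',
        fn (array $m): string => $m[1] . $m[2] . unwrap_redirect($m[3]) . $m[2],
        $html
    );
}

echo unwrap_links('...<p><a href="https://www.somesite.com/url?q=http://www.someothersite.se/&amp;q1=xxx&q2=xxx">Some text</a>...'), "\n";
// ...<p><a href="http://www.someothersite.se/">Some text</a>...
```

On the sample from the question this gives the same result as the regex answers; the difference only shows for encoded or reordered query strings.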