Remove complete URLs from text

Asked: 2011-05-08 17:11:28

Tags: php pattern-matching

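This should be a simple pattern match and replace, but I'd like to be able to remove complete URLs from a piece of text.

So:

'You have to love it! http://www.youtube.com/watch?v=0i_bkLbf3EI Check it out!!!'

becomes:

'You have to love it! Check it out!!!'

Any ideas?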

4 answers:

Answer 0 (score: 6)

$string = preg_replace('/\b(https?):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
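
To show how the one-liner above behaves on the sample from the question, here is a minimal sketch. The function name remove_urls is hypothetical, and the second step (collapsing the gap left where the URL used to be) is an assumption about the desired output, not part of the original answer.

```php
<?php
// Hypothetical helper: strips URLs with the pattern from the answer above,
// then squeezes the run of spaces left behind by the removed URL.
function remove_urls(string $text): string
{
    $pattern = '/\b(https?):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';

    // preg_replace() returns null on failure; fall back to the input in that case.
    $withoutUrls = preg_replace($pattern, '', $text) ?? $text;

    // Assumption: the caller is happy to have repeated spaces/tabs collapsed.
    $squeezed = preg_replace('/[ \t]{2,}/', ' ', $withoutUrls) ?? $withoutUrls;

    return trim($squeezed);
}

echo remove_urls('You have to love it! http://www.youtube.com/watch?v=0i_bkLbf3EI Check it out!!!'), "\n";
```

Without the whitespace step, the result would keep two spaces between "it!" and "Check", since the pattern removes only the URL itself.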

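Answer 1 (score: 6)

Step 1: Find a regular expression that matches URLs

http://mathiasbynens.be/demo/url-regex

The last one (@diegoperini) seems to be the best, but it weighs in at 502 characters.

Step 2: Replace every match of that regular expression with an empty string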

$output = preg_replace($regex, '', $input);
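
The two steps can be sketched as follows. The pattern here is a deliberately short placeholder, not the 502-character @diegoperini regex from the linked page, and the function name strip_matches is hypothetical. The sketch also checks for the null that preg_replace() returns on failure, which matters more with very long patterns.

```php
<?php
// Step 1 (placeholder): a short stand-in pattern. Swap in whichever regex
// you pick from the comparison page; this one is only an assumption.
$regex = '~https?://\S+~i';

// Step 2: replace every match with an empty string, surfacing PCRE failures.
function strip_matches(string $regex, string $input): string
{
    $output = preg_replace($regex, '', $input);
    if ($output === null) {
        // null means the pattern failed to compile or the engine hit a limit.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $output;
}

echo strip_matches($regex, 'Docs: https://example.com/a?b=c and more'), "\n";
```

Note that \S+ also swallows any punctuation directly after the URL, which is one reason the longer patterns exist.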

Answer 2 (score: 2)

See Daring Fireball's An Improved Liberal, Accurate Regex Pattern for Matching URLs

Excerpt:

(?xi)
\b
(                           # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+:                # URL protocol and colon
    (?:
      /{1,3}                        # 1-3 slashes
      |                             #   or
      [a-z0-9%]                     # Single letter or digit or '%'
                                    # (Trying not to match e.g. "URI::Escape")
    )
    |                           #   or
    www\d{0,3}[.]               # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                           # One or more:
    [^\s()<>]+                      # Run of non-space, non-()<>
    |                               #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                           # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)
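
The excerpt contains both quote characters, so embedding it in PHP is easiest with a nowdoc. The following is a rough sketch under a few assumptions: the function names are hypothetical, ~ is chosen as the delimiter because it does not occur in the pattern, the inline comments are dropped for brevity, and the u modifier is added so the non-ASCII quote characters are treated as whole characters (this requires valid UTF-8 input).

```php
<?php
// Hypothetical wrapper around the pattern excerpted above.
// The x and i modifiers replace the leading (?xi).
function gruber_pattern(): string
{
    return <<<'REGEX'
~
\b
(
  (?:
    [a-z][\w-]+:
    (?: /{1,3} | [a-z0-9%] )
    |
    www\d{0,3}[.]
    |
    [a-z0-9.\-]+[.][a-z]{2,4}/
  )
  (?:
    [^\s()<>]+
    |
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
  )+
  (?:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
    |
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]
  )
)
~xiu
REGEX;
}

// Removes every match; falls back to the input if PCRE reports a failure.
function remove_urls_gruber(string $text): string
{
    return preg_replace(gruber_pattern(), '', $text) ?? $text;
}

echo remove_urls_gruber('You have to love it! http://www.youtube.com/watch?v=0i_bkLbf3EI Check it out!!!'), "\n";
```

Because this pattern is liberal by design, it also matches things such as bare "www." hosts, so it may remove more than a strict http/https pattern would.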

Answer 3 (score: 1)

Try this:

/http:\/\/[a-zA-Z0-9\.\/\?\=\_]+/
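
A quick way to see where this shorter pattern stops is to run it on a URL containing characters outside its character class. In the sketch below, the sample text and the widened variant are assumptions for illustration and are not part of the answer.

```php
<?php
// Pattern from the answer above, unchanged.
$narrow = '/http:\/\/[a-zA-Z0-9\.\/\?\=\_]+/';

// Assumed variant (not from the answer): accepts https and a few more URL characters.
$wider = '/https?:\/\/[a-zA-Z0-9.\/?=_&%#~+-]+/';

$text = 'See http://example.com/a-b?x=1&y=2 now';

// The narrow match stops at "-", so the tail "-b?x=1&y=2" is left in the text.
var_dump(preg_replace($narrow, '', $text));

// The wider class covers "-" and "&", so the whole URL is removed.
var_dump(preg_replace($wider, '', $text));
```

For the YouTube URL in the question the original pattern is enough, since that URL uses only letters, digits, ".", "/", "?", "=" and "_".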