PHP regex on tags and text

Date: 2016-06-23 21:28:20

Tags: php regex preg-replace preg-match-all

I have some inline content, for example:

<p>"Geen nuwe inisiatief, bestuur verandering, of verkryging in<a href="http://business.time.com/2013/09/24/the-fatal-mistake-that-doomed-blackberry/">2007 kon gered het die BlackBerry</a>. Dit was te laat, en die kloof is te groot, "Arment geskryf.</p>

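I want to add a space before the a tag and other tags (strong, italic, etc.), but only when the tag sits directly next to a letter (which may also be a Japanese character). I also want to add a space after the tag, but only when the following character is a letter as well and not a punctuation mark such as . , ! ? ...

Do you have any idea how I could achieve this?

My regex so far is: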

preg_replace('/<a(.*)>(.*)<\/a>?/', ' $0', $out);

So obviously it has no conditions... Thanks a lot for your help.

1 Answer:

Answer 0: (score: 2)

Description

\s?<(a|strong|italic)(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>.*?<\/\1>(?=[\s,.;?!]|(?=.*?(\s)))

Replace with: " $0$2" (without the quotes; note that this is a space followed by $0$2)

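This regex will do the following:

  • Match an optional leading space before the tag. If there is no space, one is inserted; if there is one already, it is part of $0 and is emitted again after the inserted space, so that spot ends up with two spaces (visible after "Dit was te laat," in the output below), which a browser collapses into one when rendering.
  • Insert a space after the closing tag only if there is not one already and the next character is not one of the punctuation marks , . ; ? !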
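The behaviour listed above can be tried out in PHP with preg_replace. The following is a minimal sketch rather than a definitive implementation: the constant name TAG_SPACING_PATTERN, the function name addSpacesAroundTags, and the shortened input string are assumptions made for illustration, while the pattern and the replacement string are the ones given in this answer. The pattern is wrapped in ~ delimiters and stored in a nowdoc so that its quotes stay literal.

```php
<?php
// The pattern from this answer, unchanged, inside ~ delimiters.
const TAG_SPACING_PATTERN = <<<'REGEX'
~\s?<(a|strong|italic)(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>.*?<\/\1>(?=[\s,.;?!]|(?=.*?(\s)))~
REGEX;

// Hypothetical helper name; it only wraps the preg_replace call.
function addSpacesAroundTags(string $html): string
{
    // $0 is the whole match, $2 is the whitespace captured by the nested
    // look-ahead (empty when the tag is followed by punctuation or whitespace).
    // preg_replace returns null on error, so fall back to the input.
    return preg_replace(TAG_SPACING_PATTERN, ' $0$2', $html) ?? $html;
}

$in = 'verkryging in<a href="x">2007</a>. Dit was<strong>te laat</strong>en die kloof.';
echo addSpacesAroundTags($in), PHP_EOL;
// verkryging in <a href="x">2007</a>. Dit was <strong>te laat</strong> en die kloof.
```

The first tag is followed by a full stop, so only the leading space is added; the second tag is followed by a letter, so it gets a space on both sides.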

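The last tag on the page will be a problem if no more whitespace follows it: the nested look-ahead then has nothing to capture, so the tag is not matched at all and gets no leading space either.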
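The limitation above is easy to reproduce, and one possible way around it is a different technique from the one in this answer: two separate passes that use Unicode-aware lookarounds (\p{L} with the u modifier) so that a space is added only where a tag touches a letter. This is a sketch under assumptions: the function name spaceTagsNextToLetters and the tag list a|strong|em|i|b are hypothetical, the input must be valid UTF-8, and like any regex applied to HTML it does not understand markup inside attribute values.

```php
<?php
// The pattern from this answer, unchanged, for comparison.
const ANSWER_PATTERN = <<<'REGEX'
~\s?<(a|strong|italic)(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>.*?<\/\1>(?=[\s,.;?!]|(?=.*?(\s)))~
REGEX;

// Hypothetical alternative: decide each side independently with lookarounds.
function spaceTagsNextToLetters(string $html): string
{
    $tags = 'a|strong|em|i|b'; // assumed tag list, adjust as needed

    // Pass 1: zero-width position between a letter and an opening tag.
    $spaced = preg_replace('~(?<=\p{L})(?=<(?:' . $tags . ')\b)~u', ' ', $html) ?? $html;

    // Pass 2: closing tag that is directly followed by a letter.
    return preg_replace('~(</(?:' . $tags . ')>)(?=\p{L})~u', '$1 ', $spaced) ?? $spaced;
}

$stuck = 'text<a href="x">link</a>end';

// No whitespace follows the closing tag, so the answer's pattern does not match.
echo preg_replace(ANSWER_PATTERN, ' $0$2', $stuck), PHP_EOL;
// text<a href="x">link</a>end

echo spaceTagsNextToLetters($stuck), PHP_EOL;
// text <a href="x">link</a> end

echo spaceTagsNextToLetters('日本語<strong>強調</strong>です。'), PHP_EOL;
// 日本語 <strong>強調</strong> です。
```

Because each pass checks only the character next to the tag, an existing space is left alone instead of being doubled, and digits or punctuation next to a tag do not trigger an insertion.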

Example

Live demo

https://regex101.com/r/bR2gZ3/1

Sample text

<p>"Geen nuwe inisiatief, bestuur verandering, of verkryging in<a href="http://business.time.com/2013/09/24/the-fatal-mistake-that-doomed-blackberry/">2007 kon gered het die BlackBerry</a>. Dit was te laat, <a href=Droid.jpg onmouseover=' var s=" <a href=NotTheDroidsYouAreLookingFor.jpg </a> "; ' >Not the Droid you are looking for</a>en die kloof is te groot, "Arment geskryf.</p>

After replacement

<p>"Geen nuwe inisiatief, bestuur verandering, of verkryging in <a href="http://business.time.com/2013/09/24/the-fatal-mistake-that-doomed-blackberry/">2007 kon gered het die BlackBerry</a>. Dit was te laat,  <a href=Droid.jpg onmouseover=' var s=" <a href=NotTheDroidsYouAreLookingFor.jpg </a> "; ' >Not the Droid you are looking for</a> en die kloof is te groot, "Arment geskryf.</p>

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    a                        'a'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    strong                   'strong'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    italic                   'italic'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [\s>]                    any character of: whitespace (\n, \r,
                             \t, \f, and " "), '>'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    (?:                      group, but do not capture:
----------------------------------------------------------------------
      '                        '\''
----------------------------------------------------------------------
      [^']*                    any character except: ''' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
      '                        '\''
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      "                        '"'
----------------------------------------------------------------------
      [^"]*                    any character except: '"' (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
      "                        '"'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      [^'"\s]*                 any character except: ''', '"',
                               whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of grouping
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  \/                       '/'
----------------------------------------------------------------------
  \1                       what was matched by capture \1
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    [\s,.;?!]                any character of: whitespace (\n, \r,
                             \t, \f, and " "), ',', '.', ';', '?',
                             '!'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      .*?                      any character except \n (0 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
      (                        group and capture to \2:
----------------------------------------------------------------------
        \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
      )                        end of \2
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
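The breakdown above can also be kept next to the pattern itself by writing it with PCRE's x (extended) modifier, which ignores unescaped whitespace outside character classes and treats # as the start of a comment. The sketch below is one way to lay it out; the constant names COMPACT_PATTERN and VERBOSE_PATTERN and the sample string are assumptions for illustration, and the comments deliberately avoid the ~ character because PHP would read it as the closing delimiter.

```php
<?php
// The one-line pattern from this answer.
const COMPACT_PATTERN = <<<'REGEX'
~\s?<(a|strong|italic)(?=[\s>])(?:[^>=]|=(?:'[^']*'|"[^"]*"|[^'"\s]*))*\s?\/?>.*?<\/\1>(?=[\s,.;?!]|(?=.*?(\s)))~
REGEX;

// The same pattern in extended mode, annotated piece by piece.
const VERBOSE_PATTERN = <<<'REGEX'
~
\s?                      # optional whitespace before the tag
<(a|strong|italic)       # "<" and the tag name, captured as group 1
(?=[\s>])                # the name must be followed by whitespace or ">"
(?:                      # attributes, repeated:
    [^>=]                #   any character except ">" and "="
  | =(?:'[^']*'|"[^"]*"|[^'"\s]*)   # or "=" plus a quoted or unquoted value
)*
\s?\/?>                  # optional whitespace, optional "/", then ">"
.*?                      # tag content, as little as possible
<\/\1>                   # closing tag with the same name as group 1
(?=
    [\s,.;?!]            # next character is whitespace or punctuation
  | (?=.*?(\s))          # otherwise capture a later whitespace as group 2
)
~x
REGEX;

$sample = 'in<a href="x">2007</a>en die kloof';
echo preg_replace(COMPACT_PATTERN, ' $0$2', $sample), PHP_EOL;
// in <a href="x">2007</a> en die kloof
echo preg_replace(VERBOSE_PATTERN, ' $0$2', $sample), PHP_EOL;
// in <a href="x">2007</a> en die kloof
```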