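Matching a character set and optional entities

Time: 2011-03-03 13:35:19

Tags: php regex

I want to use this pattern to insert a word-break character after every 5 characters of a string.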

([^\s-]{5})([^\s-]{5})

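Unfortunately, it also breaks up entities (&#xxx;). Can someone give me an example that does not break entity codes? The string I want to break comes from XML, so the actual entities are escaped one level further (&amp;#xxx;).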

Edit: code example

preg_replace('/([^\s-]{5})([^\s-]{5})/', '$1&shy;$2', $subject)

Given the word "F&#xe5;revejle"
Expect "F&#xe5;&shy;revejle" as result
But it outputs "F&#xe&shy;5;revejle" instead
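
The breakage can be reproduced in isolation. The following is a minimal sketch, assuming the input holds the entity in its single-escaped form `&#xe5;` (the variable names are illustrative). The pattern counts `&`, `#`, `x` and `e` as four ordinary characters, so the boundary after the fifth character lands inside the entity:

```php
<?php
// Assumed input: the word with a single-escaped numeric entity.
$subject = 'F&#xe5;revejle';

// The pattern from the question: two runs of five non-space, non-dash characters.
$broken = preg_replace('/([^\s-]{5})([^\s-]{5})/', '$1&shy;$2', $subject);

echo $broken, "\n"; // F&#xe&shy;5;revejle
```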

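1 answer:

Answer 0 (score: 4)

Assuming you want to split each word every five characters, unless it is already separated by a hyphen, and to count an entity as a single character, try: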

$result = preg_replace(
    '/            # Start the match 
    (?:           # at one of the following positions:
     (?<=         # Either right after...
      [\s-]       # a space or dash
     )            # end of lookbehind
     |            # or...
     \G           # wherever the last match ended.
    )             # End of start condition.
    (             # Now match and capture the following:
     (?>          # Match the following in an atomic group:
      &amp;\#\w+; # an entity
      |           # or
      [^\s-]      # a non-space, non-dash character
     ){5}         # exactly 5 times.
    )             # End of capture
    (?=[^\s-])    # Assert that we\'re not at the end of a "word"/x', 
    '\1&shy;', $subject);
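
For reuse, the same pattern could be wrapped in a small function. This is only a sketch: the helper name `insert_soft_hyphens` and its `$size` parameter are hypothetical and not part of the answer, and the pattern is written on one line without the `x` modifier so that the chunk length can be spliced in.

```php
<?php
// Hypothetical helper; the name and the $size parameter are illustrative.
function insert_soft_hyphens(string $subject, int $size = 5): string
{
    // Same logic as the commented pattern above, written on one line:
    // start after a space/dash or where the previous match ended, consume
    // $size "characters" (an escaped entity counts as one), and require
    // that another word character follows.
    $pattern = '/(?:(?<=[\s-])|\G)((?>&amp;\#\w+;|[^\s-]){' . $size . '})(?=[^\s-])/';

    // preg_replace() returns null on failure; fall back to the input then.
    return preg_replace($pattern, '$1&shy;', $subject) ?? $subject;
}

echo insert_soft_hyphens('F&amp;#xe5;revejle'), "\n";   // F&amp;#xe5;rev&shy;ejle
echo insert_soft_hyphens('alrea-dy se-parated'), "\n";  // alrea-dy se-parat&shy;ed
```

The atomic group matters here: once `&amp;#xe5;` has been consumed as an entity, the engine cannot backtrack and re-read it as individual characters, so a break can never fall inside it.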

This turns the first block below into the second:

supercalifragilisticexpidon'tremember! 
alrea-dy se-parated 
count entity as one character&amp;#345;blahblah
F&amp;#xe5;revejle

super&shy;calif&shy;ragil&shy;istic&shy;expid&shy;on'tr&shy;ememb&shy;er! 
alrea-dy se-parat&shy;ed 
count entit&shy;y as one chara&shy;cter&amp;#345;&shy;blahb&shy;lah
F&amp;#xe5;rev&shy;ejle