Regex to match a specific string that is not surrounded by another specific string

Date: 2012-04-09 21:18:49

Tags: php regex preg-match preg-match-all regular-language

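I need a regular expression that matches a string that is not surrounded by another, different specific string. For example, in the case below it should split the content into two groups: 1) the content before the second {Switch} and 2) the content after the second {Switch}. It should not match the first {Switch}, because that one is enclosed in {my_string}. The string will always have the form shown below (i.e. {my_string}any content here{/my_string}).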

Some more  
  {my_string}
  Random content
  {Switch} //This {Switch} may or may not be here, but should be ignored if it is present
  More random content
  {/my_string}
Content here too
{Switch}
More content

So far I have the following, which I know is not very close:

(.*?)\{Switch\}(.*?)

I'm just not sure how to use the [^] (negation) operator with a specific string instead of individual characters.
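To illustrate the difference (the sample string is made up for this purpose), a character class such as `[^{Switch}]` only excludes single characters, while the usual multi-character equivalent repeats a dot guarded by a negative lookahead, often called a tempered dot:

```php
<?php
$s = 'Random content {Switch} more';

// [^{Switch}] is a character class: it excludes each of the characters
// {, S, w, i, t, c, h and } individually, not the string "{Switch}".
preg_match('~^[^{Switch}]*~', $s, $byChar);

// The multi-character analogue is a "tempered" dot: consume one character
// at a time, but only at positions where "{Switch}" does not start.
preg_match('~^(?:(?!\{Switch\}).)*~s', $s, $byString);

var_dump($byChar[0]);   // string(7) "Random "
var_dump($byString[0]); // string(15) "Random content "
```

The character class stops at the "c" of "content" because "c" occurs in "Switch"; the tempered dot stops only where the whole string "{Switch}" begins.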

5 answers:

Answer 0 (score: 2)

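It looks like you are trying to parse a grammar with regular expressions, which is something regexes are really bad at. You would probably be better off writing a parser that breaks your string into the tokens it is made of, and then processing the resulting tree.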

Perhaps something like http://drupal.org/project/grammar_parser could help.
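A lightweight sketch of that token-based approach, written for this question rather than taken from the answer or from grammar_parser: the hypothetical helper `split_outside_blocks` tokenizes the input with `preg_split` and `PREG_SPLIT_DELIM_CAPTURE`, counts how many `{my_string}` blocks are open, and splits only on a `{Switch}` seen while none are open. It assumes the three tags appear literally as written.

```php
<?php
// Hypothetical helper: split $doc on {Switch} tokens that are not inside
// a {my_string}...{/my_string} block. The separators themselves are dropped.
function split_outside_blocks(string $doc): array {
    // Capture the three tag types so they come back as separate tokens.
    $tokens = preg_split('~(\{my_string\}|\{/my_string\}|\{Switch\})~', $doc, -1,
        PREG_SPLIT_DELIM_CAPTURE);

    $parts = [''];
    $depth = 0; // number of currently open {my_string} blocks

    foreach ($tokens as $tok) {
        if ($tok === '{my_string}') {
            $depth++;
        } elseif ($tok === '{/my_string}') {
            $depth = max(0, $depth - 1);
        } elseif ($tok === '{Switch}' && $depth === 0) {
            $parts[] = ''; // top-level {Switch}: start a new part
            continue;
        }
        // Everything else (text, tags, nested {Switch}) stays in the current part.
        $parts[count($parts) - 1] .= $tok;
    }
    return $parts;
}

$doc = "Some more\n{my_string}\nRandom content\n{Switch} ignored\n{/my_string}\nContent here too\n{Switch}\nMore content";
print_r(split_outside_blocks($doc));
```

Because it keeps a depth counter instead of relying on a single pattern, this version also copes with nested `{my_string}` blocks, which is where a pure regex solution usually breaks down.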

Answer 1 (score: 1)

$regex = '/(?:(?!\{my_string\})(.*?))(\{Switch\})(?:(.*?)(?!\{my_string\}))/';
/* if "my_string" and "Switch" aren't wrapped by "{" and "}" just remove "\{" and "\}" */
$yourNewString = preg_replace($regex, '$1', $yourOriginalString);

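This might work. I can't test it right now, but I'll update later! I don't know whether this is what you're looking for, but to negate a sequence of several characters, the regex syntax is: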

(?!yourString) 

It's called a "negative lookahead assertion".
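As an illustration (not from the answer), a negative lookahead can answer the question directly when combined with `preg_split`. The pattern below assumes `{my_string}` blocks are never nested and are always closed: a `{Switch}` is rejected if a `{/my_string}` can be reached after it without first passing an opening `{my_string}`, which is exactly the case when the `{Switch}` sits inside a block.

```php
<?php
// Assumption: {my_string} blocks are never nested and always closed.
// The lookahead fails (so {Switch} is skipped) when the text after it reaches
// a {/my_string} before any new {my_string}, i.e. the {Switch} is inside a block.
// The s flag lets "." cross newlines.
$pattern = '~\{Switch\}(?!(?:(?!\{my_string\}).)*\{/my_string\})~s';

$doc = "Some more\n{my_string}\nRandom content\n{Switch} ignored\n{/my_string}\nContent here too\n{Switch}\nMore content";
$parts = preg_split($pattern, $doc);
print_r($parts);
```

Since the lookahead scans forward to the next closing tag for every `{Switch}`, this is fine for small documents but can get slow on large inputs with many markers.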

Edit:

This should work and return true:

$stringMatchesYourRulesBoolean = preg_match('~(.*?)('.$my_string.')(.*?)(?<!'.$my_string.') ?('.$switch.') ?(?!'.$my_string.')(.*?)('.$my_string.')(.*?)~',$yourString);

Answer 2 (score: 1)

You could try positive lookahead and lookbehind assertions (http://www.regular-expressions.info/lookaround.html).

It might look something like this:

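$content = 'string of text before some random content switch text some more random content string of text after';
// Pass the delimiter to preg_quote so a "/" in the strings would be escaped too.
$before  = preg_quote('string of text before', '/');
$switch  = preg_quote('switch text', '/');
$after   = preg_quote('string of text after', '/');
// Group 1 is lazy so it stops at the first switch text; the switch text and
// group 2 are optional together, so the pattern still matches without a switch.
if (preg_match('/(?<=' . $before . ')(.*?)(?:' . $switch . '(.*?))?(?=' . $after . ')/', $content, $matches)) {
    // $matches[1] == ' some random content '
    // $matches[2] == ' some more random content '
}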

Answer 3 (score: 1)

Try this simple function:


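function find_content($doc) {
  $temp = $doc;
  // Collect every {my_string}...{/my_string} block (s: dot matches newlines, i: case-insensitive).
  preg_match_all('~{my_string}.*?{/my_string}~is', $temp, $x);
  // Replace each block with a placeholder so any {Switch} inside it is hidden.
  foreach ($x[0] as $i => $block) {
    $temp = str_replace($block, "{REPL:$i}", $temp);
  }
  // Split on the {Switch} markers that remain, i.e. those outside any block.
  $res = explode('{Switch}', $temp);
  // Put the original blocks back into each part.
  foreach ($res as &$part) {
    foreach ($x[0] as $id => $content) {
      $part = str_replace("{REPL:$id}", $content, $part);
    }
  }
  unset($part); // break the reference left by the by-reference foreach
  return $res;
}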

Use it like this:

$content_parts = find_content($doc); // $doc is your input document
print_r($content_parts);

Output (for your example):

Array
(
    [0] => Some more
{my_string}
Random content
{Switch} //This {Switch} may or may not be here, but should be ignored if it is present
More random content
{/my_string}
Content here too

    [1] => 
More content
)

Answer 4 (score: 0)

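Take a look at PHP PEG, a small parser written in PHP. You can write your own grammar and parse your input with it; in your case the grammar would be very simple.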

Both the grammar syntax and the parsing methods are explained in README.md.

Excerpt from the README:

  token*  - Token is optionally repeated
  token+ - Token is repeated at least once
  token? - Token is optionally present

Tokens may be:

 - bare-words, which are recursive matchers - references to token rules defined elsewhere in the grammar,
 - literals, surrounded by `"` or `'` quote pairs. No escaping support is provided in literals.
 - regexs, surrounded by `/` pairs.
 - expressions - single words (match \w+)

Example grammar (file EqualRepeat.peg.inc):

class EqualRepeat extends Packrat {
/* Any number of a followed by the same number of b and the same number of c characters
 * aabbcc - good
 * aaabbbccc - good
 * aabbc - bad
 * aabbacc - bad
 */

/*Parser:Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
X: &(A !"b") "a"+ B !("a" | "b" | "c")
*/
}