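Remove a string inside brackets

Time: 2011-05-04 13:56:50

Tags: php regex preg-replace

Good day!

I would like help removing a string that is inside square brackets, including the brackets themselves.

The string looks like this: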

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";

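I only want to remove the brackets (and their contents) that contain "www.example.com". I want "[test]" to stay in the string, along with any other brackets that do not contain "www.example.com".

Thanks!

5 answers:

Answer 0 (score: 3)

Note: The OP has changed the question considerably. This solution was designed to handle the question in its original (harder) form, before the "www.example.com" constraint was added. Although the solution below has been modified to handle this additional constraint, a simpler solution would probably now suffice (i.e. anubhava's answer).

Here is my tested solution: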

function strip_bracketed_special($text) {
    $re = '% # Remove bracketed text having "www.example.com" within markup.
          # Skip comments, CDATA, SCRIPT & STYLE elements, and HTML tags.
          (                      # $1: HTML stuff to be left alone.
            <!--.*?-->           # HTML comments (non-SGML compliant).
          | <!\[CDATA\[.*?\]\]>  # CDATA sections
          | <script.*?</script>  # SCRIPT elements.
          | <style.*?</style>    # STYLE elements.
          | <\w+                 # HTML element start tags.
            (?:                  # Group optional attributes.
              \s+                # Attributes separated by whitespace.
              [\w:.-]+           # Attribute name is required
              (?:                # Group for optional attribute value.
                \s*=\s*          # Name and value separated by "="
                (?:              # Group for value alternatives.
                  "[^"]*"        # Either double quoted string,
                | \'[^\']*\'     # or single quoted string,
                | [\w:.-]+       # or un-quoted string (limited chars).
                )                # End group of value alternatives.
              )?                 # Attribute values are optional.
            )*                   # Zero or more start tag attributes.
            \s*/?>               # End of start tag (optional self-close).
          | </\w+>               # HTML element end tags.
          )                      # End #1: HTML Stuff to be left alone.
        | # Or... Bracketed structures containing www.example.com
          \s*\[                  # (optional ws), Opening bracket.
          [^\]]*?                # Match up to required content.
          www\.example\.com      # Required bracketed content.
          [^\]]*                 # Match up to closing bracket.
          \]\s*                  # Closing bracket, (optional ws).
        %six';
    return preg_replace($re, '$1', $text);
}

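Note that the regex skips removal of bracketed material from within HTML comments, CDATA sections, SCRIPT and STYLE elements, and HTML tag attribute values. Given the following XHTML markup (which tests these scenarios), the function above correctly removes only the bracketed content that sits within HTML element contents: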

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal. [Remove this www.example.com]</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal. [Remove this www.example.com]</h1>
<p>Test special removal. [Remove this www.example.com]</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal. [Remove this www.example.com]
</p>
</div>
</body>
</html>

Here is the same markup after being run through the PHP function above:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal.</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal.</h1>
<p>Test special removal.</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal.</p>
</div>
</body>
</html>

This solution should work quite well for just about any valid (X)HTML you can throw at it. (But please, no funky shorttags or SGML comments!)

Answer 1 (score: 1)
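The core of the function above is its alternation: markup that must be left alone is captured in group 1 and put back by the '$1' replacement, while the bracketed alternative captures nothing and so is replaced by an empty string. A minimal sketch of that idea follows. It assumes a much cruder "anything between < and >" rule in place of the full tag grammar, and the name strip_bracketed_simple is hypothetical:

```php
<?php
// Simplified sketch of the "match and keep" alternation.
// Assumption: any <...> run counts as a tag; the full pattern above is far stricter.
function strip_bracketed_simple(string $text): string {
    $re = '%
          ( <[^>]*> )            # $1: a tag (with its attributes), kept as-is.
        | \s*\[                  # Or: (optional ws), opening bracket.
          [^\]]*?                # Match up to required content.
          www\.example\.com      # Required bracketed content.
          [^\]]*                 # Match up to closing bracket.
          \]\s*                  # Closing bracket, (optional ws).
        %six';
    // Tags are replaced by themselves; bracketed matches leave $1 empty.
    return preg_replace($re, '$1', $text);
}

$html = '<p title="[keep www.example.com]">Hi [test] there. [drop www.example.com]</p>';
echo strip_bracketed_simple($html), "\n";
// <p title="[keep www.example.com]">Hi [test] there.</p>
```

Like the full function, this sketch also swallows the whitespace on both sides of a removed bracket, so the words around a mid-sentence match end up joined.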

$str = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";
$str = preg_replace('~\[[^]]*?www\.example\.com[^]]*\]~si', "", $str);
var_dump($str);

Output

string(83) "Lorem ipsum dolor<br />  <br />some text here. Text here. [test] Lorem ipsum dolor."

PS: It also works when the bracketed text spans multiple lines.
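The multi-line behaviour comes from the negated character class [^]], which matches any character except "]", newlines included. A small sketch with a made-up input string ($multiline is an assumption, not from the question):

```php
<?php
// Same pattern as in the answer above.
$pattern = '~\[[^]]*?www\.example\.com[^]]*\]~si';

// Hypothetical input: the bracketed text is split across two lines.
$multiline = "Intro [ Context are found\non www.example.com ] outro [test]";

$result = preg_replace($pattern, '', $multiline);
var_dump($result);
// string(19) "Intro  outro [test]"
```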

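Answer 2 (score: 0)

Use a regex like /\[.*?\]/. The backslashes are necessary; otherwise the brackets would form a character class that matches any single one of the characters ., * or ?.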
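To see the difference the backslashes make, here is a short sketch that runs both forms over a made-up sample string ($sample is an assumption). Note that the escaped pattern on its own matches every bracketed group, so on the question's string it would remove "[test]" as well:

```php
<?php
$sample = "Is it [test]? Yes. 2*3";

// Unescaped: [.*?] is a character class matching one ".", "*" or "?".
preg_match_all('/[.*?]/', $sample, $unescaped);

// Escaped: literal brackets around a lazy "anything" match.
preg_match_all('/\[.*?\]/', $sample, $escaped);

print_r($unescaped[0]); // "?", ".", "*"
print_r($escaped[0]);   // "[test]"
```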

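Answer 3 (score: 0)

The simplest way I can think of is to use a regex to match everything between [ and ] and then replace it with "". The code below will replace the string you used in your example. If the actual strings that need removing are more complex, you can change the regex to match. I recommend regexpal.com for testing regexes.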

// preg_replace() patterns need delimiters (here "/"); note this class also matches "[test]".
$string = preg_replace('/\[[A-Za-z .]*\]/', '', $string);
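As an illustration of "changing the regex to match", the sketch below compares the letters-spaces-dots class with a wider "anything but a closing bracket" class. The input string and variable names are assumptions made up for the example:

```php
<?php
// Hypothetical input: the second group contains "/", "-" and a digit.
$input = "a [see www.example.com] b [see www.example.com/page-2] c";

// Narrow class from the answer: letters, spaces and dots only.
$narrow = '/\[[A-Za-z .]*\]/';
echo preg_replace($narrow, '', $input), "\n";
// a  b [see www.example.com/page-2] c

// Wider class: any run of characters other than "]".
$wider = '/\[[^\]]*\]/';
echo preg_replace($wider, '', $input), "\n";
// a  b  c
```

Neither pattern checks for "www.example.com", so both remove unrelated groups such as "[test]" too.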

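Answer 4 (score: 0)

The following code removes the bracketed text. The <br /> tags stay in the string and show up as line breaks when the output is viewed in a browser: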

$str = "Lorem ipsum dolor<br />[ Context are found on www.example.com ] <br />some text here";
$str = preg_replace( "/\[[^\]]*\]/m", "", $str);
echo $str;

Output (as rendered in a browser):

Lorem ipsum dolor

some text here
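The snippet above leaves the <br /> tags in the string untouched. If real newline characters were wanted in the string itself, a second replacement would be needed. A minimal sketch, assuming the only variants to handle are <br>, <br/> and <br /> ($plain is a hypothetical variable name):

```php
<?php
$str = "Lorem ipsum dolor<br />[ Context are found on www.example.com ] <br />some text here";

// Step 1: remove the bracketed text, as in the answer.
$str = preg_replace('/\[[^\]]*\]/', '', $str);
// $str is now "Lorem ipsum dolor<br /> <br />some text here"

// Step 2 (assumption, not part of the answer): turn <br> variants into "\n".
$plain = preg_replace('~<br\s*/?>~i', "\n", $str);
var_dump($plain);
```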