PHP: how to remove line breaks and tabs (\n, \r, \t) from a string but keep textarea content intact

Asked: 2018-04-05 21:14:12

Tags: php regex html5

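I make an ajax call to fetch some HTML from the back end. On the back end the HTML is held in a variable (properly formatted with tabs and line breaks), but for the ajax response I need to remove the line breaks and tabs while keeping the content inside each textarea intact.

For example, I have: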

$myhtml = '
<form class="some class other another">
    <div class="title-box">
        <div class="title">Questions</div>
    </div>

    <div class="content">
        <div>
            <span>Insert title</span>
            <div>
                <input name="question" placeholder="Insert some text here" type="text" />
            </div>
        </div>
        <div class="margin-t-10">
            <label>Insert BIO</label>
            <div>
                <textarea name="bio" class="textarea-content">This is first line text
This is second line text

more lines...</textarea>
            </div>
        </div>
        <div class="description">
            <label>Insert description here</label>
            <div>
                <textarea data-something name="description" class="textarea-content other class">Line one
line two
    have some tabulation here to keep...

another line...</textarea>
            </div>
        </div>
    </div>
</form>';

I need a regex that removes \n, \t and \r but leaves the content of each textarea exactly as it is:

$afterregex = '<form class="some class other another"><div class="title-box"><div class="title">Questions</div></div><div class="content"><div><span>Insert title</span><div><input name="question" placeholder="Insert some text here" type="text" /></div></div><div class="margin-t-10"><label>Insert BIO</label><div><textarea name="bio" class="textarea-content">This is first line text
This is second line text

more lines...</textarea></div></div><div class="description"><label>Insert description here</label><div><textarea data-something name="description" class="textarea-content other class">Line one
line two
    have some tabulation here to keep...

another line...</textarea></div></div></div></form>';

I return the content to the front end (via ajax) like this:

exit(json_encode(array(
    'success' => true,
    'content' => $afterregex, // without line breaks and tabs
)));
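
As a side note, json_encode() escapes line breaks and tabs inside a string, so the textarea content should survive the trip to the front end. This is a minimal sketch with a made-up, shortened $content value (an assumption, not the real markup):

```php
<?php
// Hypothetical, shortened content: one textarea holding a line break and a tab.
$content = "<div><textarea name=\"bio\">Line one\n\tline two</textarea></div>";

$json = json_encode(array(
    'success' => true,
    'content' => $content,
));

// The raw JSON text holds no literal line break or tab: they are escaped as \n and \t.
var_dump(strpos($json, "\n") === false); // bool(true)
var_dump(strpos($json, "\t") === false); // bool(true)

// Decoding (the counterpart of JSON.parse in the browser) restores the original characters.
$decoded = json_decode($json, true);
var_dump($decoded['content'] === $content); // bool(true)
```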

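1 answer:

Answer 0 (score: 1)

Here you go. I can't guarantee this will work in every case. Basically, it removes whitespace-only runs between a > and the next <, unless that < starts a </textarea closing tag. So the whitespace between </tag> and <othertag> is removed, while text inside a tag, including the text inside a textarea, is left alone.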

function minify( $html )
{
   return preg_replace('/>\s+<(?!\/textarea)/', '><', $html);
}


$myhtml = <<<HTML
<form class="some class other another">
    <div class="title-box">
        <div class="title">Questions</div>
    </div>

    <div class="content">
        <div>
            <span>Insert title</span>
            <div>
                <input name="question" placeholder="Insert some text here" type="text" />
            </div>
        </div>
        <div class="margin-t-10">
            <label>Insert BIO</label>
            <div>
                <textarea name="bio" class="textarea-content">This is first line text
This is second line text

more lines...</textarea>
            </div>
        </div>
        <div class="description">
            <label>Insert description here</label>
            <div>
                <textarea data-something name="description" class="textarea-content other class">

            Line one
line two
    have some tabulation here to keep...

another line...</textarea>
            </div>
        </div>
    </div>
</form>
HTML;

echo minify($myhtml);

Try it online:

https://3v4l.org/2Y2E2

I added a couple of blank leading lines to the second textarea. Here is the output:

<form class="some class other another"><div class="title-box"><div class="title">Questions</div></div><div class="content"><div><span>Insert title</span><div><input name="question" placeholder="Insert some text here" type="text" /></div></div><div class="margin-t-10"><label>Insert BIO</label><div><textarea name="bio" class="textarea-content">This is first line text
This is second line text

more lines...</textarea></div></div><div class="description"><label>Insert description here</label><div><textarea data-something name="description" class="textarea-content other class">

            Line one
line two
    have some tabulation here to keep...

another line...</textarea></div></div></div></form> 

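Explanation of the regex />\s+<(?!\/textarea)/:

  • / opening delimiter
  • > matches a literal >
  • \s+ matches one or more whitespace characters
  • < matches a literal <
  • (?!\/textarea) negative lookahead: the match fails if /textarea follows
  • / closing delimiter

Then we replace every match with ><.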
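
The behaviour of the pattern can be checked on a short string. This is only a sketch: the pattern is the one from minify(), while the two input strings and the variable names are made up for illustration.

```php
<?php
// Same pattern as in minify() above; the input strings are made up for illustration.
$pattern = '/>\s+<(?!\/textarea)/';

// Whitespace between tags is removed, but the run right before </textarea> is kept.
$withTextarea = "<div>\n\t<textarea> \n </textarea>\n</div>";
$minified = preg_replace($pattern, '><', $withTextarea);
var_dump($minified === "<div><textarea> \n </textarea></div>"); // bool(true)

// Text between tags never matches >\s+<, so the whitespace around it stays.
$withText = "<p>\n  hi\n</p>";
$unchanged = preg_replace($pattern, '><', $withText);
var_dump($unchanged === $withText); // bool(true)
```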

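Now in plain English: match the closing chevron > of any tag, then one or more whitespace characters, then an opening chevron <, as long as that chevron is not followed by /textarea. If any one of these conditions fails, there is no match. Put the opening chevron < in front of /textarea and you get </textarea, so the pattern does not match the > </textarea in <textarea class="foo" > </textarea>. That is how the content area of a textarea is excluded.

<<< is another way of writing a string. It is called HEREDOC and takes the form <<<{tag} ... {tag};. It behaves like a double-quoted " string: you can put a PHP variable in it and it is interpolated (replaced by its value). There is a similar form that behaves like a single-quoted ' string and does not interpolate variables. It is called NOWDOC and takes the form <<<'{tag}' ... {tag};. The most important thing to remember is that, before PHP 7.3, the closing tag must sit on a line by itself (apart from the ;), without even a single space before or after it, or it will not work. PHP 7.3 and later allow the closing tag to be indented. In case the advantage is not obvious: because neither ' nor " acts as the string delimiter, you can use both freely inside a HEREDOC / NOWDOC.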

// HEREDOC: you can write just $var, though I like to use {$var}
$url = '/some/action';   // example values so that the snippet runs
$checked = 'checked';
$myhtml = <<<STUFF
<form id='someId' class="some class other another" action="{$url}" >
     <input type="checkbox" name="checkbox" $checked />
</form>
STUFF;
// ^ before PHP 7.3 the closing tag has to be the only thing on its line: no spaces, not even a trailing comment

// NOWDOC: nothing is interpolated
$myhtml = <<<'OTHERSTUFF'
<form id='someId' class="some class other another" action="must/be/entered/manual" >
    <input type="checkbox" name="checkbox" />
</form>
OTHERSTUFF;
// ^ the same rule applies to this closing tag
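
To see the difference in the resulting strings, here is a small sketch that puts the same line in both forms. The variable $name and the TXT tag are made up for this example.

```php
<?php
$name = 'World'; // hypothetical variable, only for this example

// HEREDOC: {$name} is replaced by its value.
$heredoc = <<<TXT
Hello {$name}, 'single' and "double" quotes need no escaping
TXT;

// NOWDOC: the text is taken literally.
$nowdoc = <<<'TXT'
Hello {$name}, 'single' and "double" quotes need no escaping
TXT;

echo $heredoc, "\n"; // Hello World, 'single' and "double" quotes need no escaping
echo $nowdoc, "\n";  // Hello {$name}, 'single' and "double" quotes need no escaping
```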

Update

I wasn't happy with the regex-only version. The main reason is that it doesn't clean up the content inside other tags, for example:

// it won't fix any of these
<p>
    This will retain its white space because it doesn't match >\s+<

    Something like this would leave all the whitespace
</p>
< span >  stuff   < / span >
// and it doesn't remove
 <!-- comments -->
// in javascript there may be issues with HTML in strings
var div = "<div>in javascript</div>";
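Obviously this is less than ideal. But fixing it is more complicated than it looks, or at least it is beyond my unsubstantial regex skills. Anyway, anyone with real regex skills will tell you that you can't parse HTML with regex. That is not entirely true, though, because you can use regex to build a Lexer / Tokenizer.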
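
The general idea of a regex-based tokenizer can be sketched in a few lines: wrap each token pattern in a named group, join the groups with |, and look at which group matched. The token names, patterns and the tokenize() helper below are hypothetical and far simpler than what real HTML needs.

```php
<?php
// Hypothetical token names and patterns, far simpler than a real HTML lexer.
$tokens = [
    'T_TAG'        => '<[^>]+>',
    'T_WHITESPACE' => '\s+',
    'T_TEXT'       => '[^<\s]+',
];

// Wrap each pattern in a named group and join the groups with "|".
$parts = [];
foreach ($tokens as $name => $regex) {
    $parts[] = "(?P<$name>$regex)";
}
$pattern = '/' . implode('|', $parts) . '/';

// Illustrative helper: returns a list of [token name, matched text] pairs.
function tokenize(string $pattern, array $names, string $input): array
{
    $stream = [];
    // PREG_SET_ORDER yields one array per match, keyed by group name.
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        foreach ($names as $name) {
            // None of the patterns can match an empty string,
            // so a non-empty group tells us which token type matched.
            if (isset($m[$name]) && $m[$name] !== '') {
                $stream[] = [$name, $m[$name]];
                break;
            }
        }
    }
    return $stream;
}

$stream = tokenize($pattern, array_keys($tokens), '<p>Hi  there</p>');
foreach ($stream as [$type, $content]) {
    echo $type, ': ', json_encode($content), "\n";
}
```

A parser can then walk this stream and decide, per token type, whether to keep, trim or drop the content.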

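And that is exactly what I did, because... well... this post wasn't long enough. Not to mention that I may find a use for it myself.

You can find it on GitHub. I removed all the comments to keep the version pasted below as small as possible. But this was fun, and I really wanted to share it.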

class Minifier{

    const MODE_CLOSED = 'closed';

    const MODE_OPEN = 'open';

    const MODE_IGNORE = 'ignore';

    protected $ignoreTags = [
        'script',
        'style'
    ];

    protected $tokens =  [
        'T_EOF'             => '\Z',
        'T_COMMENT'         => '<(?=!--).+(?<=--)>',
        'T_OPEN_TAG'        => '<(?!\/)[^>]+(?<!\/)>',
        'T_CLOSE_TAG'       => '<(?=\/)[^>]+(?<!\/)>',
        'T_INLINE_TAG'      => '<(?!\/)[^>]+(?<=\/)>',
        'T_ENCAPSED_STRING' => '(?P<Q>\'|").*?(?<!\\\\)\k<Q>',
        'T_STRING'          => '[-\w]+',
        'T_WHITESPACE'      => '\s+',
        'T_UNKNOWN'         => '.+?'
    ];

    public function __construct($addTags = [], $removeTags = []){
        $this->unsetTag($removeTags);
        $this->setTag($addTags);
    }

    public function issetTag($ignoreTags)
    {
        return in_array($ignoreTags,$this->ignoreTags);
    }

    public function setTag($ignoreTags)
    {
        if(empty($ignoreTags)) return;
        if(!is_array($ignoreTags)) $ignoreTags = [$ignoreTags];     
        $this->ignoreTags = array_unique(array_merge($this->ignoreTags, $ignoreTags));   
    }

    public function unsetTag($ignoreTags)
    {
        if(empty($ignoreTags)) return;
        if(!is_array($ignoreTags)) $ignoreTags = [$ignoreTags];
        $this->ignoreTags = array_diff($this->ignoreTags, $ignoreTags);
    }

    public function minify($html)
    {
        $token_stream = $this->lexTokens($html);
        return $this->parseTokens($token_stream);
    }

    public function lexTokens($html)
    {
        $types = array_keys($this->tokens);
        $patterns = [];
        $token_stream = [];
        $result = false;
        foreach ($this->tokens as $k=>$v){
            $patterns[] = "(?P<$k>$v)";
        }
        $pattern = "/".implode('|', $patterns)."/is";
        if (preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE)) {
            foreach ($matches[0] as $key => $value) {
                $match = [];
                foreach ($types as $type) {
                    $match = $matches[$type][$key];
                    if (is_array($match) && $match[1] != -1) {
                        break;
                    }
                }
                $tok  = [
                    'content' => $match[0],
                    'type' => $type,
                    'offset' => $match[1]
                ];
                $token_stream[] = $tok;
            }
        }
        return $token_stream;
    }

    protected function parseTokens( array &$token_stream )
    {  
        $mode = 'closed';

        $string = '';
        $result = '';

        while($current = current($token_stream)){  
            $content = $current['content'];
            $type = $current['type'];

            next($token_stream);
            switch($type){  
                case 'T_COMMENT':
                break;
                case 'T_OPEN_TAG':
                    if(strlen($string)){
                        if($mode == 'ignore'){
                            $result .= $string;
                        }else{
                            $result .= trim($string);
                        }
                        $string = '';
                    }
                    $content = $this->cleanTag($content);

                    if($this->isIgnoredTag($content)){
                        $mode = 'ignore';
                    }else{
                        $mode = 'open';
                    }
                    $result .= $content;
                break;
                case 'T_INLINE_TAG':
                case 'T_CLOSE_TAG':  
                    if(strlen($string)){
                        if($mode == 'ignore'){
                            $result .= $string;
                        }else{
                            $result .= trim($string);
                        }
                        $string = '';
                    }
                    $content = $this->cleanTag($content);
                    $result .= $content;
                    $mode = 'closed';               
                break;  
                case 'T_ENCAPSED_STRING':
                case 'T_STRING':
                case 'T_UNKNOWN':
                    switch ($mode){
                        case 'ignore':
                        case 'open':
                        case 'closed':
                            $string .= $content;
                        break;
                        default:
                            print_r($result);
                            throw new Exception("Unknown Mode:$mode for $type value $content", 1002);
                    }   
                break;           
                case 'T_WHITESPACE':
                    switch ($mode){
                        case 'closed':
                        break;
                        case 'open':
                            $string .= ' ';
                        break;
                        case 'ignore':
                            $string .= $content;
                        break;
                        default:
                            print_r($result);
                            throw new Exception("Unknown Mode:$mode for $type value $content", 1002);
                    }   
                break;
                case 'T_EOF': return $result;
                default:
                    print_r($current);
                    print_r($result);
                    throw new Exception("Unknown token $type value $content", 1001);
            }
        }
    }

    protected function cleanTag($tag)
    {
        return preg_replace([
            '/\s{2,}/',            
            '/^<\s+/',
            '/^<\/\s+/',
            '/\s+>$/',
            '/\s\/>$/'
         ],[
            ' ',
            '<',
            '</',
            '>',
            '/>',
         ], $tag);
    }

    protected function isIgnoredTag($htmlTag)
    {
        // allow digits after the first letter so that tags such as <h1> are recognised
        if(!preg_match('/<\/?([a-z][a-z0-9]*)\b/i', $htmlTag, $tagName))
            throw new Exception("Could not parse HTML tag name $htmlTag", 1000);
        return in_array($tagName[1],$this->ignoreTags);
    }
}

Here is the test string. I added some of the things mentioned above, including some horribly written tags:

$html = <<<HTML
<style type="text/css" >
.body, div
{
    background-color: #CCC;
}

#someid
{
   color: #fff;
}
</style>
<p>
This is
            a
stupid p tag
            that has
    all     kinds   of  extra   space   in  it.
</p>
<   span  id="foo"  >Insert title<  /    span    ><!-- extra space in this tag, comments are removed -->
<
br
><!-- new line tag -->
<br  /  ><!-- spaced inline tag -->
<form class="some class other another"> 
    <div class="title-box">
        <div class="title">Questions</div>
    </div>

    <div class="content">
        <div>
            <span>Insert title</span>
            <div>
                <input name="question" placeholder="Insert some text here" type="text" />
            </div>
        </div>
        <div class="margin-t-10">
            <label>Insert BIO</label>
            <div>
                <textarea name="bio" class="textarea-content">This is first line text
This is second line text

more lines...</textarea>
            </div>
        </div>
        <div class="description">
            <label>Insert description here</label>
            <div>
                <textarea data-something name="description" class="textarea-content other class">

Line one
line two
    have some tabulation here to keep...

another line...</textarea>
            </div>
        </div>
    </div>
</form>
<script type="text/javascript">
(function($){
    $(document).ready(function(){
        var div = "<div>foobar</div>";
        var span = '<span>span</span>';
        $('textarea[name="bio"]').focuus();
        $(form).on('submit', function(e){
            e.preventDefault();
            return false;
        }
    });
})(jQuery);
</script>
HTML;

Output

<style type="text/css">
.body, div
{
    background-color: #CCC;
}

#someid
{
   color: #fff;
}
</style><p>This is a stupid p tag that has all kinds of extra space in it.</p><span id="foo">Insert title</span><form class="some class other another"><div class="title-box"><div class="title">Questions</div></div><div class="content"><div><span>Insert title</span><div><input name="question" placeholder="Insert some text here" type="text"/></div></div><div class="margin-t-10"><label>Insert BIO</label><div><textarea name="bio" class="textarea-content">This is first line text
This is second line text

more lines...</textarea></div></div><div class="description"><label>Insert description here</label><div><textarea data-something name="description" class="textarea-content other class">

Line one
line two
    have some tabulation here to keep...

another line...</textarea></div></div></div></form><script type="text/javascript">
(function($){
    $(document).ready(function(){
        var div = "<div>foobar</div>";
        var span = '<span>span</span>';
        $('textarea[name="bio"]').focuus();
        $(form).on('submit', function(e){
            e.preventDefault();
            return false;
        }
    });
})(jQuery);
</script>
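
Note that the two <br> tags from the test string are missing from the output above. A likely cause is that the T_COMMENT pattern uses a greedy .+ together with the s flag, so a single comment token can run from the first <!-- to the last --> and swallow the tags in between. The sketch below shows the difference on a made-up string. The lazy pattern is a suggested alternative, not part of the original class.

```php
<?php
// Made-up input: two comments with a <br> tag between them.
$html = '<p>a</p><!-- one --><br><!-- two --><p>b</p>';

// Comment pattern as in the class, compiled with the same "is" flags.
$greedy = '/<(?=!--).+(?<=--)>/is';
// Hypothetical alternative: stop at the first "-->".
$lazy = '/<!--.*?-->/s';

$greedyOut = preg_replace($greedy, '', $html);
$lazyOut = preg_replace($lazy, '', $html);

echo $greedyOut, "\n"; // <p>a</p><p>b</p>
echo $lazyOut, "\n";   // <p>a</p><br><p>b</p>
```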

Usage

//plain text for display purposes
header('Content-type: text/plain'); 

/*
 construct accepts 2 arguments, as strings or arrays
 the first is add tag(s) to preserve white space on
 the second is remove tag(s) from the white space list
 script and style tags are preserved by default
*/
//this is what was done for the output above
echo (new Minifier('textarea'))->minify($html);

//minify all
echo (new Minifier([], ['script','style']))->minify($html);

Last but not least, try it online:

https://3v4l.org/AQmbS

Enjoy, and sorry this got so long.