Trim whitespace on either side of tags?

Time: 2019-04-08 21:23:30

Tags: php

Is there a way to trim the whitespace on either side of the tags in a string, so that this:

<p> Lorem ipsum dolor sit amet.</p> <p>Lorem ipsum dolor sit amet.</p>

is converted into this:

<p>Lorem ipsum dolor sit amet.</p><p>Lorem ipsum dolor sit amet.</p>

2 answers:

Answer 0: (score: 1)

If you are just looking for a simple trim, you can use str_replace to replace each of the relevant tag-plus-space sequences.

$html = str_replace('<p> ', '<p>', $html); # repeat for whatever you need.

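Alternatively, you can use preg_replace, which accepts a regular expression, so a single pass can be as complex as you need it to be. For more information, see https://www.php.net/manual/en/function.preg-replace.php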
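As a rough illustration of the preg_replace route, here is a minimal sketch. The function name trimAroundTags is hypothetical and not part of this answer, and the sketch assumes whitespace directly after a `>` or directly before a `<` is never significant, which is not true for inline elements or for tags such as textarea and pre.

```php
<?php
// Hypothetical helper: strips whitespace that directly follows '>' or
// directly precedes '<'. It does not understand HTML, only those characters.
function trimAroundTags(string $html): string
{
    return preg_replace(['/>\s+/', '/\s+</'], ['>', '<'], $html);
}

echo trimAroundTags('<p> Lorem ipsum dolor sit amet.</p> <p>Lorem ipsum dolor sit amet.</p>');
// prints <p>Lorem ipsum dolor sit amet.</p><p>Lorem ipsum dolor sit amet.</p>
```

Because this treats every tag the same way, it would also remove a meaningful space between two inline elements, such as `<b>a</b> <i>b</i>`.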

Answer 1: (score: 1)

I wrote this for someone else's question (about a year ago, on Apr 6, 2018): PHP How to remove break line \n \r \t from string but keep intact content inside textarea

It's on my GitHub

https://github.com/ArtisticPhoenix/MISC/blob/master/Lexers/HtmlMinifier.php

Full code:

<?php
class Minifier{

    /**
     * not within an HTML tag
     * @var string
     */
    const MODE_CLOSED = 'closed';

    /**
     * within an HTML tag
     * @var string
     */
    const MODE_OPEN = 'open';

    /**
     * ignore whitespace until the next close tag
     * @var string
     */
    const MODE_IGNORE = 'ignore';

    /**
     * HTML tags to ignore whitespace in
     * @var array
     */
    protected $ignoreTags = [
        'script',
        'style'
    ];

    /**
     * Regex tokens
     * @var array
     */
    protected $tokens =  [
        'T_EOF'             => '\Z',                            //matches end of string
        'T_COMMENT'         => '<(?=!--).+(?<=--)>',            //matches only <!-- comment -->
        'T_OPEN_TAG'        => '<(?!\/)[^>]+(?<!\/)>',          //matches only <tag ... >
        'T_CLOSE_TAG'       => '<(?=\/)[^>]+(?<!\/)>',          //matches only </tag ..>
        'T_INLINE_TAG'      => '<(?!\/)[^>]+(?<=\/)>',          //matches only <tag ... />
        'T_ENCAPSED_STRING' => '(?P<Q>\'|").*?(?<!\\\\)\k<Q>',   //matches  "foo\"bar" or 'foo\'bar'
        'T_STRING'          => '[-\w]+',                        //matches letters, digits, _ and -
        'T_WHITESPACE'      => '\s+',                           //matches \s\t\r\n
        'T_UNKNOWN'         => '.+?'                            //matches everything else 
    ];

    /**
     * 
     * @param mixed $addTags
     * @param mixed $removeTags
     */
    public function __construct($addTags = [], $removeTags = []){
        $this->unsetTag($removeTags);
        $this->setTag($addTags);
    }

    /**
     * 
     * @param mixed $ignoreTags
     * @return bool
     */
    public function issetTag($ignoreTags)
    {
        return in_array($ignoreTags,$this->ignoreTags);
    }

    /**
     * Set one or more tags
     * 
     * @param mixed $ignoreTags 
     */
    public function setTag($ignoreTags)
    {
        if(empty($ignoreTags)) return;
        if(!is_array($ignoreTags)) $ignoreTags = [$ignoreTags];     
        $this->ignoreTags = array_unique(array_merge($this->ignoreTags, $ignoreTags));   
    }

    /**
     * Unset one or more tags
     *
     * @param mixed $ignoreTags
     */
    public function unsetTag($ignoreTags)
    {
        if(empty($ignoreTags)) return;
        if(!is_array($ignoreTags)) $ignoreTags = [$ignoreTags];
        $this->ignoreTags = array_diff($this->ignoreTags, $ignoreTags);
    }
    /**
     * 
     * @param string $html
     * @return string
     */
    public function minify($html)
    {
        $token_stream = $this->lexTokens($html);
        return $this->parseTokens($token_stream);
    }

    /**
     * 
     * @param string $html
     * @return array
     */
    public function lexTokens($html)
    {
        $types = array_keys($this->tokens);
        $patterns = [];
        $token_stream = [];
        foreach ($this->tokens as $k=>$v){
            $patterns[] = "(?P<$k>$v)";
        }
        $pattern = "/".implode('|', $patterns)."/is";
        if (preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE)) {
            //print_r($matches);
            foreach ($matches[0] as $key => $value) {
                $match = [];
                foreach ($types as $type) {
                    $match = $matches[$type][$key];
                    if (is_array($match) && $match[1] != -1) {
                        break;
                    }
                }
                $tok  = [
                    'content' => $match[0],
                    'type' => $type,
                    'offset' => $match[1]
                ];
                $token_stream[] = $tok;
            }
        }
        return $token_stream;
    }

    /**
     * 
     * @param array $token_stream - passed by reference
     * @throws Exception
     * @return string
     */
    protected function parseTokens( array &$token_stream )
    {  
        $mode = 'closed';

        $string = '';
        $result = '';

        while($current = current($token_stream)){  
            $content = $current['content'];
            $type = $current['type'];

            next($token_stream);
            switch($type){  
                case 'T_COMMENT':
                    //remove comments
                break;
                case 'T_OPEN_TAG':
                    if(strlen($string)){
                        //add trimmed string to result, reset string.
                        if($mode == 'ignore'){
                            $result .= $string;
                        }else{
                            $result .= trim($string);
                        }
                        $string = '';
                    }
                    //clean
                    $content = $this->cleanTag($content);

                    if($this->isIgnoredTag($content)){
                        //indicate ignore whitespace
                        $mode = 'ignore';
                    }else{
                        //indicate a tag is open
                        $mode = 'open';
                    }
                    $result .= $content;
                break;
                case 'T_INLINE_TAG':
                case 'T_CLOSE_TAG':  
                    if(strlen($string)){
                        //add trimmed string to result, reset string.
                        if($mode == 'ignore'){
                            $result .= $string;
                        }else{
                            $result .= trim($string);
                        }
                        $string = '';
                    }
                    //clean
                    $content = $this->cleanTag($content);
                    //add content to result
                    $result .= $content;
                    //indicate a tag is closed
                    $mode = 'closed';               
                break;  
                case 'T_ENCAPSED_STRING':
                case 'T_STRING':
                case 'T_UNKNOWN':
                    switch ($mode){
                        case 'ignore':
                        case 'open':
                        case 'closed':
                            //add content to string (not result)
                            $string .= $content;
                        break;
                        default:
                            print_r($result);
                            throw new Exception("Unknown Mode:$mode for $type value $content", 1002);
                    }   
                break;           
                case 'T_WHITESPACE':
                    switch ($mode){
                        case 'closed':
                            //remove whitespace between tags.
                        break;
                        case 'open':
                            //add only one space to string no matter how many we find
                            $string .= ' ';
                        break;
                        case 'ignore':
                            $string .= $content;
                        break;
                        default:
                            print_r($result);
                            throw new Exception("Unknown Mode:$mode for $type value $content", 1002);
                    }   
                break;
                case 'T_EOF': return $result;
                default:
                    print_r($current);
                    print_r($result);
                    throw new Exception("Unknown token $type value $content", 1001);
            }
        }
        //fallback: the token stream ended without a T_EOF token
        return $result;
    }

    /**
     * 
     * @param string  $tag
     * @return string
     */
    protected function cleanTag($tag)
    {
        return preg_replace([
            '/\s{2,}/',            
            '/^<\s+/',
            '/^<\/\s+/',
            '/\s+>$/',
            '/\s\/>$/'
         ],[
            ' ',
            '<',
            '</',
            '>',
            '/>',
         ], $tag);
    }

    /**
     * 
     * should be cleaned with cleanTag first.
     * 
     * @param string $htmlTag
     * @throws Exception
     * @return boolean
     */
    protected function isIgnoredTag($htmlTag)
    {
        if(!preg_match('/<\/?([a-z]+)\b/i', $htmlTag, $tagName))
            throw new Exception("Could not parse HTML tag name $htmlTag", 1000);
       return in_array($tagName[1],$this->ignoreTags);
    }
}
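
The lexing step in the class above joins every token pattern into one alternation of named groups and then checks which group matched. A stripped-down sketch of that idea follows. The function lexSketch and its three-token set are hypothetical simplifications, and it uses PREG_SET_ORDER instead of the PREG_OFFSET_CAPTURE flag used in the class, so it is not a drop-in replacement for lexTokens.

```php
<?php
// Simplified sketch of named-group lexing with a reduced, hypothetical token set.
function lexSketch(string $html): array
{
    $tokens = [
        'T_TAG'        => '<[^>]+>',   // any tag
        'T_WHITESPACE' => '\s+',       // a run of whitespace
        'T_TEXT'       => '[^<\s]+',   // anything else up to whitespace or '<'
    ];

    // Build (?P<T_TAG>...)|(?P<T_WHITESPACE>...)|(?P<T_TEXT>...)
    $parts = [];
    foreach ($tokens as $name => $regex) {
        $parts[] = "(?P<$name>$regex)";
    }
    $pattern = '/' . implode('|', $parts) . '/s';

    $stream = [];
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // The first named group holding a non-empty string is the token type.
        foreach (array_keys($tokens) as $name) {
            if (isset($m[$name]) && $m[$name] !== '') {
                $stream[] = ['type' => $name, 'content' => $m[$name]];
                break;
            }
        }
    }
    return $stream;
}

print_r(array_column(lexSketch('<p> Hi</p>'), 'type'));
```

For the input `<p> Hi</p>` this yields four tokens: a tag, whitespace, text, and a tag. The order of the patterns matters, since the regex engine tries the alternatives from left to right.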

An example input:

$html = <<<HTML
<style type="text/css" >
.body, div
{
    background-color: #CCC;
}
#someid
{
   color: #fff;
}
</style>
<p> Lorem ipsum dolor sit amet.</p> <p>Lorem ipsum dolor sit amet.</p>
<p>
This is
            a
stupid p tag
            that has
    all     kinds   of  extra   space   in  it.
</p>
<   span  id="foo"  >Insert title<  /    span    ><!-- extra space in this tag, comments are removed -->
<
br
><!-- new line tag -->
<br  /  ><!-- spaced inline tag -->
<form class="some class other another"> 
    <div class="title-box">
        <div class="title">Questions</div>
    </div>
    <div class="content">
        <div>
            <span>Insert title</span>
            <div>
                <input name="question" placeholder="Insert some text here" type="text" />
            </div>
        </div>
        <div class="margin-t-10">
            <label>Insert BIO</label>
            <div>
                <textarea name="bio" class="textarea-content">This is first line text
This is second line text
more lines...</textarea>
            </div>
        </div>
        <div class="description">
            <label>Insert description here</label>
            <div>
                <textarea data-something name="description" class="textarea-content other class">
Line one
line two
    have some tabulation here to keep...
another line...</textarea>
            </div>
        </div>
    </div>
</form>
<script type="text/javascript">
(function($){
    $(document).ready(function(){
        var div = "<div>foobar</div>";
        var span = '<span>span</span>';
        $('textarea[name="bio"]').focuus();
        $(form).on('submit', function(e){
            e.preventDefault();
            return false;
        }
    });
})(jQuery);
</script>
HTML;

Run it:

echo (new Minifier('textarea'))->minify($html);

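Here is an example you can try: Sandbox. I actually added a bit more HTML to it, lol. This test is deliberately messed up, though, well beyond what you would see in normal cases.

Output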

<style type="text/css">
.body, div
{
    background-color: #CCC;
}
#someid
{
   color: #fff;
}
</style><p>Lorem ipsum dolor sit amet.</p><p>Lorem ipsum dolor sit amet.</p><p>This is a stupid p tag that has all kinds of extra space in it.</p><span id="foo">Insert title</span><form class="some class other another"><div class="title-box"><div class="title">Questions</div></div><div class="content"><div><span>Insert title</span><div><input name="question" placeholder="Insert some text here" type="text"/></div></div><div class="margin-t-10"><label>Insert BIO</label><div><textarea name="bio" class="textarea-content">This is first line text
This is second line text
more lines...</textarea></div></div><div class="description"><label>Insert description here</label><div><textarea data-something name="description" class="textarea-content other class">
Line one
line two
    have some tabulation here to keep...
another line...</textarea></div></div></div></form><script type="text/javascript">
(function($){
    $(document).ready(function(){
        var div = "<div>foobar</div>";
        var span = '<span>span</span>';
        $('textarea[name="bio"]').focuus();
        $(form).on('submit', function(e){
            e.preventDefault();
            return false;
        }
    });
})(jQuery);
</script>

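In the output above, script, style and textarea are intentionally not minified. Minifying script and style would take additional work and rules, because they are not HTML (they are languages in their own right). Those two are ignored by default: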

  protected $ignoreTags = [
     'script',
     'style'
 ];

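textarea is then passed in through the constructor, new Minifier('textarea'). The content of a textarea is whitespace-sensitive because it is part of the field's "value", so we don't want to modify it; that was the problem in the other question and the reason this was written. For you, it's just a nice "bonus".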
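
The same "leave certain tags alone" idea can be sketched without the full lexer. The helper below, minifyExcept, is hypothetical and uses a different technique from the Minifier class: it splits the document around the protected blocks with preg_split and only touches the pieces in between. It assumes protected tags are not nested and are properly closed.

```php
<?php
// Hypothetical helper, not the Minifier class: keeps the listed tags verbatim
// and removes whitespace next to tags everywhere else.
function minifyExcept(string $html, array $ignoreTags = ['script', 'style', 'textarea']): string
{
    $names = implode('|', array_map('preg_quote', $ignoreTags));
    // One capture group around the whole <tag ...>...</tag> block, so that
    // preg_split returns the block itself as a delimiter.
    $pattern = '~(<(?:' . $names . ')\b[^>]*>.*?</(?:' . $names . ')>)~is';
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Odd indexes hold the captured protected blocks; keep them as-is.
        $out .= ($i % 2 === 1)
            ? $part
            : preg_replace(['/>\s+/', '/\s+</'], ['>', '<'], $part);
    }
    return $out;
}

echo minifyExcept("<div> <textarea> a\n b </textarea> </div>");
```

Here the whitespace around the div tags is removed while everything between `<textarea>` and `</textarea>` is kept byte for byte. Unlike the class above, this sketch does not collapse runs of spaces inside text or tidy up spacing inside the tags themselves.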

I can't guarantee that it will work on every piece of HTML in every case.

But you are welcome to modify it.