Remove <br /> inside tables using PHP

Time: 2014-03-19 15:17:59

Tags: php regex string

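I want to remove every <br /> inside tables using PHP. I know I can use str_replace() to remove <br />, but that removes all of them. I only want to remove the <br /> tags between <table> and </table>. I have several tables in one string.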

The HTML is shown below. You can also see this fiddle.

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

I tried the following approach. Is this the best solution?

<?php
    $input = '<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>';


$body = preg_replace_callback("~<table\b.*?/table>~si", "process_table", $input);

function process_table($match) {

        return str_replace('<br />', '', $match[0]);

}

echo $body;
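
The callback above removes only the exact string `<br />`. If the markup might also contain `<br>`, `<br/>` or upper-case variants (an assumption, since the sample only shows `<br />`), a variant of the same idea could look like the sketch below. The function name `strip_br_inside_tables` is hypothetical, and `<br>` tags carrying attributes are deliberately left alone.

```php
<?php
// Hypothetical helper: same technique as the question (match each table,
// then clean it in a callback), but tolerant of <br>, <br/> and <br />.
function strip_br_inside_tables(string $html): string
{
    $result = preg_replace_callback(
        '~<table\b.*?</table>~si',              // one non-greedy match per table
        static function (array $match): string {
            // Remove attribute-less <br> variants inside this table only.
            return preg_replace('~<br\s*/?>~i', '', $match[0]) ?? $match[0];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo strip_br_inside_tables('<br><table><BR /><tr><td>x<br/>y</td></tr></table><br />');
```

Like the original, this relies on the non-greedy `.*?` and so does not handle nested tables.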

2 answers:

Answer 0 (score: 1)

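As stated here, "regex is not a tool that can be used to correctly parse HTML". However, to offer something that works for this controlled case, I submit the following. It includes debug code that shows the string before and after.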

Note: I also tested your regex, and it works just as well in preg_match() when written as /<table\b.*?<\/table>/si

<?php

$search ='<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>';

$search = replacebr($search);

function replacebr($search){
        $offset=0;
        $anew=array();
        $asearch=array();
        $notdone = 1;
        $i=0;

    echo $search;

        while ($notdone == 1) {
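            // \b also matches a bare <table>; the s and i flags allow tables that span lines or use upper case
            ($notdone = preg_match('/<table\b[^>]*>(.+?)<\/table>/si', $search, $amatch, PREG_OFFSET_CAPTURE, $offset));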
            if (count($amatch)>0){
echo "amatch: " ; var_dump($amatch);
                // add part before match
                $anew[] = substr($search,$offset,$amatch[0][1]-$offset);

echo "anew (before): " ; var_dump($anew[count($anew)-1]);
                // add match with replaced text
                $anew[] = str_replace("<br />","",$amatch[0][0]);
echo "anew (match): " ; var_dump($anew[count($anew)-1]);

                // PREG_OFFSET_CAPTURE reports byte offsets, so advance with strlen(), not mb_strlen()
                $offset = $amatch[0][1] + strlen($amatch[0][0]);
echo "OFFSET: " ; var_dump($offset);

            }
            else{
                // add last part of string - we better be done
                $anew[] = substr($search, $offset);
                // no more tables: $notdone is 0 here, so the loop ends
                if ($notdone == 1){
                    die("Error - should be done");
                }
            }
            if ($i==100){
                // prevent endless loop
                die("Endless Loop");
            }
            $i++;
        }
        $new = implode("",$anew);
            echo "*******************";
            echo $new;
        return $new;
    }


?>
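
For readers who only need the result, the loop above can be condensed once the debug output is dropped. This is a minimal sketch of the same technique (find each table with `preg_match()` and `PREG_OFFSET_CAPTURE`, copy the text in between unchanged), not the answer's original code. The name `replacebr_compact` is hypothetical, and the pattern assumes tables are not nested.

```php
<?php
// Hypothetical condensed variant of replacebr(), without the debug echoes.
function replacebr_compact(string $search): string
{
    $offset = 0;   // byte offset where the next search starts
    $parts  = [];  // alternating "text before table" / "cleaned table" chunks

    while (preg_match('/<table\b[^>]*>.*?<\/table>/si', $search, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$table, $start] = $m[0];                                // matched text and its byte offset
        $parts[] = substr($search, $offset, $start - $offset);   // text before the table, unchanged
        $parts[] = str_replace('<br />', '', $table);            // table with <br /> removed
        $offset  = $start + strlen($table);                      // continue just past this table
    }

    $parts[] = substr($search, $offset);                         // whatever follows the last table
    return implode('', $parts);
}

echo replacebr_compact('a<br /><table><br /><tr></tr></table>b<br /><table x="1"><br /></table>');
```

The loop ends as soon as `preg_match()` stops returning 1, so no iteration counter is needed to guard against an endless loop.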

Answer 1 (score: 0)

Parsing HTML with regular expressions is not recommended, but if you must, this might work.

Note: the test case is in Perl, but the regex can be used in PHP. Just do a global replace with $1.

 #  '~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~'

 (?s)                         # Dot-All
 (                            # (1 start), Keep these
      (?:
           (?! \A | <table \b )
           \G                           # Start match from end of last match
        |                               # or,
           <table \b                    # Start from '<table\b'
      )
      (?:                          # The chars before a <br/> or </table end tag
           (?!
                <br \s* /> 
             |  </table \b 
           )
           . 
      )*
 )                            # (1 end)
 <br \s* />                   # Strip <br/>
 (?= .*? </table \b )         # Must be </table end tag downstream

Perl test case

$/ = undef;

$str = <DATA>;

print "Before:\n$str\n\n";
$str =~ s~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~$1~g;
print "After:\n$str\n\n";

__DATA__
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

Output >>

Before:
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

After:
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"> <tbody>       <tr>          <td>          <p><strong>column1</strong></p>         </td>         <td>          <p><strong>column2</strong></p>         </td></tr>        <tr>          <td>          <p>1</p>            </td>         <td>          <p>2</p>            </td>               </tr> </tbody></table>
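
The answer says the regex can be used in PHP with a global replace by `$1`. A minimal sketch of what that might look like with `preg_replace()` follows. The wrapper name `strip_br_in_tables` is hypothetical, and the pattern is copied unchanged from the answer.

```php
<?php
// Hypothetical wrapper around the answer's pattern. preg_replace() replaces
// every match, which corresponds to Perl's s~...~$1~g.
function strip_br_in_tables(string $html): string
{
    $pattern = '~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~';

    // Keep group 1 (the text leading up to the <br/>) and drop the <br/> itself.
    $result = preg_replace($pattern, '$1', $html);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo strip_br_in_tables('<br /><table><br /><tr><td>a<br/>b</td></tr><br /></table><br />');
```

Because each match must start either at `<table` or where the previous match ended (`\G`), the `<br />` tags before the first table and after `</table>` are left in place.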