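Regex to ignore all td tags inside a tr tag

Date: 2014-10-22 21:19:24

Tags: html regex tags

Is there a possible solution to this problem?

I want a regular expression that ignores all td tags inside a tr tag. The tr tag I am looking for is malformed because its closing tag is missing the "/". So far, I have: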

<tr[^>]*><td(?:(?!</td>).)*</td><tr[^>]*>

<tr[^>]*> This needs to be the beginning of the expression.

<td(?:(?!</td>).)*</td> This finds everything between <td and the next </td>.

<tr[^>]*> This needs to be the end of the expression.
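
To make the behavior of this attempt concrete, here is a minimal sketch using Python's standard `re` module. The two test strings are hypothetical, shortened stand-ins for the samples in this question, not part of the original post. Without a dot-all option `.` does not cross newlines, and the pattern allows no whitespace between tags, so it can only match when the row sits on one line with exactly one cell.

```python
import re

# The attempted pattern, copied unchanged from the question.
ATTEMPT = re.compile(r"<tr[^>]*><td(?:(?!</td>).)*</td><tr[^>]*>")

# Hypothetical one-line row with a single cell and the malformed closing <tr>.
one_line = "<tr asdf><td asdf>Keep going</td><tr>"

# Hypothetical multi-line row, laid out like the samples (indentation, newlines).
multi_line = "<tr asdf>\n  <td>\n    Keep going\n  </td>\n<tr>"

one_line_match = ATTEMPT.search(one_line)
multi_line_match = ATTEMPT.search(multi_line)

print(one_line_match is not None)    # the tags are adjacent, so this matches
print(multi_line_match is not None)  # whitespace between the tags blocks the match
```

The pattern also expects exactly one td between the two tr tags, so a row with several cells would need the td part to repeat.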

Of course, this regex does not work. Here are sample texts to run the regex against:

Sample 1:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <title>                  
   </title>
</head>                             
<body>
   <table asdf>
      <tr asdf>
         <td asdf>
            <table asdf>
                <tr asdf: asdf>
                   <td>
                       blah blah blah
                   </td>
               </tr>
            </table>
          </td>
          <td>
              Keep going
          </td>
      <tr> If highlighted to here from first tr tag than correct regex was used
  </table>
</body>
</html>

Sample 2:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <title>                  
   </title>
</head>                             
<body>
   <table asdf>
      <tr asdf>
         <td asdf>
            <table asdf>
                <tr asdf: asdf>
                   <td>
                       blah blah blah
                   </td>
               </tr>
            </table>
          </td>
          <td>
              <table asdf>
                <tr asdf: asdf>
                   <td>
                       blah blah blah
                   </td>
               </tr>
            </table>
          </td>
      <tr> If highlighted to here from first tr tag than correct regex was used
  </table>
</body>
</html>

Sample 3:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <title>                  
   </title>
</head>                             
<body>
   <table asdf>
      <tr asdf>
         <td asdf>
            <table asdf>
                <tr asdf: asdf>
                   <td>
                       blah blah blah
                   </td>
               </tr>
            </table>
          </td>
          <td>
              <table>
                <tr>
                   <td>
                       blah blah blah
                   </td>
               </tr>
            </table>
          </td>
      <tr> If highlighted to here from first tr tag than correct regex was used
  </table>
</body>
</html>

Sample 4:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
   <title>                  
   </title>
</head>                             
<body>

<table>
    <tr>
        <td>&nbsp;</td>
    </tr>
</table>
<br/>
<br/>
<br/>
<table class="afdadsf">
    <td></td>
</table>
<br/>
<br/>
<table class="fdafdas">
    <tr><td></td>
            </tr>
    </table>
</body>
</html>

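The output I want is that, when the regex is run against the two sample texts above, it highlights everything from the first tr tag through the last tr tag. Assume that in other sample texts the td tags may contain anything.

1 answer:

Answer 0: (score: 0)

Based on what was posted and requested, if your regex engine supports recursion ((?R), or the (?1) group call used in this pattern), use the following pattern: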

<tr[^>]*>.*(<(\S+)[^>]*>([^<]|(?1))*?<\/\2>).*?<tr[^>]*>  

It may need some extensive testing. Demo


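Based on the comment below that the <tr> tags are always at the outermost level, use this pattern with the s (dot matches newline) option enabled: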

(<tr[^>]*>.*<tr>)

Demo
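
As a rough sketch of how the second pattern behaves, the snippet below runs it with Python's standard `re` module, where the s option is spelled `re.DOTALL`. The `sample` string is a hypothetical, condensed version of Sample 1, and the variable names are made up for illustration.

```python
import re

# Pattern from the answer; re.DOTALL lets "." match newlines (the "s" option).
OUTER_TR = re.compile(r"(<tr[^>]*>.*<tr>)", re.DOTALL)

# Hypothetical, shortened stand-in for Sample 1.
sample = """<table asdf>
  <tr asdf>
    <td asdf>
      <table asdf>
        <tr asdf: asdf>
          <td>blah blah blah</td>
        </tr>
      </table>
    </td>
    <td>Keep going</td>
  <tr> end of row
</table>"""

match = OUTER_TR.search(sample)
# The captured text runs from the first <tr ...> to the last bare <tr>.
highlighted = match.group(1) if match else None
print(highlighted)
```

Because `.*` is greedy, the match ends at the last bare `<tr>` in the input, and nested tables in between are swallowed without being inspected. Note that the closing part is the literal `<tr>`, so a malformed closing tag that carries attributes would not end the match. On well-formed input such as Sample 4, the pattern would still match from the first `<tr>` to the last opening `<tr>`, so it probably should not be used on its own to detect the broken tag.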