PHP regex to remove everything inside a tag

Date: 2017-12-15 05:21:52

Tags: php regex string anchor

I have a string that contains anchor tags. Those anchor tags contain some HTML and text, like this:

<a class="content-title-link" title="Blog" href="https://example.com/my-blog" target="_blank">
 <img id="my_main_pic" class="content-title-main-pic" src="https://example.com/xyz.jpg" width="30px" height="30px" alt="Main Profile Picture">
 My HTML Link 
 <label>Click here to view 
  <cite class="glyphicon glyphicon-new-window" title="Blog"></cite>
 </label>
</a>

My string looks like this:

<p>Hello there,</p>
<p><a class="content-title-link" title="Blog" href="https://example.com/my-blog" target="_blank">
     <img id="my_main_pic" class="content-title-main-pic" src="https://example.com/xyz.jpg" width="30px" height="30px" alt="Main Profile Picture">
     My HTML Link 
     <label>Click here to view 
      <cite class="glyphicon glyphicon-new-window" title="Blog"></cite>
     </label>
    </a>
    what's up.
    </p>
<p>
Click here <a class="content-title-link" title="Blog" href="https://example.com/my-blog" target="_blank">
     <img id="my_main_pic" class="content-title-main-pic" src="https://example.com/xyz.jpg" width="30px" height="30px" alt="Main Profile Picture">
     My HTML Link 
     <label>Click here to view 
      <cite class="glyphicon glyphicon-new-window" title="Blog"></cite>
     </label>
    </a> to view my pic.
</p>

I need to replace each anchor tag in the string with its href, so that the string becomes:

<p>Hello there,</p>
<p>https://example.com/my-blog
    what's up.
    </p>
<p>
Click here https://example.com/my-blog to view my pic.
</p>

I tried the code below, but it does not replace the tags with their href:

$dom = new DomDocument();
$dom->loadHTML( $text );
$matches = array();
foreach ( $dom->getElementsByTagName('a') as $item ) {
   $matches[] = array (
      'a_tag' => $dom->saveHTML($item),
      'href' => $item->getAttribute('href'),
      'anchor_text' => $item->nodeValue
   );
}

foreach( $matches as $match )
{
  // Replace a tag by its href
  $text = str_replace( $match['a_tag'], $match['href'], $text );
}

return $text;

Does anyone know whether this can be done?

2 Answers:

Answer 0: (score: 1)

We can try a regular expression. Match the following pattern and replace each match with its capture group:

<a.*?href="([^"]*)".*?>.*?<\/a>

With preg_replace we can match the above pattern repeatedly and replace each anchor tag with the href URL captured from inside the tag.

$result = preg_replace('/<a.*?href="([^"]*)".*?>.*?<\/a>/s', '$1', $string);

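Note the s flag at the end of /pattern/s. It runs the replacement in DOTALL mode, which means the dot also matches newlines (i.e. the match can span lines, which is what you want here).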
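As a rough, self-contained sketch of the replacement above, the snippet below runs the same preg_replace call on a shortened version of the question's string. The variable names and the trimmed sample markup are assumptions made for illustration, not the asker's actual data.

```php
<?php
// Shortened sample modelled on the question's string (illustrative only).
$string = <<<'EOT'
<p>Hello there,</p>
<p><a class="content-title-link" title="Blog" href="https://example.com/my-blog" target="_blank">
     <img id="my_main_pic" src="https://example.com/xyz.jpg" alt="Main Profile Picture">
     My HTML Link
     <label>Click here to view</label>
    </a>
    what's up.
    </p>
<p>
Click here <a class="content-title-link" href="https://example.com/my-blog" target="_blank">My HTML Link</a> to view my pic.
</p>
EOT;

// The s modifier lets "." match newlines, so one match can cover a
// multi-line <a ...>...</a> block. $1 is the captured href value.
$result = preg_replace('/<a.*?href="([^"]*)".*?>.*?<\/a>/s', '$1', $string);

echo $result, PHP_EOL;
```

A regex like this assumes that href is double-quoted and that anchors are not nested; markup that breaks those assumptions would likely need an HTML parser instead.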


Answer 1: (score: 0)

Search for this regex:

<a.*?href="([^"]*)"[^>]*>

and replace it with

$1
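
A minimal sketch of how this search and replace might be applied with preg_replace, using a hypothetical single-line input shortened from the question's markup. Applied on its own, the pattern consumes only the opening <a ...> tag, so the link's inner content and the closing </a> remain in the output and would still need to be removed separately.

```php
<?php
// Hypothetical one-line input, shortened from the question's markup.
$string = 'Click here <a class="content-title-link" href="https://example.com/my-blog" target="_blank">My HTML Link</a> to view my pic.';

// The pattern matches the opening tag only; $1 is the captured href.
$result = preg_replace('/<a.*?href="([^"]*)"[^>]*>/', '$1', $string);

echo $result, PHP_EOL;
// prints: Click here https://example.com/my-blogMy HTML Link</a> to view my pic.
```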