Remove the text from meta tags in a string

Posted: 2017-02-06 15:53:18

Tags: php regex

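I want to reset all of these elements so that each one has an empty content attribute. I have a regex that does this when I know what the content is set to. Here is my example: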

$string = preg_replace('/<meta content="website"[^>]+>/', '<meta content="website" property="og:type">',$stringFile);
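For reference, the fixed-value regex above can be generalised by capturing the text around the value instead of spelling the value out. This is only a sketch under a strong assumption: every og: tag keeps exactly the layout shown below (content first, double-quoted values, property starting with og:). The function name clearOgContentRegex is hypothetical, and the answer explains why a DOM parser is the safer choice.

```php
<?php
// Hypothetical helper: a minimal regex sketch, NOT a general HTML solution.
// Assumes each og:* tag is written exactly as <meta content="..." property="og:...">,
// i.e. content comes first and both values use double quotes.
function clearOgContentRegex(string $html): string
{
    // Group 1: '<meta content="'   |   [^"]*: the old value (dropped)
    // Group 2: '" property="og:...">' (kept as-is)
    return preg_replace(
        '/(<meta\s+content=")[^"]*("\s+property="og:[^"]*"[^>]*>)/',
        '$1$2',
        $html
    );
}

$stringFile = '<meta content="text/html; charset=utf-8" http-equiv="Content-Type">' . "\n"
    . '<meta content="website" property="og:type">';

echo clearOgContentRegex($stringFile), "\n";
```

The Content-Type tag is left alone because its content value is followed by http-equiv rather than property="og:...", so the pattern never matches it.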

Current meta tags:

<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<meta content="dynamic text here" property="og:title">
<meta content="lots of text ... lots of text ... lots of text " property="og:description">
<meta content="website" property="og:type">
<meta content="version" property="og:url">
<meta content="/folder/folder/folder/folder/logo.jpg" property="og:image">

Desired output (note that the text/html tag stays unchanged):

<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<meta content="" property="og:title">
<meta content="" property="og:description">
<meta content="" property="og:type">
<meta content="" property="og:url">
<meta content="" property="og:image">

1 answer:

Answer 0 (score: 1)

Something like this...

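<?php

// Sample input; the "..." stands for the rest of the meta tags from the question.
$html = '<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
    <meta content="dynamic text here" property="og:title">...';

$domd = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about malformed HTML
$domd->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$domx = new DOMXPath($domd);
$items = $domx->query("//meta[@content]");   // every <meta> that has a content attribute

foreach ($items as $item) {
  // Leave the Content-Type tag alone.
  if (strpos($item->getAttribute('content'), 'text/html') !== false) continue;
  // Empty the value but keep the attribute, so the output is content="".
  $item->setAttribute('content', '');
}

echo $domd->saveHTML();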

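...should do the trick. Avoid manipulating HTML with regular expressions: a parser does not care about attribute order, quote style or extra whitespace, all of which a pattern has to hard-code.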
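A variant of the same DOM approach, sketched under assumptions: instead of skipping tags whose value mentions text/html, it selects only the meta tags whose property starts with og:, so the Content-Type tag is never touched whatever its value. The function name clearOgContentDom is hypothetical; starts-with() is a standard XPath 1.0 function, which DOMXPath supports.

```php
<?php
// Hypothetical helper: empties content="" on every <meta property="og:...">.
function clearOgContentDom(string $html): DOMDocument
{
    $domd = new DOMDocument();
    libxml_use_internal_errors(true);   // silence warnings about the HTML fragment
    $domd->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $domx = new DOMXPath($domd);
    // Match on the property attribute, not on the content value,
    // so tags with http-equiv (and no og: property) are never selected.
    foreach ($domx->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $meta->setAttribute('content', '');   // empty the value, keep the attribute
    }
    return $domd;
}

$html = '<meta content="text/html; charset=utf-8" http-equiv="Content-Type">'
      . '<meta content="website" property="og:type">';

echo clearOgContentDom($html)->saveHTML();
```

Selecting by property makes the intent explicit ("clear the Open Graph tags") rather than relying on what the other tags happen to contain.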

Or, using a regex for the condition, as Toto suggested:

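<?php

// Sample input; the "..." stands for the rest of the meta tags from the question.
$html = '<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
    <meta content="dynamic text here" property="og:title">...';

$domd = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about malformed HTML
$domd->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$domx = new DOMXPath($domd);
$items = $domx->query("//meta[@content]");   // every <meta> that has a content attribute

foreach ($items as $item) {
  // Skip values containing "text/html" as a whole token (\b = word boundary).
  if (preg_match('~\btext/html\b~', $item->getAttribute('content'))) continue;
  // Empty the value but keep the attribute, so the output is content="".
  $item->setAttribute('content', '');
}

echo $domd->saveHTML();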
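Both snippets end with saveHTML() called without arguments, which serializes the whole document; for a fragment like this, libxml adds a DOCTYPE and wraps the tags in html and head elements. If only the meta tags are wanted back, DOMDocument::saveHTML() also accepts a node (available since PHP 5.3.6). A minimal sketch; the helper name metaTagsAsHtml is hypothetical:

```php
<?php
// Hypothetical helper: serializes each <meta> on its own, without the
// <!DOCTYPE>/<html>/<head> wrapper that saveHTML() adds for the full document.
function metaTagsAsHtml(DOMDocument $domd): array
{
    $tags = [];
    foreach ((new DOMXPath($domd))->query('//meta') as $meta) {
        $tags[] = $domd->saveHTML($meta);   // passing a node serializes just that node
    }
    return $tags;
}

$domd = new DOMDocument();
libxml_use_internal_errors(true);
$domd->loadHTML('<meta content="" property="og:title"><meta content="" property="og:type">');
libxml_clear_errors();
libxml_use_internal_errors(false);

echo implode("\n", metaTagsAsHtml($domd)), "\n";
```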