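My problem is the same as in this question: Replace all quotes that are not in html-tags.
However, the solution given there does not work in all cases. Example: in this HTML snippet, the first quoted text is matched, but the second is not, even though its quotes are not part of a tag.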
<p>This quote is matched: "Hello, I'm a quote". <strong>But this it not: <i>"<strong>I'm also a quite :(</strong>"</i>.</p>
See my demo here: http://regex101.com/r/tA0xV1
Edit: The HTML is strictly validated.
Answer 0 (score: 1)
I would do it this way.
// your source (X)HTML
$data = '<p class="test"> This quote is matched: "Hello, I\'m a quote". But this it not: <i> </i> <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';

$r = new XMLReader;
$w = new XMLWriter;
$r->xml($data, 'UTF-8');
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');

// for each node
while ($r->read()) {
    // depending on node type
    switch ($r->nodeType) {
        // opening tag: write it "as is", including its attributes
        case XMLReader::ELEMENT:
            // remember this before moving to the attributes:
            // self-closing elements such as <br/> produce no END_ELEMENT node
            $isEmpty = $r->isEmptyElement;
            $w->startElement($r->name);
            if ($r->moveToFirstAttribute()) {
                do {
                    $w->startAttribute($r->name);
                    $w->text($r->value);
                    $w->endAttribute();
                } while ($r->moveToNextAttribute());
            }
            if ($isEmpty) {
                $w->endElement();
            }
            break;
        // closing tag: output "as is"
        case XMLReader::END_ELEMENT:
            $w->endElement();
            break;
        // plain text: this is the only place where quotes are replaced
        case XMLReader::TEXT:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            $w->text(str_replace('"', '%QUOTE%', $r->value));
            break;
        // there are many other node types (comments, CDATA, ...);
        // fail loudly instead of silently dropping them
        default:
            throw new Exception("Unknown node type: " . $r->nodeType);
    }
}

// output the transformed XML
$w->endDocument();
echo $w->outputMemory(true);
For details, see the XMLReader and XMLWriter documentation. Live version here.
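If the whole document fits in memory, the same idea (parse the markup and touch only the text nodes) can probably be expressed more compactly with DOMDocument and DOMXPath. This is an alternative sketch, not the code from the answer above. The function name replaceQuotesInTextNodes is made up for illustration, and the input is assumed to be well-formed XML, as the question states.

```php
<?php
// Hypothetical helper: replaces double quotes in text nodes only,
// leaving tag names and attribute values untouched.
// Assumes $xml is a well-formed XML/XHTML fragment with a single root element.
function replaceQuotesInTextNodes(string $xml, string $replacement = '%QUOTE%'): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    // "//text()" selects every text node; attribute values are not text nodes
    foreach ($xpath->query('//text()') as $node) {
        $node->nodeValue = str_replace('"', $replacement, $node->nodeValue);
    }

    // serialize only the root element, without the XML declaration
    return $doc->saveXML($doc->documentElement);
}

$data = '<p class="test"> This quote is matched: "Hello, I\'m a quote". But this it not: <i> </i> <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';
echo replaceQuotesInTextNodes($data), "\n";
```

The trade-off against XMLReader/XMLWriter is memory: DOMDocument loads the entire tree, while the streaming pair handles one node at a time.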
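Answer 1 (score: -2)
Answering my own question: this regex works for my purposes. It skips quotes inside tags and makes sure the closing double quote is not preceded by a = character or followed by >: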
/(?![^<]*>)"(.*?)(?<!=)"(?!>)/g
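To try the regex above from PHP, a minimal sketch could look like the following. The function name replaceQuotesOutsideTags is hypothetical, and the %QUOTE% placeholder is borrowed from the other answer. The /g flag is dropped because preg_replace already replaces every match.

```php
<?php
// Hypothetical wrapper around the regex from this answer.
// (?![^<]*>)  : the opening quote must not sit inside a tag
// (?<!=)"     : the closing quote must not directly follow "="
// (?!>)       : the closing quote must not be directly followed by ">"
function replaceQuotesOutsideTags(string $html): string
{
    $pattern = '/(?![^<]*>)"(.*?)(?<!=)"(?!>)/';
    return preg_replace($pattern, '%QUOTE%$1%QUOTE%', $html);
}

$data = '<p class="test"> This quote is matched: "Hello, I\'m a quote". But this it not: <i> </i> <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';
echo replaceQuotesOutsideTags($data), "\n";
```

On this sample both quote pairs are replaced and class="test" is left alone. The two checks on the closing quote are heuristics, though: a quoted span containing a tag with several attributes, such as "see <a href="x" title="y">link</a>", would end the match at the quote after x and damage the markup, so a parser-based approach is safer for arbitrary input.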