Replace double quotes in HTML, but not inside tags

Date: 2014-06-02 10:55:12

Tags: html regex

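My question is the same as this one: Replace all quotes that are not in html-tags

However, the solution given there does not work in every case. Example: in the HTML snippet below, the first quoted text is matched but the second is not, even though its quotes are not part of any tag.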

<p>This quote is matched: "Hello, I'm a quote". <strong>But this it not: <i>"<strong>I'm also a quite :(</strong>"</i>.</p>

See my demo here: http://regex101.com/r/tA0xV1

Edit: the HTML is strictly valid.

2 answers:

Answer 0 (score: 1)

I would do it this way.

// your source (X)HTML
$data = '<p class="test"> This quote is matched: "Hello, I\'m a quote". But this it not: <i> </i> <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';

$r = new XMLReader;
$w = new XMLWriter;

$r->xml($data, 'UTF-8');
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');

// for each node
while ($r->read()) {
    // depending on node type
    switch ($r->nodeType) {
        // if it's tag opening
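        case XMLReader::ELEMENT:
            // remember whether this is a self-closing tag such as <br/>;
            // the flag has to be read before moving onto the attributes
            $isEmpty = $r->isEmptyElement;
            // write it "as is"
            $w->startElement($r->name);
            if ($r->moveToFirstAttribute()) {
                do {
                    $w->startAttribute($r->name);
                    $w->text($r->value);
                    $w->endAttribute();
                } while ($r->moveToNextAttribute());
            }
            // self-closing tags never produce END_ELEMENT, so close them here
            if ($isEmpty) {
                $w->endElement();
            }
            break;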
            // if it's </tag>, output "as is"
        case XMLReader::END_ELEMENT:
            $w->endElement();
            break;
        // if it's a plain text
        case XMLReader::TEXT:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            // replace quotes
            $w->text(str_replace('"', '%QUOTE%', $r->value));
            break;
        // there're quite a lot of node types
        default:
            throw new Exception("Unknown node type: " . $r->nodeType);
    }
}

// output the transformed XML    
$w->endDocument();
echo $w->outputMemory(TRUE);

For details, see the XMLReader and XMLWriter documentation. Live version here.
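
The same parse-first idea can also be sketched with DOMDocument and DOMXPath instead of the streaming reader/writer pair. This is a different technique from the one in the answer, shown only for comparison: it loads the whole fragment into memory and assumes the input is well-formed XML with a single root element. The sample string and the %QUOTE% placeholder are taken from the answer; the variable names are illustrative.

```php
<?php
// Sample input reused from the answer; it must be well-formed XML.
$data = '<p class="test">This quote is matched: "Hello, I\'m a quote". '
      . 'But this it not: <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';

$doc = new DOMDocument();
$doc->loadXML($data);

$xpath = new DOMXPath($doc);

// text() selects text nodes only, so tag names and attribute values
// (such as class="test") are never touched.
foreach ($xpath->query('//text()') as $node) {
    $node->nodeValue = str_replace('"', '%QUOTE%', $node->nodeValue);
}

// Passing a node to saveXML() serializes just that node,
// without the leading XML declaration.
$result = $doc->saveXML($doc->documentElement);

echo $result, PHP_EOL;
```

The trade-off is memory: the XMLReader/XMLWriter version processes the input node by node, while this one builds the full tree first, which is usually acceptable for small fragments.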

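Answer 1 (score: -2)

Answering my own question: this regex works for my purposes. It makes sure the closing double quote is not preceded by an = character: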

/(?![^<]*>)"(.*?)(?<!=)"(?!>)/g

Demo here: http://regex101.com/r/vT1eZ8
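
To try the pattern from PHP, a minimal sketch could look like the following. The sample string is the one from the first answer, and the « » replacement characters are an arbitrary choice for illustration. The JavaScript-style g flag is dropped because preg_replace() already replaces every match.

```php
<?php
// The pattern from the answer, without the /g flag (not valid in PCRE).
$pattern = '/(?![^<]*>)"(.*?)(?<!=)"(?!>)/';

$html = '<p class="test">This quote is matched: "Hello, I\'m a quote". '
      . 'But this it not: <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';

// $1 is the text between the two quotes; « and » are illustrative only.
$result = preg_replace($pattern, '«$1»', $html);

echo $result, PHP_EOL;
```

On this sample both quoted passages are rewritten and class="test" is left alone. The pattern still relies on quotes in the text coming in pairs on a single line, and only the opening quote is checked against being inside a tag, so a quoted passage that contains a tag with more than one attribute can end the match inside that tag.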