Question

My problem is the same as in this question: Replace all quotes that are not in html-tags.

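However, the solution given there does not work in every case. For example, in this HTML snippet the first quoted text is matched but the second is not, even though those quotes are not inside a tag either.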

This quote is matched: "Hello, I'm a quote". But this it not: "I'm also a quite :(".

See my demo here: http://regex101.com/r/tA0xV1

Edit: the HTML is strictly valid.

Answer 1

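I would do it like this: read the markup with XMLReader, write it back out with XMLWriter, and replace quotes only in text nodes.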

// your source (X)HTML
$data = '<p class="test"> This quote is matched: "Hello, I\'m a quote". But this it not: <i> </i> <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';

$r = new XMLReader;
$w = new XMLWriter;

$r->xml($data, 'UTF-8');
$w->openMemory();
$w->startDocument('1.0', 'UTF-8');

// for each node
while ($r->read()) {
    // depending on node type
    switch ($r->nodeType) {
        // if it's an opening tag
        case XMLReader::ELEMENT:
            // write it "as is", together with its attributes
            $w->startElement($r->name);
            // read this before moving onto the attributes
            $isEmpty = $r->isEmptyElement;
            if ($r->moveToFirstAttribute()) {
                do {
                    $w->startAttribute($r->name);
                    $w->text($r->value);
                    $w->endAttribute();
                } while ($r->moveToNextAttribute());
                $r->moveToElement();
            }
            // a self-closing tag such as <br/> produces no END_ELEMENT node,
            // so close it right away
            if ($isEmpty) {
                $w->endElement();
            }
            break;
        // if it's </tag>, output it "as is"
        case XMLReader::END_ELEMENT:
            $w->endElement();
            break;
        // if it's plain text (or whitespace between tags)
        case XMLReader::TEXT:
        case XMLReader::WHITESPACE:
        case XMLReader::SIGNIFICANT_WHITESPACE:
            // replace quotes
            $w->text(str_replace('"', '%QUOTE%', $r->value));
            break;
        // there are quite a lot of other node types (comments, CDATA, ...)
        default:
            throw new Exception("Unknown node type: " . $r->nodeType);
    }
}

// output the transformed XML
$w->endDocument();
echo $w->outputMemory(TRUE);

For details, see the XMLReader and XMLWriter documentation. Live version here.
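If the whole fragment fits in memory, the same idea (touch only text nodes, never attributes) can also be sketched with the tree-based DOM API instead of a streaming reader. This is a swapped-in alternative, not part of the answer above; `replaceQuotesInText` is a hypothetical helper name, and the input is assumed to be well-formed XML, as the question's edit says.

```php
<?php
// Hypothetical helper (not from the answer): replaces '"' only inside text
// nodes of a well-formed XML/XHTML fragment. Attribute values are attribute
// nodes, not text nodes, so they are left untouched.
function replaceQuotesInText(string $xml, string $replacement): string
{
    $doc = new DOMDocument();
    // loadXML() assumes well-formed input (the question says the HTML is valid)
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    // '//text()' selects every text node in the document
    foreach ($xpath->query('//text()') as $node) {
        $node->nodeValue = str_replace('"', $replacement, $node->nodeValue);
    }

    // serialise only the root element, so no XML declaration is emitted
    return $doc->saveXML($doc->documentElement);
}

$data = '<p class="test"> This quote is matched: "Hello, I\'m a quote". But this it not: <i> </i> <i>"<strong>I\'m also a quite :(</strong>"</i>.</p>';
echo replaceQuotesInText($data, '%QUOTE%'), "\n";
```

One design difference from the XMLReader loop: there is no switch over node types, so a comment in the input is simply left alone instead of triggering the exception in the `default` branch.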

Answer 2

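Answering my own question: this regex works for my case. It makes sure the closing double quote is not preceded by an = character, so the opening quote of an attribute value is never taken as the end of a match: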

/(?![^<]*>)"(.*?)(?<!=)"(?!>)/g

Demo here: http://regex101.com/r/vT1eZ8
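To use this regex from PHP, as in the other answer, it can be passed to `preg_replace()`. A minimal sketch, assuming the same `%QUOTE%` placeholder as the XMLReader answer; the `/g` flag is dropped because `preg_replace()` already replaces every match, and PHP's PCRE would reject `g` as an unknown modifier. Like any regex over markup this is a heuristic: it relies on the quotes in the text being balanced and on `>` not appearing in text content.

```php
<?php
// Regex from the answer, without the JavaScript-style /g flag:
// preg_replace() already replaces all non-overlapping matches.
$pattern = '/(?![^<]*>)"(.*?)(?<!=)"(?!>)/';

// Illustrative input: a quoted phrase in plain text, one wrapped around an
// inline element (the case from the question), and an attribute to preserve.
$html = '<p class="test">He said "hi" to <i>"<b>you</b>"</i>.</p>';

// $1 re-inserts the text captured between the two quotes.
$result = preg_replace($pattern, '%QUOTE%$1%QUOTE%', $html);
echo $result, "\n";
// <p class="test">He said %QUOTE%hi%QUOTE% to <i>%QUOTE%<b>you</b>%QUOTE%</i>.</p>
```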

Replace double quotes in HTML, skipping those inside tags
