Question

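My anchors have title="" attributes that contain HTML. I'm trying to remove the title attribute entirely, but for some reason none of the preg_replace calls I use work. I've tried: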

$output = preg_replace( '/title=\"(.*?)\"/',  '', $output );
$output = preg_replace( '/\title="(.*?)"/',   '', $output );
$output = preg_replace( '` title="(.+)"`',    '', $output );

None of the above works, but something like this does:

$output = str_replace( 'title', 'class', $output );

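I only include that to prove the code is running at all (and that I didn't upload the wrong file or something similar). The output looks like this: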

<a href="#" title="<table border=\&quot;0\&quot; width=\&quot;100%\&quot; cellspacing=\&quot;0\&quot; cellpadding=\&quot;0\&quot;>
    <tbody>
        <tr>
            <td colspan=\&quot;2\&quot; align=\&quot;center\&quot; valign=\&quot;top\&quot;></td>
        </tr>
        <tr>
            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>
            table content
            </td>
            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>
            table content
            </td>
        </tr>
    </tbody>
</table>">Link Title</a>

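So what I want to do is filter $output and remove the title attribute completely, including everything inside it. Why don't the preg_replace() calls above work, and what are my options?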

Answer 1

I wouldn't use regular expressions to manipulate [X]HTML; I'd use an HTML parser.
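A minimal sketch of the parser approach, assuming PHP 7+ with the DOM extension (ext-dom) enabled and a fragment with a single root element. The function name `removeAnchorTitles` and the sample input are hypothetical. The parser reads the quoted attribute value as a whole, so the newlines and embedded markup in the title do not matter.

```php
<?php
// Hypothetical helper: removes the title attribute from every <a> element.
// Assumes ext-dom is available (it is enabled in most PHP builds).
function removeAnchorTitles(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // NOIMPLIED/NODEFDTD stop libxml from adding <html>, <body> and a doctype.
    // These flags behave best when the fragment has a single root element.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Removing an attribute does not change the list of <a> elements,
    // so it is safe to do while iterating.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $anchor->removeAttribute('title');
    }

    // saveHTML() may append a trailing newline; trim it off.
    return trim($doc->saveHTML());
}

// Hypothetical sample shaped like the output in the question.
$input = "<a href=\"#\" title=\"<table border=\\&quot;0\\&quot;>\n<tr><td>table content</td></tr>\n</table>\">Link Title</a>";
$result = removeAnchorTitles($input);
echo $result, "\n";
```

Unlike a regex, this also copes with single-quoted or unquoted title values, because the parser handles attribute syntax rather than matching text.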

But if you still want to use a regular expression, you can use one like this:

title="[\s\S]*?"
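This pattern works because the title value in the question spans several lines. `[\s\S]` matches any character, newlines included, whereas `.` stops at a newline unless the `s` (dotall) modifier is set. That is why the first and third attempts in the question never match. The second attempt fails for a different reason: PCRE reads `\t` as a tab escape. The short sample below is hypothetical and only demonstrates these points.

```php
<?php
// Hypothetical sample: a title value spanning two lines, like the one in the question.
$html = "<a href=\"#\" title=\"line one\nline two\">Link Title</a>";

// Without the s modifier, . stops at "\n", so the pattern never matches.
$noFlag = preg_replace('/title="(.*?)"/', '', $html);

// With the s (dotall) modifier, . also matches newlines.
$dotAll = preg_replace('/title="(.*?)"/s', '', $html);

// [\s\S] matches any character, newline included, without needing s.
$charClass = preg_replace('/title="[\s\S]*?"/', '', $html);

var_dump($noFlag === $html); // bool(true)
echo $dotAll, "\n";          // <a href="#" >Link Title</a>
echo $charClass, "\n";       // <a href="#" >Link Title</a>

// In the second attempt, PCRE reads \t as a tab, so '/\title="/'
// looks for a tab character followed by itle=".
var_dump(preg_match('/\title="/', '<a title="x">')); // int(0)
```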

You can use the following code:

$re = "/title=\"[\\s\\S]*?\"/"; 
$str = "<a href=\"#\" title=\"<table border=\&quot;0\&quot; width=\&quot;100%\&quot; cellspacing=\&quot;0\&quot; cellpadding=\&quot;0\&quot;>\n    <tbody>\n        <tr>\n            <td colspan=\&quot;2\&quot; align=\&quot;center\&quot; valign=\&quot;top\&quot;></td>\n        </tr>\n        <tr>\n            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>\n            table content\n            </td>\n            <td valign=\&quot;top\&quot; width=\&quot;50%\&quot;>\n            table content\n            </td>\n        </tr>\n    </tbody>\n</table>\">Link Title</a>"; 
$subst = ""; 

$result = preg_replace($re, $subst, $str);
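The code above leaves a stray space, producing `<a href="#" >Link Title</a>`. A variant that also consumes the whitespace before the attribute is sketched below; the function name `stripTitleAttributes` and the sample are hypothetical. `[^"]*` matches everything up to the next double quote, newlines included, so no lazy quantifier or `s` modifier is needed. Requiring whitespace before `title` also leaves attributes such as `data-title` alone. It still assumes double-quoted titles with no literal `"` inside, which is one reason the parser approach is more robust.

```php
<?php
// Hypothetical helper; a variant of the pattern above.
function stripTitleAttributes(string $html): string
{
    // \s+     : whitespace before the attribute, so no stray space remains
    // title=" : the attribute name and opening double quote
    // [^"]*   : any characters except a double quote, newlines included
    // "       : the closing double quote
    return preg_replace('/\s+title="[^"]*"/', '', $html);
}

// Hypothetical sample shaped like the output in the question.
$input = "<a href=\"#\" title=\"<table border=\\&quot;0\\&quot;>\n    <tr><td>table content</td></tr>\n</table>\">Link Title</a>";
echo stripTitleAttributes($input), "\n"; // <a href="#">Link Title</a>
```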

Update: Andrei P.'s comment gives a clear example of why you shouldn't parse HTML with regular expressions.