Question

I am scraping a website with simple_html_dom, and the result I need is somewhere between ->innertext and ->plaintext.

For example, this is the source string:

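<span lang="EN-CA">[28]<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span lang="EN-CA">The Canadian trade-marks regime is national in scope. The owner of a registered trade-mark, subject to a finding of invalidity, is entitled to the exclusive use of that mark in association with the wares or services to which it is connected throughout Canada. Section 19 of the <i>Trade-marks Act</i> provides:</span>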

I need to remove the span tags but not their contents (unless a span contains only &nbsp;), while keeping certain formatting tags such as <i>.

So the result I want here is a single string with the span tags removed and the <i> tag preserved.

Answer 1

You can try this.

echo stripcslashes('<span lang="EN-CA">[28]<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span lang="EN-CA">The Canadian trade-marks regime is national in scope. The owner of a registered trade-mark, subject to a finding of invalidity, is entitled to the exclusive use of that mark in association with the wares or services to which it is connected throughout Canada. Section 19 of the <i>Trade-marks Act</i> provides:</span>');

Answer 2

You can try the following lines of code:

<?php

$string = '<span lang="EN-CA">[28]<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span lang="EN-CA">The Canadian trade-marks regime is national in scope. The owner of a registered trade-mark, subject to a finding of invalidity, is entitled to the exclusive use of that mark in association with the wares or services to which it is connected throughout Canada. Section 19 of the <i>Trade-marks Act</i> provides:</span>';

// Remove attributes within the <span> tag, just for clarity's sake.
$string = preg_replace('/(<span ([^\>]+)>)/i', '<span>', $string);

// Remove any spans that only contain &nbsp;
$string = preg_replace('/<span>([ ]|&nbsp;)*<\/span>/i', '', $string);

// Replace any consecutive span (opening or closing) tags with a space, to make
// clear the separation between one span and the next.
$string = preg_replace('/<(\/)?span><(\/)?span>/i', ' ', $string);

// Remove any remaining instances of opening or closing span tags.
$string = preg_replace('/<(\/)?span>/i', '', $string);

print $string;

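Note that I added an i after the closing slash of each regular expression, which makes the search case-insensitive. This is in case your markup contains tags written as <SPAN> or even <Span>.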
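To illustrate what the i modifier changes, here is a minimal sketch. The input string and the variable names ($mixed, $caseSensitive, $caseInsensitive) are made up for this example and do not come from the question.

```php
<?php
// Hypothetical input with mixed-case span tags (not from the original question).
$mixed = '<SPAN>one</SPAN> <Span>two</Span> <span>three</span>';

// Without the i modifier, only the all-lowercase tags match.
$caseSensitive = preg_replace('/<(\/)?span>/', '', $mixed);

// With the i modifier, every casing of the tag matches.
$caseInsensitive = preg_replace('/<(\/)?span>/i', '', $mixed);

print $caseSensitive . "\n";   // <SPAN>one</SPAN> <Span>two</Span> three
print $caseInsensitive . "\n"; // one two three
```

If the markup is known to be consistently lowercase, the modifier is harmless; it only matters when the casing varies.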

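Admittedly, this is not a tightly compressed one-line regular expression. I wrote it this way so that you can see each step along the way. You can move the print $string; line up and down through the code to check the intermediate results. I hope that showing the code like this helps you, in the long run, to better understand regular expressions and how preg_replace is used.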

Answer 3

This is what strip_tags is for:

echo strip_tags('<span>strip me</span> <i>leave me alone</i>', '<i>');
//=> strip me <i>leave me alone</i>
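Applied to markup like the question's, strip_tags removes the spans and keeps <i>, but it leaves the &nbsp; entities in place. The following is a minimal sketch of one way to clean those up afterwards; the shortened input string and the variable names are assumptions made for illustration, and the whitespace handling is one possible choice rather than something the answer prescribes.

```php
<?php
// Shortened, hypothetical version of the markup from the question.
$html = '<span lang="EN-CA">[28]<span>&nbsp;&nbsp;&nbsp; </span></span>'
      . '<span lang="EN-CA">Section 19 of the <i>Trade-marks Act</i> provides:</span>';

// Drop every tag except <i>; the text inside the removed tags is kept.
$text = strip_tags($html, '<i>');

// strip_tags leaves entities alone, so turn &nbsp; into ordinary spaces,
// then collapse runs of whitespace into a single space.
$text = str_replace('&nbsp;', ' ', $text);
$text = trim(preg_replace('/\s+/', ' ', $text));

print $text;
//=> [28] Section 19 of the <i>Trade-marks Act</i> provides:
```

Note that collapsing whitespace also affects text inside the kept <i> tags, which may or may not be what you want.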

Keep only certain tags in an HTML string using PHP
