I am scraping a website with simple_html_dom, and the result I need falls somewhere between ->innertext and ->plaintext.
For example, this is the source string:
<span lang="EN-CA">[28]<span style="font:7.0pt "Times New Roman"">
</span></span><span lang="EN-CA">The Canadian trade-marks regime is national in
scope. The owner of a registered trade-mark, subject to a finding of
invalidity, is entitled to the exclusive use of that mark in association with
the wares or services to which it is connected throughout Canada. Section 19 of
the <i>Trade-marks Act</i> provides:</span>
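I need to remove the span tags but not their contents (unless the span contains only &nbsp;), while keeping the <i>, <b>, and <u> tags.
So the result I want here is a single string: the text above with the span tags stripped and the <i>Trade-marks Act</i> markup left intact.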
Answer 0 (score: 0)
echo stripcslashes('<span lang="EN-CA">[28]<span style="font:7.0pt "Times New Roman""> </span></span><span lang="EN-CA">The Canadian trade-marks regime is national in scope. The owner of a registered trade-mark, subject to a finding of invalidity, is entitled to the exclusive use of that mark in association with the wares or services to which it is connected throughout Canada. Section 19 of the <i>Trade-marks Act</i> provides:</span>');
Answer 1 (score: 0)
You can try the following lines of code:
<?php
$string = '<span lang="EN-CA">[28]<span style="font:7.0pt "Times New Roman""> &nbsp; </span></span><span lang="EN-CA">The Canadian trade-marks regime is national in scope. The owner of a registered trade-mark, subject to a finding of invalidity, is entitled to the exclusive use of that mark in association with the wares or services to which it is connected throughout Canada. Section 19 of the <i>Trade-marks Act</i> provides:</span>';
// Remove attributes within the <span> tag, just for clarity's sake.
$string = preg_replace('/(<span ([^\>]+)>)/i', '<span>', $string);
// Remove any spans that only contain spaces or &nbsp; entities.
$string = preg_replace('/<span>([ ]|&nbsp;)*<\/span>/i', '', $string);
// Replace any consecutive span (opening or closing) tags with a space, to make
// clear the separation between one span and the next.
$string = preg_replace('/<(\/)?span><(\/)?span>/i', ' ', $string);
// Remove any remaining instances of opening or closing span tags.
$string = preg_replace('/<(\/)?span>/i', '', $string);
print $string;
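Note that I added an i after the closing slash of each regular expression, which makes the search case-insensitive. This is in case your code contains <SPAN>, <span>, or even <SpaN>.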
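To see what the i modifier changes, here is a minimal sketch that counts how many of those three spellings a pattern matches with and without it. The variable names are only illustrative.

```php
<?php
// The three spellings mentioned above.
$variants = ['<SPAN>', '<span>', '<SpaN>'];

// preg_grep returns the array entries that match the pattern.
$caseSensitive   = count(preg_grep('/<span>/', $variants));   // only the lowercase form matches
$caseInsensitive = count(preg_grep('/<span>/i', $variants));  // all three forms match

echo $caseSensitive, ' vs ', $caseInsensitive, PHP_EOL;
```

Without the modifier only the all-lowercase tag is found; with it, every spelling is.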
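Granted, this isn't a tightly packed one-line regex of awesomeness. I wrote it this way so that you can see the steps along the way: put the print $string; line after any step to check the progress. I hope that showing the code like this helps you, in the long run, to better understand how regular expressions and preg_replace are used.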
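If you would rather not move the print line around by hand, the same idea can be sketched as a loop that prints the string after every replacement. The $steps array, its labels, and the shortened sample input are assumptions made for illustration, and the patterns are slightly condensed variants of the ones above.

```php
<?php
// Shortened sample input (an assumption, not the full source string).
$string = '<span lang="EN-CA">[28]<span style="x">&nbsp;</span></span>'
        . '<SPAN lang="EN-CA">Section 19 of the <i>Trade-marks Act</i> provides:</SPAN>';

// Each step: a label, a pattern, and its replacement.
$steps = [
    ['strip span attributes',        '/<span\s[^>]*>/i',                '<span>'],
    ['drop nbsp-only spans',         '/<span>(?:\s|&nbsp;)*<\/span>/i', ''],
    ['space between adjacent spans', '/<\/?span><\/?span>/i',           ' '],
    ['remove remaining span tags',   '/<\/?span>/i',                    ''],
];

foreach ($steps as [$label, $pattern, $replacement]) {
    $string = preg_replace($pattern, $replacement, $string);
    // Show the intermediate result after each step.
    echo $label, ': ', $string, PHP_EOL;
}
```

For this sample input the last line printed ends with [28] Section 19 of the <i>Trade-marks Act</i> provides: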
Answer 2 (score: 0)
This is what strip_tags is for:
echo strip_tags('<span>strip me</span> <i>leave me alone</i>', '<i>');
//=> strip me <i>leave me alone</i>
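On its own, strip_tags would leave the &nbsp; from the filler span in the text. A possible way to cover that part of the question is to delete those spans first and then let strip_tags keep only <i>, <b>, and <u>. This is a sketch under that assumption; clean_spans is a hypothetical helper name, not something from the original answers.

```php
<?php
// Hypothetical helper: removes span tags but keeps their text and <i>, <b>, <u>.
function clean_spans(string $html): string
{
    // Drop spans whose only content is whitespace or &nbsp; entities.
    $html = preg_replace('/<span\b[^>]*>(?:\s|&nbsp;)*<\/span>/i', '', $html);

    // Strip every remaining tag except the inline ones we want to keep.
    return strip_tags($html, '<i><b><u>');
}

echo clean_spans('<span lang="EN-CA">[28]<span style="x">&nbsp;</span></span><span>the <i>Trade-marks Act</i> provides:</span>');
//=> [28]the <i>Trade-marks Act</i> provides:
```

Note that no space is inserted where two spans met, so [28] runs straight into the following text. If you need that separator, replace </span><span> with a space before calling strip_tags.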