Question

I'm not used to regular expressions, so this is probably an easy one for you.

Basically, I'm applying wordwrap to content that contains classic HTML tags: <a>, <img>, ...

  $text = wordwrap($text, $cutLength, " ", $wordCut);
  $text = nl2br(bbcode_parser($text));
  return $text;

As you can see, my problem is very simple: all I want is to apply wordwrap() to my content, excluding whatever is inside HTML attributes: href, src, ...

Can anyone help me? Thanks a lot!

Answer 1

Use any DOM parser capable of extracting the text nodes from the document. Iterate over the text nodes, apply wordwrap to each one, and write the result back to its respective text node.

The approach is the same as the one given in

How to replace text URLs and exclude URLs in HTML tags?

except that instead of checking the text content for links, you apply wordwrap to it.

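A more general phrasing of your question would be: "How do I (selectively) get the text content of an HTML document in order to apply a function to it?"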
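A minimal PHP sketch of this text-node approach, assuming PHP's bundled DOM extension (ext-dom, with DOMDocument and DOMXPath). The helper name mapTextNodes(), the temporary wrapper id "wrap-root", and the choice to skip text inside script and style elements are illustrative assumptions, not part of the answer. Because only text nodes are selected, attribute values such as href and src are never passed to the callback.

```php
<?php
// Sketch only: mapTextNodes() and the "wrap-root" id are hypothetical names.
// Applies $fn to every text node of an HTML fragment (skipping <script>/<style>)
// and writes the result back; attribute values are never visited.

function mapTextNodes(string $html, callable $fn): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    // The xml encoding hint makes libxml read the fragment as UTF-8;
    // the wrapper div lets us pull the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="wrap-root"]')->item(0);

    // text() selects text nodes only; attributes are separate nodes in the DOM.
    $textNodes = $xpath->query('.//text()[not(ancestor::script) and not(ancestor::style)]', $root);
    foreach ($textNodes as $node) {
        $node->data = $fn($node->data); // write the transformed text back
    }

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

For the question's code, the callback could simply wrap the original call, e.g. `mapTextNodes($text, function (string $t) use ($cutLength, $wordCut): string { return wordwrap($t, $cutLength, " ", $wordCut); })`. Note that libxml re-serializes the markup, so the output can differ cosmetically from the input (for example in attribute quoting).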

Answer 2

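Of course, you shouldn't parse HTML with regular expressions, but this should separate out what you want. My knowledge of PHP is limited, so this only illustrates the procedure.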

$tags = 
'  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

$scripts =
'   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

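// PHP needs a quoted pattern string: '~' is used as the delimiter because the
// patterns contain '/', and there is no g modifier (preg_replace_callback()
// already replaces every match).
$regex = "~ ($scripts | $tags) | ((?:(?!$tags).)+) ~xs";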

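The replacement string is Group 1 concatenated with the return value of your word-wrap function (to which you pass the content, i.e. the Group 2 string), so something like: replacement = \1 . textwrap(\2)
Inside textwrap you decide how to handle the content.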
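In PHP, "Group 1 concatenated with textwrap(Group 2)" maps onto preg_replace_callback(), which calls a function for every match. A minimal sketch follows; to keep it self-contained it uses simplified stand-in patterns for $tags and $scripts (they handle quoted attribute values, but not, for example, a comment containing `>`), and textwrap() and wrapHtml() are hypothetical names. For real use, swap in the answer's patterns together with the x modifier they rely on.

```php
<?php
// Sketch of: replacement = \1 . textwrap(\2), using preg_replace_callback().
// ASSUMPTION: the two patterns are simplified stand-ins, not the answer's;
// textwrap() and wrapHtml() are hypothetical helpers.

function textwrap(string $content): string
{
    // Decide here how text between tags is handled (width 20, cut long words).
    return wordwrap($content, 20, ' ', true);
}

function wrapHtml(string $html): string
{
    $scripts = '<(?:script|style)\b[^>]*>.*?</(?:script|style)\s*>';
    $tags    = '<(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>';
    // '~' as delimiter because the patterns contain '/'.
    $regex   = "~($scripts|$tags)|((?:(?!$tags).)+)~si";

    $result = preg_replace_callback($regex, function (array $m): string {
        if (isset($m[2]) && $m[2] !== '') {
            // Text matched: $m[1] is '' here, so this is \1 . textwrap(\2).
            return $m[1] . textwrap($m[2]);
        }
        // A tag or a whole script/style block: copied through unchanged.
        return $m[0];
    }, $html);

    if ($result === null) {
        throw new RuntimeException('preg_replace_callback failed: ' . preg_last_error());
    }
    return $result;
}
```

With this sketch, a long URL inside `href="..."` is matched as part of Group 1 and left alone, while a 40-character word in the link text comes back as two 20-character halves separated by a space.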

Tested in Perl (it is quite slow, by the way, and simplified for clarity):

use strict;
use warnings;

my $tags = 
'  <
   (?:
       /?\w+\s*/?
     | \w+\s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*/?
     | !(?:DOCTYPE.*?|--.*?--)
   )>
';

my $scripts =
'   <
   (?:
       (?:script|style) \s*
     | (?:script|style) \s+ (?:".*?"|\'.*?\'|[^>]*?)+\s*
   )>
   .*?
   </(?:script|style)\s*>
';

my $html = join '', <DATA>;

while ( $html =~ / ($scripts | $tags) | ((?:(?!$tags).)+) /xsg ) {
    if (defined $2 && $2 !~ /^\s+$/) {
        print $2,"\n";
    }
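}

# Hypothetical sample input; replace with your own HTML.
__DATA__
<p>Some text with a <a href="http://example.com/a/very/long/path">link</a>.</p>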
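A possible PHP counterpart of this Perl test loop, sketched with preg_match_all() and PREG_SET_ORDER. It returns the non-blank Group 2 segments instead of printing them. textSegments() is a hypothetical name, and the two patterns are simplified stand-ins rather than the answer's exact ones (for instance, a comment containing `>` or an unbalanced quote is not handled).

```php
<?php
// Sketch: collect the text between tags, mirroring the Perl loop's
// "defined $2 && $2 !~ /^\s+$/" test. Patterns are simplified stand-ins.

function textSegments(string $html): array
{
    $scripts = '<(?:script|style)\b[^>]*>.*?</(?:script|style)\s*>';
    $tags    = '<(?:"[^"]*"|\'[^\']*\'|[^>"\'])*>';
    $regex   = "~($scripts|$tags)|((?:(?!$tags).)+)~si";

    preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

    $segments = [];
    foreach ($matches as $m) {
        // $m[2] is only present when the text alternative matched.
        if (isset($m[2]) && !preg_match('/^\s+$/', $m[2])) {
            $segments[] = $m[2];
        }
    }
    return $segments;
}
```

For example, `textSegments('<p>Hello <b>world</b></p>')` returns `['Hello ', 'world']`; the tags themselves and whitespace-only runs are dropped.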

Apply wordwrap to HTML content, excluding HTML attributes
