Question

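I need to take a sentence or paragraph, extract each word, and transform the word into another form. For example, I need to change the word 'noodle' to '##noodle##'. I am using the code below to split the sentence into words and join them back together with implode().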

function display_sentence_with_answer($str="")
{
    $arr_output = [];
    $str = preg_replace("#<p>(\s| |</?\s?br\s?/)*</?p>#","",$str);
    $words = preg_replace('#<[^>]+>#', ' ', $str);
    $arr_words = preg_split('/<[^>]+>(?:\s+<[^>]+>)*|\s+/u', trim($words));

    foreach($arr_words as $word)
    {
        $arr_output[] = '##'.$word.'##';
    }
    $output_str = implode(" ",$arr_output);
    return $output_str;
}

Input:

Nyatakan pecahan bagi rajah di bawah.

<br/>
4/5


<p>
p</p>

However, this is the output I get:

##Nyatakan## ##pecahan## ##bagi## ##rajah## ##di## ##bawah.## ##4/5## ##p##

How can I keep the original formatting of the input? Has anyone run into this kind of requirement before?

My expected output is:

##Nyatakan## ##pecahan## ##bagi## ##rajah## ##di## ##bawah.##

<br/>
##4/5##


<p>
p</p>

Thanks!

Answer 1

You can use this regex with the PCRE verbs (*SKIP)(*F) to skip certain matches:

(?:<([^>]*)>.*?</\1>|<[^>]*/>)(*SKIP)(*F)|\b\w\S*

RegEx breakdown:

(?:                   # start non capturing group
   <([^>]*)>.*?</\1>  # match a tag and closing tag <tag>...</tag>
   |                  # OR
   <[^>]*/>           # match a tag like <tag/>
)                     # end non capturing group
(*SKIP)(*F)           # skip this match
|                     # OR
\b\w\S*               # match a word starting with a word character
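
As a minimal sketch of how the pattern above might be applied in PHP, the snippet below passes it to preg_replace() and wraps each matched word in ##. The function name wrap_words is hypothetical, and the s modifier is an assumption added here: in the sample input the <p> ... </p> pair spans two lines, and without s the .*? part cannot cross the newline.

```php
<?php
// Hypothetical helper: wraps every word outside of tags in ##...##
// while leaving tags, paired-tag contents, and whitespace untouched.
function wrap_words(string $str): string
{
    // Same pattern as above. The "s" modifier (an assumption, not part of
    // the original pattern) lets "." match newlines so <p>\np</p> is skipped.
    $pattern = '~(?:<([^>]*)>.*?</\1>|<[^>]*/>)(*SKIP)(*F)|\b\w\S*~s';

    // $0 is the whole match, i.e. one word; everything else is left as-is.
    return preg_replace($pattern, '##$0##', $str);
}

$input = "Nyatakan pecahan bagi rajah di bawah.\n\n<br/>\n4/5\n\n\n<p>\np</p>";
echo wrap_words($input);
```

Because only the matched words are replaced, the line breaks and tags of the input are preserved rather than rebuilt with implode(). Note that the backreference \1 expects the closing tag to repeat the full opening tag text, so an opening tag with attributes such as <p class="x"> would not be skipped by this pattern.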

Caution: HTML is not a regular language and it can be very unpredictable, so parsing HTML with regular expressions is not recommended.
