Question

I want to find the paragraphs in a string and then format them. What I have mostly works, but not 100%.

So, I have a string that looks like this:

##Chapter 1

Once upon a time there was a little girl named sally, she went to school.

One day it was awesome!

##Chapter 2

We all had a parade!

I format the string by converting ##... into <h2> tags, so it now looks like this:

<h2>Chapter 1</h2>

Once upon a time there was a little girl named sally, she went to school.

One day it was awesome!

<h2>Chapter 2</h2>

We all had a parade!

Now I want to convert everything into paragraphs. To do that, I do this:

// Converts sections to paragraphs:
$this->string = preg_replace("/(^|\n\n)(.+?)(\n\n|$)/", "<p>$2</p>", $this->string);

// To remove paragraph tags from header tags (h1, h2, h3, h4, h5, h6):
$this->string = preg_replace("/<p><h(\d)>(.+?)<\/h\d><\/p>/i", "<h$1>$2</h$1>", $this->string);

This is the final output (newlines added for readability):

<h2>Chapter 1</h2>
Once upon a time there was a little girl named sally, she went to school.
<p>One day it was awesome!</p>
<h2>Chapter 2</h2>
<p>We all had a parade!</p>

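As I said near the beginning, this doesn't work 100%: as you can see, the first paragraph didn't get wrapped in paragraph tags. What can I do to improve the regex?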
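A minimal reproduction, written as a standalone script where a plain $string variable stands in for $this->string (an assumption for illustration) and assuming "\n" line endings. Listing the matches of the first pattern with preg_match_all shows why the first paragraph is missed: each match consumes its trailing \n\n, so at the start of "Once upon a time..." there is neither a ^ nor a \n\n left for (^|\n\n) to match, and the engine moves on to the next \n\n. The "Chapter 2" heading is skipped the same way, which is why it also comes out without <p> tags.

```php
<?php
// Sample input after the ## -> <h2> step (assumes "\n" line endings).
$string = "<h2>Chapter 1</h2>\n\n"
        . "Once upon a time there was a little girl named sally, she went to school.\n\n"
        . "One day it was awesome!\n\n"
        . "<h2>Chapter 2</h2>\n\n"
        . "We all had a parade!";

// Same pattern as the question's first step.
preg_match_all("/(^|\n\n)(.+?)(\n\n|$)/", $string, $matches);

// Group 2 of each match: only three texts are captured. "Once upon a time..."
// and "<h2>Chapter 2</h2>" are skipped because the preceding match already
// consumed the "\n\n" in front of them.
print_r($matches[2]);
```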

Answer 1

You can do it in one step:

$this->string = preg_replace('~(*BSR_ANYCRLF)\R\R\K(?>[^<\r\n]++|<(?!h[1-6]\b)|\R(?!\R))+(?=\R\R|$)~u',
                             '<p>$0</p>', $this->string);
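A quick check of the one-step pattern on the sample string from the question, as a standalone script (the $string variable stands in for $this->string; this harness is an illustration, not part of the original answer). Headers are left alone and the blank lines between blocks are kept, because \K drops the leading \R\R from the match and the trailing \R\R is only checked by a lookahead.

```php
<?php
// Sample input after the ## -> <h2> step (assumes "\n" line endings).
$string = "<h2>Chapter 1</h2>\n\n"
        . "Once upon a time there was a little girl named sally, she went to school.\n\n"
        . "One day it was awesome!\n\n"
        . "<h2>Chapter 2</h2>\n\n"
        . "We all had a parade!";

// One-step pattern from this answer; $0 is the whole match after \K.
$result = preg_replace(
    '~(*BSR_ANYCRLF)\R\R\K(?>[^<\r\n]++|<(?!h[1-6]\b)|\R(?!\R))+(?=\R\R|$)~u',
    '<p>$0</p>',
    $string
);

echo $result, "\n";
```

Because the pattern must see \R\R before a paragraph, a paragraph at the very beginning of the string (with no header above it) would not be wrapped; the sample starts with a header, so it is not affected.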

Pattern details:

(*BSR_ANYCRLF)       # \R matches only CR, LF or CRLF
\R\R                 # two newlines
\K                   # reset the match
(?>                  # open an atomic group
    [^<\r\n]++       # all characters except <, CR, LF
  |                  # OR
    <(?!h[1-6]\b)    # < not followed by a header tag
  |                  # OR
    \R(?!\R)         # single newline
)+                   # close the atomic group and repeat one or more times
(?=\R\R|$)           # followed by two newlines or the end of the string
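To see the (*BSR_ANYCRLF) and \R(?!\R) parts at work, here is a small sketch with a made-up input (hypothetical text, not from the question) that uses Windows-style \r\n line endings and has one paragraph spanning two lines separated by a single line break. Each CRLF counts as one \R, so \R\R still recognizes the blank line, and the single line break stays inside the same <p>.

```php
<?php
// Hypothetical input with CRLF line endings; the first paragraph contains
// a single line break, which should not split it into two paragraphs.
$string = "<h2>Chapter 1</h2>\r\n\r\n"
        . "Once upon a time.\r\nShe went to school.\r\n\r\n"
        . "<h2>Chapter 2</h2>\r\n\r\n"
        . "We all had a parade!";

$result = preg_replace(
    '~(*BSR_ANYCRLF)\R\R\K(?>[^<\r\n]++|<(?!h[1-6]\b)|\R(?!\R))+(?=\R\R|$)~u',
    '<p>$0</p>',
    $string
);

// Escape CR and LF so the line endings are visible in the output.
echo addcslashes($result, "\r\n"), "\n";
```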

Answer 2

Add the m (multiline) modifier to the first regex.

// Converts sections to paragraphs:
$this->string = preg_replace("/(^|\n\n)(.+?)(\n\n|$)/m", "<p>$2</p>", $this->string);

// To remove paragraph tags from header tags (h1, h2, h3, h4, h5, h6):
$this->string = preg_replace("/<p><h(\d)>(.+?)<\/h\d><\/p>/i", "<h$1>$2</h$1>", $this->string);
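A standalone run of this answer's two steps on the sample string, with $string standing in for $this->string (an assumption for illustration). With /m, ^ also matches right after a newline, so even though each match still consumes its trailing \n\n, the next match can start at the following block. Because the \n\n separators are consumed, the blocks end up on a single line.

```php
<?php
// Sample input after the ## -> <h2> step (assumes "\n" line endings).
$string = "<h2>Chapter 1</h2>\n\n"
        . "Once upon a time there was a little girl named sally, she went to school.\n\n"
        . "One day it was awesome!\n\n"
        . "<h2>Chapter 2</h2>\n\n"
        . "We all had a parade!";

// Step 1, with the m modifier: ^ and $ also match at line boundaries.
$string = preg_replace("/(^|\n\n)(.+?)(\n\n|$)/m", "<p>$2</p>", $string);

// Step 2: remove the <p> tags that step 1 put around the headers.
$string = preg_replace("/<p><h(\d)>(.+?)<\/h\d><\/p>/i", "<h$1>$2</h$1>", $string);

echo $string, "\n";
```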

