Question

I'm struggling with a regular expression. Here is the text I'm working with:

* [[February 1]] – ''[[Brave New World]]'', a novel by [[Aldous Huxley]], is first published.
* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.
* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China.
* [[February 9]] – [[Junnosuke Inoue]], prominent Japanese businessman, banker and former governor of the Bank of Japan is assassinated by right-wing extremist group the League of Blood in the [[League of Blood Incident]].
* [[February 11]] – [[Pope Pius XI]] meets [[Benito Mussolini]] in [[Vatican City]].

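I'd like a regular expression that matches every line starting with * that is followed by any number of lines starting with **. Ideally, each ** line would be captured in its own group.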

Here is the result I want:

> Match 1:
>> Group 1: "* [[February 2]]"

>> Group 2: "** A general [...] Part V)."

>> Group 3: "** The [[League of Nations]] [...] and Japan."

>> Group 4: "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."

> Match 2: 
>> Group 1: "* [[February 4]]"

>> Group 2: "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]]."

>> Group 3: "** Japan occupies [[Harbin]], China."

(I have used [...] to shorten the text.)

This is the pattern I came up with: /(*ANY)^\*{1} (.*)\n(?>(^\*{2}(.*?)\n)+)/gm, and here is the regex101 link where I'm testing it: https://regex101.com/r/ubtnMg/1.

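Here is an explanation of my pattern:

* (*ANY) matches any newline sequence, because I'm not sure which newline characters the text uses.
* ^\*{1} (.*)\n matches any line starting with *, capturing the text of that line up to the newline.
* (?>(^\*{2}(.*?)\n)+) is the tricky part. It should match every line starting with ** that follows the line matched by ^\*{1} (.*)\n, capturing each line's text up to the end of the line in a group, until a new line starting with * is found.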

What it actually gives me is this:

> Match 1: "* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."
>> Group 1: "[[February 2]]"

>> Group 2: "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."

>> Group 3: "The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."

> Match 2: "* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China."
>> Group 1: "[[February 4]]"

>> Group 2: "** Japan occupies [[Harbin]], China."

>> Group 3: " Japan occupies [[Harbin]], China."
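
For reference, the output above is consistent with how a quantified capturing group behaves: each repetition overwrites the previous capture, so only the last ** line is kept. Below is a minimal sketch in Python, assuming the standard re module (which has no (*ANY) verb, so it is dropped along with the atomic group) and a shortened, made-up sample rather than the real text:

```python
import re

# Shortened, hypothetical sample; the real lines are much longer.
text = (
    "* [[February 2]]\n"
    "** First item.\n"
    "** Second item.\n"
    "** Third item.\n"
    "* [[February 4]]\n"
)

# Same shape as the pattern in the question, without (*ANY) and the atomic group.
pattern = re.compile(r"^\* (.*)\n(^\*{2}(.*?)\n)+", re.MULTILINE)

match = pattern.search(text)
header = match.group(1)      # text of the "*" line
last_full = match.group(2)   # only the LAST "**" line (with its newline)
last_text = match.group(3)   # only the LAST "**" line's text

print(repr(header), repr(last_full), repr(last_text))
```

The whole match still spans all three ** lines; it is only the numbered groups that hold a single iteration.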

I hope I've been clear enough for you to help me. Don't hesitate to ask for more details.

Answer 1

Thanks to Rawing's comment, I found this solution:

First, I use this pattern: /(*ANY)^\*{1} (.*)\n(^\*{2}(.*?)\n)+/gm to match each block of text, like this one:

* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.

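Then, within each block, I use this pattern to get the line starting with *: /^\*{1}(.*)/g. I also use this pattern to get each line starting with **: /^\*{2}(.*)$/gm.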
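
The two steps can be sketched in Python roughly as follows. This is only an illustration under some assumptions: it uses Python's re module instead of PCRE (so (*ANY) is omitted and the text is assumed to use \n line endings), the sample text is a shortened stand-in, and parse_blocks is a hypothetical helper name:

```python
import re

# Shortened, hypothetical sample; assumes "\n" line endings.
text = (
    "* [[February 1]] - single-line entry.\n"
    "* [[February 2]]\n"
    "** First item.\n"
    "** Second item.\n"
    "* [[February 4]]\n"
    "** Third item.\n"
)

# Step 1: a "*" line followed by one or more "**" lines.
block_re = re.compile(r"^\* .*\n(?:^\*{2}.*\n)+", re.MULTILINE)
# Step 2a: the "*" line at the very start of a block (no MULTILINE flag).
header_re = re.compile(r"^\*(.*)")
# Step 2b: every "**" line inside the block.
item_re = re.compile(r"^\*{2}(.*)$", re.MULTILINE)


def parse_blocks(source):
    """Return a list of (header, [items]) tuples, one per block."""
    results = []
    for block in block_re.finditer(source):
        chunk = block.group(0)
        header = header_re.match(chunk).group(1).strip()
        items = [item.strip() for item in item_re.findall(chunk)]
        results.append((header, items))
    return results


blocks = parse_blocks(text)
for header, items in blocks:
    print(header, items)
```

Entries without any ** lines, such as the February 1 line in the sample, are skipped by the first pattern because it requires at least one ** line.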
