I'm struggling with a regular expression. Here is the text I'm working on:
* [[February 1]] – ''[[Brave New World]]'', a novel by [[Aldous Huxley]], is first published.
* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.
* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China.
* [[February 9]] – [[Junnosuke Inoue]], prominent Japanese businessman, banker and former governor of the Bank of Japan is assassinated by right-wing extremist group the League of Blood in the [[League of Blood Incident]].
* [[February 11]] – [[Pope Pius XI]] meets [[Benito Mussolini]] in [[Vatican City]].
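I'd like a regex that matches every line starting with `*` that is followed by one or more lines starting with `**`. Ideally, each `**` line should be captured in its own group.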
Here is the result I want:
> Match 1:
>> Group 1: "* [[February 2]]"
>> Group 2: "** A general [...] Part V)."
>> Group 3: "** The [[League of Nations]] [...] and Japan."
>> Group 4: "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."
> Match 2:
>> Group 1: "* [[February 4]]"
>> Group 2: "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]]."
>> Group 3: "** Japan occupies [[Harbin]], China."
(I have used [...] to shorten the longer lines.)
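To pin down the target grouping independently of any regex engine, here is a minimal Python sketch that swaps in a plain line scan instead of a regex. It assumes `\n`-separated lines and uses an abbreviated copy of the sample text; `group_blocks` is a hypothetical helper name. Like the expected result, it drops `*` lines that have no `**` lines under them (February 1 and February 9).

```python
# Abbreviated sample text from the question; assumes "\n" line endings.
TEXT = (
    "* [[February 1]] – ''[[Brave New World]]'' is first published.\n"
    "* [[February 2]]\n"
    "** A general [[World Disarmament Conference]] begins in [[Geneva]].\n"
    "** The [[League of Nations]] again recommends negotiations.\n"
    "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.\n"
    "* [[February 4]]\n"
    "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].\n"
    "** Japan occupies [[Harbin]], China.\n"
    "* [[February 9]] – [[Junnosuke Inoue]] is assassinated.\n"
)


def group_blocks(text: str) -> list[list[str]]:
    """Group each "* " line with the "** " lines directly below it.

    Hypothetical helper: a plain line scan, not a regex. Blocks whose
    "* " line has no "** " lines are dropped, as in the expected result.
    """
    blocks = []
    current = None  # block that "** " lines are currently appended to
    for line in text.splitlines():
        if line.startswith("** "):
            if current is not None:
                current.append(line)
        elif line.startswith("* "):
            current = [line]
            blocks.append(current)
        else:
            current = None  # any other line ends the current block
    return [block for block in blocks if len(block) > 1]


for block in group_blocks(TEXT):
    print(block)
```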
This is the pattern I came up with: `/(*ANY)^\*{1} (.*)\n(?>(^\*{2}(.*?)\n)+)/gm`. Here is the link to regex101, where I test my regex: https://regex101.com/r/ubtnMg/1.
Here is how my pattern is meant to work:
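* `(*ANY)` tells PCRE to treat any newline sequence (LF, CR, CRLF, ...) as a line break, because I'm not sure which newline convention the text uses.
* `^\*{1} (.*)\n` matches any line that starts with `* `, capturing the text of that line up to the newline.
* `(?>(^\*{2}(.*?)\n)+)` is the tricky part. It should match every line starting with `**` that follows the line matched by `^\*{1} (.*)\n`, capturing the text of each line up to its end in a group, and stop when it reaches a new line starting with `*`.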
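For trying the pattern outside regex101, here is a rough Python port written with `re.VERBOSE` so each part carries its own comment. It is a sketch under assumptions: Python's `re` has no `(*ANY)` verb, so it assumes `\n` line endings; the atomic group `(?>...)` is dropped (Python's `re` only accepts atomic groups from 3.11, and since nothing follows the group, atomicity does not change the matches here); `PATTERN` and the abbreviated `TEXT` are illustrative names.

```python
import re

# Abbreviated sample text from the question; assumes "\n" line endings.
TEXT = (
    "* [[February 1]] – ''[[Brave New World]]'' is first published.\n"
    "* [[February 2]]\n"
    "** A general [[World Disarmament Conference]] begins in [[Geneva]].\n"
    "** The [[League of Nations]] again recommends negotiations.\n"
    "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.\n"
)

# In VERBOSE mode whitespace is ignored, so the literal space is written \x20.
PATTERN = re.compile(
    r"""
    ^\*\x20(.*)\n     # "* " line ({1} was redundant); group 1 = rest of the line
    (                 # group 2: one whole "**" line, newline included
      ^\*{2}(.*?)\n   # group 3: text after "**", lazily up to the newline
    )+                # repeat for each "**" line that follows
    """,
    re.MULTILINE | re.VERBOSE,
)

# February 1 is skipped because no "**" line follows it.
m = PATTERN.search(TEXT)
print(m.group(1))
```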
What it actually gives me is this:
> Match 1: "* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."
>> Group 1: "[[February 2]]"
>> Group 2: "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."
>> Group 3: "The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C."
> Match 2: "* [[February 4]]
** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].
** Japan occupies [[Harbin]], China."
>> Group 1: "[[February 4]]"
>> Group 2: "** Japan occupies [[Harbin]], China"
>> Group 3: " Japan occupies [[Harbin]], China."
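This output is not caused by the atomic group. In PCRE, as in most regex engines, a capturing group inside a quantifier such as `(...)+` is overwritten on every repetition, so only the last repetition is reported; that is why Group 2 and Group 3 hold only the final `**` line of each block. The same behavior can be checked with Python's stdlib `re`, assuming `\n` line endings (since `re` has no `(*ANY)`) and an abbreviated copy of the sample text:

```python
import re

# Abbreviated sample text from the question; assumes "\n" line endings.
TEXT = (
    "* [[February 1]] – ''[[Brave New World]]'' is first published.\n"
    "* [[February 2]]\n"
    "** A general [[World Disarmament Conference]] begins in [[Geneva]].\n"
    "** The [[League of Nations]] again recommends negotiations.\n"
    "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.\n"
    "* [[February 4]]\n"
    "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].\n"
    "** Japan occupies [[Harbin]], China.\n"
)

# The question's pattern minus (*ANY) and the atomic group.
PATTERN = re.compile(r"^\* (.*)\n(^\*{2}(.*?)\n)+", re.MULTILINE)

# Group 3 sits inside (...)+, so each repetition overwrites it:
# only the last "**" line of each block survives.
last_captures = [(m.group(1), m.group(3)) for m in PATTERN.finditer(TEXT)]

for header, last_sub in last_captures:
    print(header, "->", repr(last_sub))
```

Getting every repetition needs either a second pass over the matched block or an engine that records them all, for example the third-party Python `regex` module's `Match.captures()` or .NET's `Group.Captures`.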
I hope I've been clear enough for you to help me. Don't hesitate to ask for more details.
Answer:
Thanks to Rawing's comment, I found this solution:
First, I use this pattern to match each block of text: `/(*ANY)^\*{1} (.*)\n(^\*{2}(.*?)\n)+/gm`. It matches blocks like this one:
* [[February 2]]
** A general [[World Disarmament Conference]] begins in [[Geneva]]. The principal issue at the conference is the demand made by Germany for ''gleichberechtigung'' ("equality of status" i.e. abolishing Part V of the Treaty of Versailles, which had disarmed Germany) and the French demand for ''sécurité'' ("security" i.e. maintaining Part V).
** The [[League of Nations]] again recommends negotiations between the [[Republic of China (1912–49)|Republic of China]] and Japan.
** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.
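This first step can be sketched in Python as follows, assuming `\n` line endings (Python's `re` has no `(*ANY)`) and an abbreviated copy of the sample text; `BLOCK_PATTERN` is an illustrative name. Only the whole match, `group(0)`, is kept, because the repeated groups still hold just their last repetition.

```python
import re

# Abbreviated sample text from the question; assumes "\n" line endings.
TEXT = (
    "* [[February 1]] – ''[[Brave New World]]'' is first published.\n"
    "* [[February 2]]\n"
    "** A general [[World Disarmament Conference]] begins in [[Geneva]].\n"
    "** The [[League of Nations]] again recommends negotiations.\n"
    "** The [[Reconstruction Finance Corporation]] begins operations in Washington, D.C.\n"
    "* [[February 4]]\n"
    "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].\n"
    "** Japan occupies [[Harbin]], China.\n"
    "* [[February 9]] – [[Junnosuke Inoue]] is assassinated.\n"
)

# First step: match each "* " line together with the "**" lines below it.
# Lines with no "**" children (February 1 and 9) produce no match.
BLOCK_PATTERN = re.compile(r"^\* (.*)\n(^\*{2}(.*?)\n)+", re.MULTILINE)

# Keep the full text of each block for the later per-block passes.
blocks = [m.group(0) for m in BLOCK_PATTERN.finditer(TEXT)]

for block in blocks:
    print(block)
```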
Then I apply this pattern to each block to get the line starting with `*`: `/^\*{1}(.*)/g`.
I also use this pattern to get every line starting with `**`: `/^\*{2}(.*)$/gm`.
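The second and third steps, applied to a single block, might look like this in Python. `split_block` is a hypothetical helper, and `BLOCK` is an abbreviated block of the kind the first step produces. The header pattern is compiled without `re.MULTILINE`, so `^` only anchors at the start of the block (mirroring the `g`-only flag above), while the `**` pattern uses `re.MULTILINE` so `^` and `$` anchor at every line. Stripping the leading space from each capture is an added convenience, not part of the original patterns.

```python
import re

# Hypothetical block as produced by the first step (abbreviated).
BLOCK = (
    "* [[February 4]]\n"
    "** The [[1932 Winter Olympics]] open in [[Lake Placid, New York]].\n"
    "** Japan occupies [[Harbin]], China.\n"
)

# No MULTILINE: ^ matches only at the start of the block, i.e. the "*" line.
HEADER = re.compile(r"^\*(.*)")
# MULTILINE: ^ and $ match at each line, so every "**" line is found.
SUBLINES = re.compile(r"^\*{2}(.*)$", re.MULTILINE)


def split_block(block: str) -> tuple[str, list[str]]:
    """Return the "*" line's text and the text of every "**" line."""
    header_match = HEADER.match(block)
    if header_match is None:
        raise ValueError("block does not start with '*'")
    header = header_match.group(1).strip()
    # findall returns the single capture group of each match.
    subs = [sub.strip() for sub in SUBLINES.findall(block)]
    return header, subs


print(split_block(BLOCK))
```

Running `split_block` over every block from the first step yields one header plus a list of sub-lines per block, which is the grouping the question asked for.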