Regex to match a repeating pattern between two strings

Date: 2016-11-10 20:35:35

Tags: regex dataset

I have a dataset with a repeating pattern:

----
MV: The Oxford and Cambridge University Boat Race (1895)
SD: 30 March 1895 - 
----
MV: Awakening of Rip (1896)
CP: American Mutoscope Company; 4 February 1897; 9237 (in copyright registry) 
PD: August 1896 - August 1896 
----
MV: A Chegada do Comboio Inaugural à Estação Central do Porto (1897)
PD: 7 November 1896 - 
----
MV: Exit of Rip and the Dwarf (1896)
CP: American Mutoscope and Biograph Co.; 9 December 1902; H24875 (in copyright registry) 
PD: August 1896 - August 1896 
----

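Now I want to take the content between one ---- and the next ---- and change each \n to \t, so that all the fields of an entry end up on a single, tab-separated line. The entries will still be separated by ----, which makes them easier to read. In the end it should look like this: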

----
MV: The Oxford and Cambridge University Boat Race (1895)    SD: 30 March 1895 - 
----
MV: Awakening of Rip (1896) CP: American Mutoscope Company; 4 February 1897; 9237 (in copyright registry)   PD: August 1896 - August 1896 
----
MV: A Chegada do Comboio Inaugural à Estação Central do Porto (1897)    PD: 7 November 1896 - 
----
MV: Exit of Rip and the Dwarf (1896)    CP: American Mutoscope and Biograph Co.; 9 December 1902; H24875 (in copyright registry)    PD: August 1896 - August 1896 
----
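To make the requested transformation concrete, here is a sketch of an alternative that does not use a regex at all: split the text into lines and join each entry's field lines with tabs. It assumes every separator line is exactly `----` and that line endings are `\n`; the function name `join_entries` is hypothetical.

```python
def join_entries(text: str, sep: str = "----") -> str:
    """Put each entry's field lines on one tab-separated line, keeping the separators."""
    out = []    # finished output lines
    entry = []  # field lines of the entry currently being collected
    for line in text.splitlines():
        if line == sep:
            # A separator closes the current entry (if any) and is kept as-is.
            if entry:
                out.append("\t".join(entry))
                entry = []
            out.append(sep)
        else:
            entry.append(line)
    if entry:  # trailing entry with no closing separator
        out.append("\t".join(entry))
    return "\n".join(out) + "\n"


sample = (
    "----\n"
    "MV: Awakening of Rip (1896)\n"
    "CP: American Mutoscope Company; 4 February 1897; 9237 (in copyright registry) \n"
    "PD: August 1896 - August 1896 \n"
    "----\n"
)
print(join_entries(sample), end="")
```

Because only exact `----` lines count as separators, a field that happens to contain `----` in the middle of a line is left alone.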

I have tried a few positive lookaround patterns, but with no luck.

1 answer:

Answer 0 (score: 1)

You want something that uses both a negative lookahead and a negative lookbehind, like this:

(?<!----)\n(?!----)

Then just replace each match with \t.
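A minimal sketch of applying that pattern with Python's standard `re` module, assuming the data is held in a string with `\n` line endings. The sample text is an abbreviated copy of the question's data, and the variable names are illustrative only.

```python
import re

# Abbreviated copy of the question's data (assumes "\n" line endings).
data = (
    "----\n"
    "MV: The Oxford and Cambridge University Boat Race (1895)\n"
    "SD: 30 March 1895 - \n"
    "----\n"
    "MV: Awakening of Rip (1896)\n"
    "CP: American Mutoscope Company; 4 February 1897; 9237 (in copyright registry) \n"
    "PD: August 1896 - August 1896 \n"
    "----\n"
)

# Match a newline only if it is NOT directly preceded by "----"
# (negative lookbehind) and NOT directly followed by "----"
# (negative lookahead), i.e. only the newlines inside an entry.
pattern = re.compile(r"(?<!----)\n(?!----)")

result = pattern.sub("\t", data)
print(result)
```

The newlines that touch a `----` line fail one of the two lookarounds and are kept, so the separator lines stay on their own lines. If the input uses `\r\n` endings, reading the file in Python's default text mode converts them to `\n` first; otherwise a stray `\r` would remain before each tab.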

Demo on Regex101 (modification of yours)