How do you use re.split as more of a "slice" function when splitting?

Asked: 2014-02-07 23:10:01

Tags: python regex split

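I know plenty of regex functions, so that is not the problem. The problem is that re.split removes whatever it matches unless the match is in a capturing group, and a group causes problems of its own. I need to split at NAME, TAKE SEL, or TAKE, but keep everything.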

Here is the text:

NAME "440 Sine Wave 5 seconds.wav"
VOLPAN 1.000000 0.000000 1.000000 -1.000000
SOFFS 0.00000000000000
PLAYRATE 1.00000000000000 1 0.00000000000000 -1 0 0.002500
CHANMODE 0
GUID {857A4ED4-172A-43EE-AECF-CC4D027CE5D3}
<SOURCE WAVE
FILE "C:\Users\Greg\Desktop\test2\440 Sine Wave 5 seconds.wav"
>
SM 0.607738664073 0.6077386641 + 2.044211870063 2.0442118701 + 3.314938167670 3.3149381677 + 4.088423740126 4.0884237401
TAKE SEL
NAME "440 Sine Wave 5 seconds render 002.wav"
TAKEVOLPAN 0.000000 1.000000 -1.000000
SOFFS 0.00000000000000
PLAYRATE 1.00000000000000 1 0.00000000000000 -1 0 0.002500
CHANMODE 0
GUID {DD233FDE-7641-4F02-AE9A-8B99FF400F24}
<SOURCE WAVE
FILE "C:\Users\Greg\Documents\REAPER Media\440 Sine Wave 5 seconds render 002.wav"
>
SM 0.899258786122 0.8992587861 + 1.268694185507 1.2686941855 + 1.709174854005 1.7091748540 + 2.050192145745 2.0501921457 + 2.718017675403 2.7180176754 + 3.307693409037 3.3076934090 + 3.762383131357 3.7623831314 + 4.131818530742 4.1318185307 + 4.458626768660 4.4586267687
TAKE
NAME "440 Sine Wave 5 seconds render 003.wav"
TAKEVOLPAN 0.000000 1.000000 -1.000000
SOFFS 0.00000000000000
PLAYRATE 1.00000000000000 1 0.00000000000000 -1 0 0.002500
CHANMODE 0
GUID {A01A4793-7E2C-47EC-A22C-659A8FE0C162}
<SOURCE WAVE
FILE "C:\Users\Greg\Documents\REAPER Media\440 Sine Wave 5 seconds render 003.wav"
>
SM 0.679018451873 0.6790184519 + 2.874317267450 2.8743172675
>

Here is how it should be split:

NAME "440 Sine Wave 5 seconds.wav"
VOLPAN 1.000000 0.000000 1.000000 -1.000000
SOFFS 0.00000000000000
PLAYRATE 1.00000000000000 1 0.00000000000000 -1 0 0.002500
CHANMODE 0
GUID {857A4ED4-172A-43EE-AECF-CC4D027CE5D3}
<SOURCE WAVE
FILE "C:\Users\Greg\Desktop\test2\440 Sine Wave 5 seconds.wav"
>
SM 0.607738664073 0.6077386641 + 2.044211870063 2.0442118701 + 3.314938167670 3.3149381677 + 4.088423740126 4.0884237401

TAKE SEL
NAME "440 Sine Wave 5 seconds render 002.wav"
TAKEVOLPAN 0.000000 1.000000 -1.000000
SOFFS 0.00000000000000
PLAYRATE 1.00000000000000 1 0.00000000000000 -1 0 0.002500
CHANMODE 0
GUID {DD233FDE-7641-4F02-AE9A-8B99FF400F24}
<SOURCE WAVE
FILE "C:\Users\Greg\Documents\REAPER Media\440 Sine Wave 5 seconds render 002.wav"
>
SM 0.899258786122 0.8992587861 + 1.268694185507 1.2686941855 + 1.709174854005 1.7091748540 + 2.050192145745 2.0501921457 + 2.718017675403 2.7180176754 + 3.307693409037 3.3076934090 + 3.762383131357 3.7623831314 + 4.131818530742 4.1318185307 + 4.458626768660 4.4586267687

TAKE
NAME "440 Sine Wave 5 seconds render 003.wav"
TAKEVOLPAN 0.000000 1.000000 -1.000000
SOFFS 0.00000000000000
PLAYRATE 1.00000000000000 1 0.00000000000000 -1 0 0.002500
CHANMODE 0
GUID {A01A4793-7E2C-47EC-A22C-659A8FE0C162}
<SOURCE WAVE
FILE "C:\Users\Greg\Documents\REAPER Media\440 Sine Wave 5 seconds render 003.wav"
>
SM 0.679018451873 0.6790184519 + 2.874317267450 2.8743172675
>

1 Answer:

Answer 0 (score: 3)

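You can use a lookahead to split just before each token, but at the time of writing re.split cannot split on a zero-length match (Python 3.7 and later can), so you have to match something. In this case, it looks like you should be able to split on just the newline before the token.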
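To make the difference concrete, here is a minimal sketch comparing a plain split, a split with a capturing group, and a split on the newline with a lookahead. The string `text` and the variable names are made up for illustration and are not taken from the question.

```python
import re

text = "a\nTAKE\nb\nTAKE\nc"

# Plain split: the delimiter is consumed and lost.
plain = re.split(r"\nTAKE\n", text)

# Capturing group: the delimiter is kept, but as separate list items.
grouped = re.split(r"\n(TAKE)\n", text)

# Match only the newline and use a lookahead to require the token after it.
# The token is not consumed, so it stays at the start of the next piece.
lookahead = re.split(r"\n(?=TAKE\n)", text)

print(plain)      # ['a', 'b', 'c']
print(grouped)    # ['a', 'TAKE', 'b', 'TAKE', 'c']
print(lookahead)  # ['a', 'TAKE\nb', 'TAKE\nc']
```

Only the last form keeps each token attached to the text that follows it, which is the "slice" behaviour the question asks for.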

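As for a regex that does this, it is a little complicated by the fact that, based on your example, you only want to split on NAME if there is no TAKE or TAKE SEL line right before it. The following should work:

    re.split(r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)', s)

So the idea is that we match a newline character if the next line is TAKE or TAKE SEL, or if the next line starts with NAME and the previous line is not TAKE or TAKE SEL.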
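The pattern is dense, so it may help to see it spelled out piece by piece. The following is a sketch of the same regex written with re.VERBOSE; the names `COMPACT`, `SPLIT_RE`, and `samples` are introduced here for illustration only.

```python
import re

# The one-line pattern from the answer.
COMPACT = r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)'

# The same pattern in verbose mode. Literal spaces must be escaped ("\ ")
# because re.VERBOSE ignores unescaped whitespace.
SPLIT_RE = re.compile(r"""
    \n                      # consume the newline that ends the previous line
    (?=                     # ...but only if what follows is one of:
        TAKE(?:\ SEL)?\n    #   a line that is exactly TAKE or TAKE SEL
      |                     #   or
        (?<!\nTAKE\n)       #   previous line is not TAKE
        (?<!\nTAKE\ SEL\n)  #   and previous line is not TAKE SEL
        NAME                #   and the next line starts with NAME
    )
""", re.VERBOSE)

samples = [
    'foo\nTAKE\nbar',
    'foo\nTAKE SEL\nNAME\nbar',
    'foo\nNAME\nbar',
]

# Both spellings should split every sample the same way.
for s in samples:
    print(SPLIT_RE.split(s) == re.split(COMPACT, s))
```

The two lookbehinds are kept separate because Python's re module requires each lookbehind to have a fixed width, and "TAKE" and "TAKE SEL" differ in length.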

Example:

>>> import re
>>> s = 'foo\nTAKE\nbar'   # split on TAKE
>>> re.split(r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)', s)
['foo', 'TAKE\nbar']
>>> s = 'foo\nTAKE SEL\nbar'   # split on TAKE SEL
>>> re.split(r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)', s)
['foo', 'TAKE SEL\nbar']
>>> s = 'foo\nTAKE SEL\nNAME\nbar'   # split on TAKE SEL but not on NAME
>>> re.split(r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)', s)
['foo', 'TAKE SEL\nNAME\nbar']
>>> s = 'foo\nNAME\nbar'   # split on NAME since no TAKE or TAKE SEL before
>>> re.split(r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)', s)
['foo', 'NAME\nbar']
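As a rough check against the question's data, the sketch below applies the same pattern to an abbreviated version of the posted text (most lines are dropped to keep it short). It assumes the lines have no leading whitespace and are separated by "\n"; a file with indentation or "\r\n" line endings would need the pattern adjusted. The names `PATTERN`, `item`, and `takes` are illustrative.

```python
import re

PATTERN = r'\n(?=TAKE(?: SEL)?\n|(?<!\nTAKE\n)(?<!\nTAKE SEL\n)NAME)'

# Abbreviated sample: three takes, keeping only a NAME and a SOFFS line each.
item = "\n".join([
    'NAME "440 Sine Wave 5 seconds.wav"',
    'SOFFS 0.00000000000000',
    'TAKE SEL',
    'NAME "440 Sine Wave 5 seconds render 002.wav"',
    'SOFFS 0.00000000000000',
    'TAKE',
    'NAME "440 Sine Wave 5 seconds render 003.wav"',
    'SOFFS 0.00000000000000',
    '>',
])

takes = re.split(PATTERN, item)

# Print each piece followed by a separator line.
for take in takes:
    print(take)
    print("-----")
```

Because only the newline before each token is consumed, joining the pieces with "\n" gives back the original text unchanged.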