I'm trying to figure out a clean way to split this string on a variable separator of the form r'^@(pv|rv|dv|ex)'. I'm close, but it's not grabbing the stuff after the newline, so the multi-line components are not captured correctly. No amount of re.DOTALL|re.MULTILINE seems to do the trick.
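Answer 0 (score: 1)
Perhaps a better approach is to use re.S so that the dot (.) matches newlines, re.M so that ^ works as an anchor at the start of every line, and a lazy .*? that consumes the record body up to a lookahead for the next separator (or the end of the string).
It looks like this: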
^@(pv|rv|dv|ex)@\s+(.*?)(?=^@(?:pv|rv|dv|ex)@|\Z)
Python demo:
txt="""@ex@ 70526 1422983747
@rv@ 1 @db.counters@ @change@ @1010@
@pv@ 0 @db.desc@ 1010 @PM recovery log.9
@
@pv@ 2 @db.change@ 1010 1010 @192_168_1_20~ac~70520@ @bozo@ 1422983747 0 @PM recovery log.9
@ @@
@pv@ 2 @db.changex@ 1010 1010 @192_168_1_20~ac~70520@ @bozo@ 1422983747 0 @PM recovery log.9
@ @@
@rv@ 10 @db.working@ @//192_168_1_20~ac~70520/pgtext/log.9@ @//depot/fyz/pgtext/log.9@ @192_168_1_20~ac~70520@ @bozo@ 0 1 0 160 0 1010 0 0 00000000000000000000000000000000 -1 0 0 160 @@ 0
@ex@ 70526 1422983747"""
import re
SEPS = ['pv', 'rv', 'dv', 'ex']
sep_st='|'.join(SEPS)
pat=re.compile(r'^@({})@\s+(.*?)(?=^@(?:{})@|\Z)'.format(sep_st, sep_st), re.S | re.M)
>>> print('\n'.join(repr(e) for e in pat.findall(txt)))
('ex', '70526 1422983747\n')
('rv', '1 @db.counters@ @change@ @1010@\n')
('pv', '0 @db.desc@ 1010 @PM recovery log.9\n@\n')
('pv', '2 @db.change@ 1010 1010 @192_168_1_20~ac~70520@ @bozo@ 1422983747 0 @PM recovery log.9\n@ @@\n')
('pv', '2 @db.changex@ 1010 1010 @192_168_1_20~ac~70520@ @bozo@ 1422983747 0 @PM recovery log.9\n@ @@\n')
('rv', '10 @db.working@ @//192_168_1_20~ac~70520/pgtext/log.9@ @//depot/fyz/pgtext/log.9@ @192_168_1_20~ac~70520@ @bozo@ 0 1 0 160 0 1010 0 0 00000000000000000000000000000000 -1 0 0 160 @@ 0\n')
('ex', '70526 1422983747')
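The demo above can be wrapped in a small reusable helper. This is only a sketch: the name iter_records and its signature are hypothetical, and passing each separator through re.escape is an added precaution that assumes separators may one day contain regex metacharacters.

```python
import re

def iter_records(text, seps=("pv", "rv", "dv", "ex")):
    """Yield (tag, body) pairs. Hypothetical helper, not part of the answer."""
    # Escape each separator so it is always treated literally.
    alternation = "|".join(re.escape(s) for s in seps)
    pattern = re.compile(
        r"^@({0})@\s+(.*?)(?=^@(?:{0})@|\Z)".format(alternation),
        re.S | re.M,  # . matches newlines; ^ matches at every line start
    )
    for match in pattern.finditer(text):
        yield match.group(1), match.group(2)

# Shortened sample in the same shape as the data above.
sample = "@ex@ 1 2\n@pv@ 0 @db.desc@ 1010 @PM recovery log.9\n@\n@ex@ 3 4"
records = list(iter_records(sample))
```

Using finditer instead of findall keeps the match objects available, which may help if offsets into the original text are needed later.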
Answer 1 (score: 1)
The simplest way is to use an assertion and match everything up to the next variable separator.
# (?m)^(@(pv|rv|dv|ex)@(?:(?!@(?:pv|rv|dv|ex)@)[\S\s])*)
(?m) # Multi-line mode
^ # BOL
( # (1 start), Whole record
@ # Delimiter start
( pv | rv | dv | ex ) # (2), Delimiter contents
@ # Delimiter end
(?: # Match anything up until the next delimiter
(?! # ASSERT - not object delimiter
@
(?: pv | rv | dv | ex )
@
)
[\S\s] # Grab this (any) character
)* # Do zero to many times (until next delimiter)
) # (1 end), Record
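In Python, the expanded pattern above could be compiled with re.VERBOSE so the layout and comments survive. The following is a sketch under that assumption; the names RECORD and sample are illustrative, and the sample is a shortened stand-in for the real data.

```python
import re

RECORD = re.compile(
    r"""
    ^                                # beginning of line (needs re.M)
    (                                # (1) whole record
      @ (pv|rv|dv|ex) @              # (2) delimiter contents
      (?:
        (?! @ (?:pv|rv|dv|ex) @ )    # assert: not at the next delimiter
        [\S\s]                       # consume any character, newlines included
      )*
    )
    """,
    re.M | re.X,  # re.X ignores the whitespace and comments in the pattern
)

sample = "@ex@ 1 2\n@pv@ 0 @db.desc@ 1010 @PM recovery log.9\n@\n@ex@ 3 4"
# Pair each delimiter (group 2) with the whole record (group 1).
records = [(m.group(2), m.group(1)) for m in RECORD.finditer(sample)]
```

Note that the assertion is not anchored to the start of a line, so a record would also end at a delimiter appearing mid-line. The sample data has no such case.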
If you need to validate that the records are well formed, that can be done as well.
# (?m)^(@(pv|rv|dv|ex)@(?:[^@]*(?:@(?!(?:pv|rv|dv|ex)@)[^@]*@)?)*)
(?m) # Multi-line mode
^ # BOL
( # (1 start), Whole record
@ # Delimiter start
( pv | rv | dv | ex ) # (2), Object delimiter contents
@ # Delimiter end
(?: # Match anything up until the next delimiter
[^@]* # Zero or more Not @
(?: # Optional non object delimiter
@ # Delimiter start
(?! # ASSERT - not object delimiter content
(?: pv | rv | dv | ex )
@
)
[^@]* # Zero or more Not @
@ # Delimiter end
)?
)* # Do zero to many times (until next delimiter)
) # (1 end), Record
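One possible way to use this stricter pattern for validation, sketched here as an assumption rather than the answer's stated method, is to check that the matched records cover the entire input. The helper name is_well_formed is hypothetical.

```python
import re

STRICT_RECORD = re.compile(
    r"^(@(pv|rv|dv|ex)@(?:[^@]*(?:@(?!(?:pv|rv|dv|ex)@)[^@]*@)?)*)",
    re.M,
)

def is_well_formed(text):
    """Return True when the matched records cover the whole text (assumed check)."""
    # An unpaired '@' ends a record early, so the joined records fall short.
    joined = "".join(m.group(1) for m in STRICT_RECORD.finditer(text))
    return joined == text

good = "@ex@ 1 2\n@pv@ 0 @db.desc@ 1010 @PM recovery log.9\n@\n@ex@ 3 4"
bad = "@ex@ 1 @unterminated"
good_ok = is_well_formed(good)
bad_ok = is_well_formed(bad)
```

Here the pattern only accepts @ characters that come in pairs inside a record, so the unpaired @ in the second string leaves text that no record covers.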
Answer 2 (score: 0)
>>> seps_re = re.compile(r"@(%s)@" % "|".join(SEPS))
>>> seps_re.split(s)
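is as close as I could get ... and it is close ... I think you could take it and make it work with something like this:
>>> seps_re = re.compile(r"@(%s)@" % "|".join(SEPS))
>>> # drop the empty string that split() returns before the first separator
>>> my_items = list(filter(None, seps_re.split(s)))
>>> # pair each separator with the text that follows it
>>> my_pairs = list(zip(*[iter(my_items)] * 2))
>>> print(my_pairs)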
[('ex', ' 70526 1422983747\n'), ('rv', ' 1 @db.counters@ @change@ @1010@\n'), ('pv', ' 0 @db.desc@ 1010 @PM recovery log.9\n@\n'), ('pv', ' 2 @db.change@ 1010 1010 @192_168_1_20~ac~70520@ @bozo@ 1422983747 0 @PM recovery log.9\n@ @@\n'), ('pv', ' 2 @db.changex@ 1010 1010 @192_168_1_20~ac~70520@ @bozo@ 1422983747 0 @PM recovery log.9\n@ @@\n'), ('rv', ' 10 @db.working@ @//192_168_1_20~ac~70520/pgtext/log.9@ @//depot/fyz/pgtext/log.9@ @192_168_1_20~ac~70520@ @bozo@ 0 1 0 160 0 1010 0 0 00000000000000000000000000000000 -1 0 0 160 @@ 0\n'), ('ex', ' 70526 1422983747')]