Traceback of a regular expression

Date: 2011-09-28 13:58:29

Tags: python regex

Let's say I have a regular expression:

import re

match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # I want to raise the regex matching process here.

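If the regular expression fails to match, I want the exception to report how the matching proceeded and at what stage it failed to match the pattern. Is it even possible to implement this?

3 answers:

Answer 0: (score: 0)

I have used Kodos (http://kodos.sourceforge.net/about.html) in the past for regex debugging. It is not an ideal solution, since you want something that works at runtime, but it may help you.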

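Answer 1: (score: 0)

If you need to test a regular expression, you can follow each group with *, as in (sometext)*. Use this with the regex you want to test, and you should be able to work out where it fails.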

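Then make use of the following match object attributes, as documented on python.org:

pos       The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos       The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.

lastindex       The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.

lastgroup       The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.

re       The regular expression object whose match() or search() method produced this MatchObject instance.

string       The string passed to match() or search().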
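As a quick illustration of the attributes quoted above, here is a minimal Python 3 sketch. The pattern, the group names `word` and `num`, and the sample text are invented for the example; only the attribute names come from the documentation.

```python
import re

# Hypothetical pattern: a named word group followed by an optional named number group.
pattern = re.compile(r'(?P<word>[a-z]+)(?P<num>\d+)?')
text = '--abc42--'

# pos=2 and endpos=7 let the engine look at 'abc42' only.
m = pattern.search(text, 2, 7)
print(m.pos, m.endpos, m.lastindex, m.lastgroup)

# With endpos=5 the digits are out of reach, so only the first group matches.
m_short = pattern.search(text, 2, 5)
print(m_short.group(), m_short.lastindex, m_short.lastgroup)

# The match object also remembers which pattern and which string produced it.
print(m.re is pattern, m.string == text)

# The lastindex examples from the quoted documentation, applied to 'ab'.
lastindex_examples = {
    p: re.search(p, 'ab').lastindex
    for p in ('(a)b', '((a)(b))', '((ab))', '(a)(b)')
}
print(lastindex_examples)
```

Note that lastindex refers to the group that closed last, which is why the nested pattern ((a)(b)) reports 1 rather than 3.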

So here is a very simple example:

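>>> import re
>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     # the groups are unnamed, so lastgroup would be None; use lastindex
...     print(res.lastindex if res else None)
...     # raise my exception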
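To connect this back to the question, the same idea can be wrapped so that the failure is reported through an exception. This is only a sketch in Python 3: `RegexMismatch` and `search_or_explain` are hypothetical names, and the relaxed pattern still has to be written by hand.

```python
import re


class RegexMismatch(Exception):
    """Hypothetical exception reporting how far a relaxed pattern got."""


def search_or_explain(strict, relaxed, text):
    """Return the strict match, or raise with the last group the relaxed pattern matched."""
    match = re.search(strict, text)
    if match:
        return match
    # Fall back to the pattern whose groups are all optional.
    partial = re.search(relaxed, text)
    last = partial.lastindex if partial else None
    raise RegexMismatch('no match; last optional group matched: %r' % (last,))


try:
    search_or_explain(r'the real thing', r'(the)* (real)* (thing)*', 'the real thong')
except RegexMismatch as exc:
    message = str(exc)
    print(message)
```

Treat the reported index as a hint rather than a proof: search() returns the leftmost place where the relaxed pattern matches, and that may be a stretch of text where no group matched at all.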

Answer 2: (score: 0)

I have something that helps me debug complex regex patterns in my code. Would this help you?

import re

li = ('ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',

      '2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798',


      'ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
      'qfgqrgqrg',

      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',

      '25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877',

      '9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798')


tupleRE = ('^\d',
           ' ',
           '\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           '[\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')  



def REtest(ch, tupleRE, flags = re.MULTILINE):
    # try longer and longer prefixes of tupleRE until one stops matching
    for n in range(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        testmatch = regx.search(ch)
        if not testmatch:
            print('\n  -*- tupleRE :\n')
            print('\n'.join(str(i).zfill(2)+' '+repr(u)
                            for i,u in enumerate(tupleRE[:n])))
            print('   --------------------------------')
            # element n is the one that makes tupleRE stop matching
            print(str(n).zfill(2)+' '+repr(tupleRE[n])
                  +"   doesn't match anymore from this ligne "
                  +str(n)+' of tupleRE')
            # show the element that follows, if any (slicing clamps at the end)
            print('\n'.join(str(n+1+j).zfill(2)+' '+repr(u)
                            for j,u in enumerate(tupleRE[n+1:n+2])))

            # find the longest prefix of tupleRE that still matches
            match = None
            for i in range(n):
                match = re.search(''.join(tupleRE[:n-i]),ch, flags)
                if match:
                    break

            # if even the first element fails (n == 0), nothing matched
            matching_portion = match.group() if match else ''
            matching_li = '\n'.join(map(repr,
                                        matching_portion.splitlines(True)[-5:]))
            fin_matching_portion = match.end() if match else 0
            print('\n\n  -*- Part of the tested string which is concerned :\n\n'
                  '######### matching_portion ########\n'+matching_li + '\n'
                  '##### end of matching_portion #####\n'
                  '-----------------------------------\n'
                  '######## unmatching_portion #######')
            print('\n'.join(map(repr,
                                ch[fin_matching_portion:
                                   fin_matching_portion+300].splitlines(True)) ))
            break
    else:
        print('\n  SUCCESS . The regex integrally matches.')



for x in li:
    print('  -*- Analyzed string :\n%r' % x)
    REtest(x,tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')

Result:

  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__12      ty--\t\t ghtr789\nqfgqrgqrg'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'2 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdecddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  SUCCESS . The regex integrally matches.

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
   --------------------------------
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'   doesn't match anymore from this ligne 6 of tupleRE
07 '[a-z]+'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'5 12478 abdefgcd '
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
07 '[a-z]+'
08 '__'
09 '[\\d]+'
10 ' +'
11 '[^\t]+'
12 '\t\t'
13 ' '
14 'ght'
15 '(r[5-9]+|u[0-4]+)'
   --------------------------------
16 '$'   doesn't match anymore from this ligne 16 of tupleRE



  -*- Part of the tested string which is concerned :

######### matching_portion ########
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'940\n'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'25 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

  -*- tupleRE :

00 '^\\d'
   --------------------------------
01 ' '   doesn't match anymore from this ligne 1 of tupleRE
02 '\\d{5}'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'2'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'5 47890 bbcedefg arctic__124    **juyf\t\t ghtr89877'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
  -*- Analyzed string :
'9 54879 bbdeYddf antarctic__13  18:13pomodoro\t\t ghtr6798'

  -*- tupleRE :

00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
   --------------------------------
05 ' '   doesn't match anymore from this ligne 5 of tupleRE
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'


  -*- Part of the tested string which is concerned :

######### matching_portion ########
'9 54879 bbde'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'Yddf antarctic__13  18:13pomodoro\t\t ghtr6798'

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
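For use at runtime, the prefix-by-prefix idea shown above can be reduced to a function that returns the failing position instead of printing a report. The following is a minimal Python 3 sketch, not the answer's own code: `first_failing_fragment` is a hypothetical name, and the fragments are a shortened copy of the first six elements of tupleRE.

```python
import re


def first_failing_fragment(fragments, text, flags=re.MULTILINE):
    """Return (index, end_of_matched_prefix) for the first fragment that breaks the match.

    Returns None when the concatenation of all fragments matches.
    """
    end_of_last_match = 0
    for n in range(len(fragments)):
        # Each prefix must be a valid regex on its own.
        match = re.search(''.join(fragments[:n + 1]), text, flags)
        if match is None:
            return n, end_of_last_match
        end_of_last_match = match.end()
    return None


# Shortened fragment list, assumed for illustration.
fragments = (r'^\d', ' ', r'\d{5}', ' ', '[abcdefghi]+', ' ')

result = first_failing_fragment(fragments, '9 54879 bbdeYddf antarctic__13')
print(result)

ok = first_failing_fragment(fragments, '9 54879 bbdecddf antarctic__13')
print(ok)
```

The returned pair can be placed in an exception message, for example by slicing the text at the returned offset. The approach assumes that the pattern can be cut into pieces whose prefixes all compile; a group or lookahead split across two fragments would raise re.error instead.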