Python regex error: bad character range

Asked: 2017-04-27 02:54:39

Tags: python regex

As a Python newbie, I could not find a solution in these answers (abc). What I am trying to do is parse log lines like this:

# this is the line I want to parse
#[time-5.40052;node-1;line-638]NOTE:BundleTrace:good! one bundle recept, it's one hop! bp_header=,destination ip=10.0.0.3,source ip=10.0.0.1,source seqno=139,payload size=345,offset size=345,src time stamp=5,hop time stamp=5,bundle type=BundlePacket
    r2 = re.compile(r'[time-(\d+\.*\d*);node-(\d+\.*\d*);line-(\d+\.*\d*)]NOTE:BundleTrace:good! one bundle recep'
    r't, it\'s one hop! bp_header=,destination ip=10.0.0.(\d+\.*\d*),source ip=10.0.0.(\d+\.*\d*),source seqn'
    r'o=(\d+\.*\d*),payload size=(\d+\.*\d*),offset size=(\d+\.*\d*),src time st'
    r'amp=(\d+\.*\d*),hop time stamp=(\d+\.*\d*),bundle type=([a-zA-Z]+)', re.VERBOSE)
    hop = r2.match(line)

But I get this error:

error                                     Traceback (most recent call last)
<ipython-input-9-f8308b7bbe0f> in <module>()
     51     r't, it\'s one hop! bp_header=,destination ip=10.0.0.(\d+\.*\d*),source ip=10.0.0.(\d+\.*\d*),source seqn'
     52     r'o=(\d+\.*\d*),payload size=(\d+\.*\d*),offset size=(\d+\.*\d*),src time st'
---> 53     r'amp=(\d+\.*\d*),hop time stamp=(\d+\.*\d*),bundle type=([a-zA-Z]+)', re.VERBOSE)
     54     hop = r2.match(line)
     55 

/home/dtn-012345/miniconda3/lib/python3.6/re.py in compile(pattern, flags)
    231 def compile(pattern, flags=0):
    232     "Compile a regular expression pattern, returning a pattern object."
--> 233     return _compile(pattern, flags)
    234 
    235 def purge():

/home/dtn-012345/miniconda3/lib/python3.6/re.py in _compile(pattern, flags)
    299     if not sre_compile.isstring(pattern):
    300         raise TypeError("first argument must be string or compiled pattern")
--> 301     p = sre_compile.compile(pattern, flags)
    302     if not (flags & DEBUG):
    303         if len(_cache) >= _MAXCACHE:

/home/dtn-012345/miniconda3/lib/python3.6/sre_compile.py in compile(p, flags)
    560     if isstring(p):
    561         pattern = p
--> 562         p = sre_parse.parse(p, flags)
    563     else:
    564         pattern = None

/home/dtn-012345/miniconda3/lib/python3.6/sre_parse.py in parse(str, flags, pattern)
    854 
    855     try:
--> 856         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, False)
    857     except Verbose:
    858         # the VERBOSE flag was switched on inside the pattern.  to be

/home/dtn-012345/miniconda3/lib/python3.6/sre_parse.py in _parse_sub(source, state, verbose, nested)
    413     start = source.tell()
    414     while True:
--> 415         itemsappend(_parse(source, state, verbose))
    416         if not sourcematch("|"):
    417             break

/home/dtn-012345/miniconda3/lib/python3.6/sre_parse.py in _parse(source, state, verbose)
    550                     if hi < lo:
    551                         msg = "bad character range %s-%s" % (this, that)
--> 552                         raise source.error(msg, len(this) + 1 + len(that))
    553                     setappend((RANGE, (lo, hi)))
    554                 else:

error: bad character range e-( at position 4

I believe there must be some stupid bug around 'bundle type=([a-zA-Z]+)', but I can't find it. Can anyone tell me why? :)

1 Answer:

Answer 0: (score: 1)

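You need to escape [ and ] and match the spaces explicitly. An unescaped [ opens a character class, so '[time-(' is read as a class containing the range 'e-(', which is invalid because '(' sorts before 'e'; that is what "bad character range e-( at position 4" is reporting. In addition, with re.VERBOSE any unescaped whitespace in the pattern is ignored, so each space has to be written as \s. Everything else is fine. Try this: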

import re

line = "[time-5.40052;node-1;line-638]NOTE:BundleTrace:good! one bundle recept, it's one hop! bp_header=,destination ip=10.0.0.3,source ip=10.0.0.1,source seqno=139,payload size=345,offset size=345,src time stamp=5,hop time stamp=5,bundle type=BundlePacket"
r2 = re.compile(r'\[time-(\d+\.*\d*);node-(\d+\.*\d*);line-(\d+\.*\d*)\]NOTE:BundleTrace:good!\sone\sbundle\srecept,\sit\'s\sone\shop!\sbp_header=,destination\sip=10.0.0.(\d+\.*\d*),source\sip=10.0.0.(\d+\.*\d*),source\sseqno=(\d+\.*\d*),payload\ssize=(\d+\.*\d*),offset\ssize=(\d+\.*\d*),src\stime\sstamp=(\d+\.*\d*),hop\stime\sstamp=(\d+\.*\d*),bundle\stype=([a-zA-Z]+)', re.VERBOSE)

hop = r2.match(line)
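
For completeness, here is a minimal sketch that reproduces both causes in isolation and then shows one possible alternative: dropping re.VERBOSE so that literal spaces match themselves, and escaping the dots in the IP prefix as well. The names NUM, range_error, ignores_space and keeps_space are made up for this illustration and are not part of the original code; it also assumes the verbose flag is not needed for anything else.

```python
import re

line = ("[time-5.40052;node-1;line-638]NOTE:BundleTrace:good! one bundle "
        "recept, it's one hop! bp_header=,destination ip=10.0.0.3,"
        "source ip=10.0.0.1,source seqno=139,payload size=345,"
        "offset size=345,src time stamp=5,hop time stamp=5,"
        "bundle type=BundlePacket")

# Cause 1: an unescaped '[' opens a character class, so 'e-(' is read as a
# range whose end sorts before its start, and compilation fails.
try:
    re.compile(r'[time-(\d+)]')
    range_error = None
except re.error as exc:
    range_error = str(exc)

# Cause 2: under re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so 'one hop' behaves like 'onehop' unless the space is written as \s.
ignores_space = re.compile(r'one hop', re.VERBOSE).match('one hop') is None
keeps_space = re.compile(r'one\shop', re.VERBOSE).match('one hop') is not None

# Alternative (assumption: re.VERBOSE is not needed): without the flag,
# spaces match literally; brackets and the dots in '10.0.0.' are escaped.
NUM = r'(\d+(?:\.\d+)?)'  # hypothetical helper: an integer or a decimal
pattern = re.compile(
    rf'\[time-{NUM};node-{NUM};line-{NUM}\]'
    r"NOTE:BundleTrace:good! one bundle recept, it's one hop! bp_header=,"
    rf'destination ip=10\.0\.0\.{NUM},source ip=10\.0\.0\.{NUM},'
    rf'source seqno={NUM},payload size={NUM},offset size={NUM},'
    rf'src time stamp={NUM},hop time stamp={NUM},bundle type=([a-zA-Z]+)'
)

hop = pattern.match(line)
if hop:
    print(hop.groups())
```

Building the pattern from a shared NUM fragment is only a readability choice; the single-line pattern above works just as well.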